Publication Date

12-1-2007

Degree Program

Department of Mathematics and Computer Science

Degree Type

Master of Science in Computer Science

Abstract

The Internet can be regarded as a reservoir of useful information in textual form: product catalogs, airline schedules, stock market quotations, weather forecasts, etc. There has been much interest in building systems that gather such information on a user's behalf. But because these information resources are formatted differently, mechanically extracting their content is difficult. Systems that use such resources typically rely on hand-coded wrappers, that is, customized procedures for information extraction. Structured data objects are a very important type of information on the Web. Such data objects are often records from underlying databases, displayed in Web pages using fixed templates. Mining data records in Web pages is useful because they typically present their host pages' essential information, such as lists of products and services. Extracting these structured data objects enables one to integrate data and information from multiple Web pages to provide value-added services, e.g., comparative shopping, meta-querying and search. Web content mining has thus become an area of interest for many researchers because of the phenomenal growth of Web content and the economic benefits associated with it. However, due to the heterogeneity of Web pages, automated discovery of targeted information remains a challenging problem.

Disciplines

Computer Sciences

Recommended Citation

Sharma, Dipesh, "Automatically Extract Information from Web Documents" (2007). Masters Theses & Specialist Projects. Paper 376.
https://digitalcommons.wku.edu/theses/376

Included in

Computer Sciences Commons

TopSCHOLAR®

Masters Theses & Specialist Projects

Automatically Extract Information from Web Documents
