Publication Date

5-2011

Advisor(s) - Committee Chair

Dr. Guangming Xing (Director), Dr. Qi Li, Dr. Zhonghang Xia

Degree Program

Department of Mathematics and Computer Science

Degree Type

Master of Science

Abstract

The eXtensible Markup Language (XML) has become the standard format for data exchange on the Internet, providing interoperability between different business applications. Such wide use results in large volumes of heterogeneous XML data, i.e., XML documents conforming to different schemas. Although schemas are important in many business applications, they are often missing from XML documents. In this thesis, we present a suite of algorithms that are effective in extracting schema information from a large collection of XML documents. We propose using the cost of NFA simulation to compute the Minimum Description Length (MDL), which is used to rank the inferred schemas. We also study the use of frequencies of the sample inputs to improve the precision of the schema extraction. Furthermore, we propose an evaluation framework to quantify the quality of the extracted schema. Experimental studies are conducted on various data sets to demonstrate the efficiency and efficacy of our approach.

Disciplines

Databases and Information Systems | Programming Languages and Compilers

Recommended Citation

Parthepan, Vijayeandra, "Efficient Schema Extraction from a Collection of XML Documents" (2011). Masters Theses & Specialist Projects. Paper 1061.
https://digitalcommons.wku.edu/theses/1061

Included in

Databases and Information Systems Commons, Programming Languages and Compilers Commons

Efficient Schema Extraction from a Collection of XML Documents
