Chaitanya Malla

Publication Date


Advisor(s) - Committee Chair

Guangming Xing


Access granted to WKU students, faculty and staff only.

After an extensive unsuccessful search for the author, this thesis is considered an orphan work, which may be protected by copyright. The inclusion of this orphan work on TopScholar does not guarantee that that orphan work may be used for any purpose and any use of the orphan work may subject the user to a claim of copyright infringement. The reproduction of this work is made by WKU without any purpose of direct or indirect commercial advantage and is made for purposes of preservation and research.

See also WKU Archives - Authorization for Use of Thesis, Special Project & Dissertation

Degree Program

Department of Computer Science

Degree Type

Master of Science


Extensible markup and platform independence make XML [5] a suitable document format for a wide range of applications, both online and offline. Computing the edit distance between an XML document and a schema, and transforming XML documents to conform to a schema, are critical for various document engineering and document mining tasks.

This thesis focuses on the problem of finding the minimum edit distance and an optimal sequence of edit operations to transform an XML document so that it conforms to a schema. A few proposed solutions to this problem [1, 2, 3, 4] have been studied, and two of them [1, 2] have been implemented. A schema in DTD is translated to a normalized regular hedge grammar [2], and an XML document is represented as a node-labeled ordered tree [1, 2]. A comparative study of the performance of the implemented algorithms [1, 2] is presented.
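As a rough illustration of the node-labeled ordered tree representation mentioned above, the following minimal Python sketch converts an XML document into such a tree. It is not the thesis's implementation: the `Node` class, the `to_tree` and `size` helpers, and the sample document are hypothetical names chosen for this example, and text content and attributes are ignored by assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node-labeled ordered tree: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_tree(elem: ET.Element) -> Node:
    # Iterating over an Element yields its child elements in document order,
    # so the sibling order of the XML document is preserved.
    return Node(elem.tag, [to_tree(child) for child in elem])


def size(node: Node) -> int:
    # Number of nodes in the tree; a simple measure of document size.
    return 1 + sum(size(child) for child in node.children)


# Hypothetical sample document, used only for illustration.
doc = "<book><title/><author/><author/></book>"
tree = to_tree(ET.fromstring(doc))
```

An edit operation (insert, delete, or relabel a node) would then act on this structure, and the node count gives a sense of how document size grows the search space.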

Document size is a major restriction on the practical application of such algorithms. A divide-and-conquer strategy is developed to adapt the second algorithm [2] to process larger documents than it can otherwise handle.


Computer Sciences | Physical Sciences and Mathematics