Department of Computer Science
Master of Science
Extensible markup and platform independence make XML a fitting document format for a wide range of applications, both online and offline. Computing the edit distance between XML documents and schemata, and transforming XML documents so that they conform to a schema, are critical for various document engineering and document mining tasks.
This thesis focuses on the problem of finding the minimum edit distance, and an optimal sequence of edit operations, for transforming an XML document so that it conforms to a schema. A few proposed solutions to this problem [1, 2, 3, 4] have been studied, and two of them [1, 2] have been implemented. A schema in DTD is translated to a normalized regular hedge grammar, and an XML document is represented as a node-labeled ordered tree [1, 2]. A comparative study of the performance of the implemented algorithms [1, 2] is presented.
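As a rough illustration of the node-labeled ordered tree representation mentioned above, the following minimal sketch turns an XML string into nested `(label, children)` pairs. It is not taken from the thesis or from the algorithms in [1, 2]: the helper names `to_labeled_tree` and `count_nodes` and the sample document are hypothetical, and the sketch assumes that only element names and sibling order matter, ignoring text content and attributes.

```python
import xml.etree.ElementTree as ET


def to_labeled_tree(element):
    """Return (label, children) for an element, keeping children in document order.

    Assumption for illustration: text content and attributes are ignored.
    """
    return (element.tag, [to_labeled_tree(child) for child in element])


def count_nodes(tree):
    """Count the nodes of a (label, children) tree."""
    _label, children = tree
    return 1 + sum(count_nodes(child) for child in children)


# Hypothetical sample document, not drawn from the thesis.
doc = "<book><title/><author/><author/><chapter><section/></chapter></book>"
tree = to_labeled_tree(ET.fromstring(doc))
```

The node count matters in practice because the cost of edit distance algorithms on such trees grows with document size.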
Document size is a major restriction on the practical application of such algorithms. A divide-and-conquer strategy is developed to adapt the second algorithm so that it can process documents larger than it can otherwise handle.
Computer Sciences | Physical Sciences and Mathematics
Malla, Chaitanya, "Experimental Studies of Edit Distances Between XML Documents & Schemata" (2005). Masters Theses & Specialist Projects. Paper 3407.