Publication Date

12-2023

Advisor(s) - Committee Chair

Yaser Mowafi, Zhonghang Xia, Huanjing Wang

Degree Program

Department of Computer Science

Degree Type

Master of Science

Abstract

Handling nested data collections in large-scale distributed systems poses considerable challenges for query processing, often making it costly and error-prone. While substantial effort has gone into overcoming computational hurdles in querying vast data collections within relational databases, scant attention has been paid to the manipulation and flattening procedures needed to unnest these collections. Flattening operations, integral to unnesting, frequently produce large amounts of duplicate data and lose information, with no mechanism for reconstructing the original structure. These challenges are exacerbated when the nested data is skewed and its inner collections are irregular. Processing such data requires an excessive number of operations, leads to extensive data duplication, and makes it difficult to balance the distribution of data across partitions. Together, these factors impede performance and scalability. This research introduces an approach that combines upfront computations with data manipulation techniques, focusing specifically on flattening procedures. The methodology aims to mitigate the adverse effects of data duplication and information loss while handling both skewed and irregular nesting structures. The proposed approach is evaluated on prominent datasets such as SQuAD, QuAC, and NewsQA, and its performance is compared against existing methods such as Pandas and recursive and iterative flattening implementations. These evaluations gauge the effectiveness and viability of the approach in real-world scenarios.

Disciplines

Computer Sciences | Databases and Information Systems | Physical Sciences and Mathematics | Theory and Algorithms
