Publication Date
12-2023
Advisor(s) - Committee Chair
Yaser Mowafi, Zhonghang Xia, Huanjing Wang
Degree Program
Department of Computer Science
Degree Type
Master of Science
Abstract
Handling nested data collections in large-scale distributed systems poses considerable challenges for query processing, often at substantial cost and with high susceptibility to error. While considerable effort has gone into overcoming the computational hurdles of querying vast data collections in relational databases, little attention has been paid to the manipulation and flattening procedures needed to unnest these collections. Flattening operations, which are integral to unnesting, frequently produce large amounts of duplicate data and lose information, providing no mechanism for reconstructing the original structure. These challenges are exacerbated for skewed nested data with irregular inner collections. Processing such data requires an excessive number of operations, causes extensive data duplication, and makes it difficult to distribute data evenly across partitions. Together, these factors limit performance and scalability. This research introduces a novel approach that combines upfront computation with data manipulation techniques, focusing specifically on flattening procedures. The method aims to reduce data duplication and information loss while handling both skewed and irregular nesting structures. Its efficacy is assessed through evaluations on prominent datasets such as SQuAD, QuAC, and NewsQA, comparing its performance against existing methods such as Pandas and recursive and iterative flattening implementations. These evaluations serve as a yardstick for gauging the effectiveness and viability of the approach in real-world scenarios.
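The duplication and information-loss problem described in the abstract can be illustrated with a small sketch. It assumes hypothetical, heavily simplified SQuAD-style records (a context, its questions, and each question's answers, where one question has no answers). It compares a naive row-per-leaf flattening against a generic parent-index encoding that stores each level once. The parent-index encoding is a common textbook technique chosen for illustration only; it is not the Index Bucketing method proposed in the thesis, and all record contents and function names are hypothetical.

```python
from typing import Any

# Hypothetical, simplified SQuAD-style records with irregular nesting,
# including a question that has no answers.
records = [
    {
        "context": "Paris is the capital of France.",
        "qas": [
            {"question": "What is the capital of France?", "answers": ["Paris", "Paris, France"]},
            {"question": "Who founded Paris?", "answers": []},
        ],
    },
    {
        "context": "The Seine flows through Paris.",
        "qas": [{"question": "What river flows through Paris?", "answers": ["The Seine"]}],
    },
]


def naive_flatten(recs: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """One row per answer; context and question text are copied onto every row."""
    return [
        (rec["context"], qa["question"], ans)
        for rec in recs
        for qa in rec["qas"]
        for ans in qa["answers"]
    ]


def indexed_flatten(recs: list[dict[str, Any]]):
    """Store each level once; every child row keeps the index of its parent."""
    contexts: list[str] = []
    questions: list[tuple[int, str]] = []  # (context index, question text)
    answers: list[tuple[int, str]] = []    # (question index, answer text)
    for rec in recs:
        contexts.append(rec["context"])
        c_idx = len(contexts) - 1
        for qa in rec["qas"]:
            questions.append((c_idx, qa["question"]))
            q_idx = len(questions) - 1
            for ans in qa["answers"]:
                answers.append((q_idx, ans))
    return contexts, questions, answers


def rebuild(contexts, questions, answers) -> list[dict[str, Any]]:
    """Reconstruct the original nesting from the stored parent indices."""
    recs = [{"context": c, "qas": []} for c in contexts]
    qas = []
    for c_idx, text in questions:
        qa = {"question": text, "answers": []}
        recs[c_idx]["qas"].append(qa)
        qas.append(qa)
    for q_idx, ans in answers:
        qas[q_idx]["answers"].append(ans)
    return recs


flat = naive_flatten(records)
print(len(flat))  # 3 rows; the first context string is stored twice
# The unanswered question produced no row, so it cannot be recovered from `flat`.
print(any(q == "Who founded Paris?" for _, q, _ in flat))  # False

tables = indexed_flatten(records)
print(rebuild(*tables) == records)  # True
```

The naive form repeats parent values on every leaf row and silently drops empty inner collections, while the parent-index form keeps one copy per item and enough structure to rebuild the nesting exactly.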
Disciplines
Computer Sciences | Databases and Information Systems | Physical Sciences and Mathematics | Theory and Algorithms
Recommended Citation
Myers, Jeffrey, "Index Bucketing: A Novel Approach to Manipulating Data Structures" (2023). Masters Theses & Specialist Projects. Paper 3695.
https://digitalcommons.wku.edu/theses/3695