Advisor(s) - Committee Chair
Qui Li (Director), Guangming Xing, Zhonghang Xia
School of Engineering and Applied Sciences
Master of Science
Clustering is an important topic in data modeling. K-means is a well-known partitional clustering algorithm that separates a dataset into groups whose members share similar properties. Clustering an unbalanced dataset, in which some groups contain far more data points than others, is a challenging problem. When K-means with Euclidean distance is applied to such data, it fails to form good clusters: standard K-means tends to split the data evenly, breaking large groups into smaller clusters.
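For reference, the Euclidean baseline described above can be sketched as follows. This is a minimal illustration of standard K-means (Lloyd's algorithm), not code from the thesis; the function names `euclidean` and `kmeans`, the explicit initial centroids, and the `max_iter` parameter are assumptions made for this example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, centroids, max_iter=100):
    """Standard K-means (Lloyd's algorithm) with Euclidean distance.

    `centroids` holds the initial centers (hypothetical interface: the
    caller chooses them). Returns (labels, final_centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids
```

Because every point is assigned purely by proximity to a centroid, a centroid placed inside a very large group can capture part of that group, which is one way to picture the behavior on unbalanced data described above.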
We propose a new K-means clustering algorithm that overcomes this disadvantage by introducing a different distance metric. The new metric is inspired by Newton's law of universal gravitation, under which an object of smaller mass is drawn toward an object of larger mass. Experimental results show the effectiveness of the new metric in visual comparisons with Euclidean distance. Quantitative comparisons using the Davies-Bouldin index also show the superiority of the new metric.
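The Davies-Bouldin index mentioned above can be computed with a short sketch like the one below. It follows the usual textbook definition (average, over clusters, of the worst ratio of summed within-cluster scatter to between-centroid distance; lower is better) and uses Euclidean distance throughout. The function name `davies_bouldin` and its interface are assumptions for illustration; the abstract does not say how the thesis computes the index, and the gravity-based metric itself is not specified here, so it is not reproduced.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labeled partition (lower is better).

    Assumes at least two non-empty clusters with distinct centroids;
    coincident centroids would cause a division by zero.
    Requires Python 3.8+ for math.dist.
    """
    # Group points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    keys = sorted(clusters)
    if len(keys) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        k: tuple(sum(c) / len(clusters[k]) for c in zip(*clusters[k]))
        for k in keys
    }
    # Scatter: mean distance from a cluster's members to its centroid.
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }

    # For each cluster, take its worst (largest) similarity ratio to
    # any other cluster, then average those worst cases.
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)
```

Comparing the index of two partitions of the same dataset, for example one from Euclidean K-means and one from an alternative metric, gives a single number per partition, with the lower value indicating more compact and better-separated clusters.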
Computer Engineering | Computer Sciences
Indulkar, Ajinkya Vishwas, "K-Means Clustering Using Gravity Distance" (2022). Masters Theses & Specialist Projects. Paper 3580.