Publication Date

Spring 2022

Advisor(s) - Committee Chair

Qui Li (Director), Guangming Xing, Zhonghang Xia

Degree Program

School of Engineering and Applied Sciences

Degree Type

Master of Science


Clustering is an important topic in data modeling. K-means clustering is a well-known partitional clustering algorithm that separates a dataset into groups whose members share similar properties. Clustering an unbalanced dataset, in which some groups contain far more data points than others, is a challenging problem in data modeling. When K-means with Euclidean distance is applied to such data, it fails to form good clusters, because standard K-means tends to split the data evenly into clusters of similar size regardless of the true group sizes.
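
For reference, the baseline discussed above can be sketched as plain Lloyd-style K-means with Euclidean distance. This is a minimal illustration and not code from the thesis; the function name `kmeans`, its parameters, and the random initialization are assumptions made for the example.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-means with Euclidean distance (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the labels are final
        centroids = new_centroids
    return labels, centroids

# Usage on two well-separated toy groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
```

Because the assignment step uses only the distance to each centroid, the size of a cluster never influences which points it attracts.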

We propose a new K-means clustering algorithm that overcomes this disadvantage by introducing a different distance metric. The new metric is inspired by Newton's law of universal gravitation, under which an object of smaller mass is drawn toward an object of larger mass. Experimental results demonstrate the effectiveness of the new metric through visual comparison with Euclidean distance. Furthermore, quantitative comparisons using the Davies-Bouldin index also show the superiority of the new metric.
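
The abstract does not state the exact form of the gravity-inspired metric, so the sketch below is only a hypothetical illustration of the idea: `gravity_like_distance` divides the squared Euclidean distance by a cluster "mass" (assumed here to be the cluster's point count), so that a larger cluster attracts points from farther away. The `davies_bouldin` function follows the standard definition of the Davies-Bouldin index, where lower values indicate more compact, better separated clusters. All function names are assumptions made for the example.

```python
import math

def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(members) for c in zip(*members))

def gravity_like_distance(point, center, mass):
    """Hypothetical metric (not the thesis's formula): squared Euclidean
    distance divided by cluster mass, by analogy with attraction being
    proportional to mass / distance**2."""
    return math.dist(point, center) ** 2 / mass

def assign(points, centers, masses):
    """Assign each point to the cluster with the smallest gravity-like distance."""
    return [
        min(
            range(len(centers)),
            key=lambda j: gravity_like_distance(p, centers[j], masses[j]),
        )
        for p in points
    ]

def davies_bouldin(points, labels):
    """Davies-Bouldin index; needs at least two clusters with distinct centroids."""
    ids = sorted(set(labels))
    groups = {i: [p for p, lab in zip(points, labels) if lab == i] for i in ids}
    centers = {i: centroid(groups[i]) for i in ids}
    # Scatter: mean distance from a cluster's members to its centroid.
    scatter = {
        i: sum(math.dist(p, centers[i]) for p in groups[i]) / len(groups[i])
        for i in ids
    }
    total = 0.0
    for i in ids:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in ids
            if j != i
        )
    return total / len(ids)

# Usage: a point nearer the light cluster is still pulled to the heavy one.
pulled = assign([(4, 0)], centers=[(0, 0), (10, 0)], masses=[1, 10])

# Usage: a good labeling scores lower than a poor one.
pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
good = davies_bouldin(pts, [0, 0, 1, 1])
poor = davies_bouldin(pts, [0, 1, 0, 1])
```

With equal masses the gravity-like distance ranks clusters exactly as Euclidean distance does, so in this sketch any difference in behavior comes only from unequal cluster sizes.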


Computer Engineering | Computer Sciences