Publication Date

Spring 2022

Advisor(s) - Committee Chair

Qui Li (Director), Guangming Xing, Zhonghang Xia

Degree Program

School of Engineering and Applied Sciences

Degree Type

Master of Science

Abstract

Clustering is an important topic in data modeling. K-means clustering is a well-known partitional clustering algorithm that separates a dataset into groups whose members share similar properties. Clustering an unbalanced dataset, in which some groups contain far more data points than others, is a challenging problem in data modeling. When K-means clustering with Euclidean distance is applied to such data, the algorithm fails to form good clusters: standard K-means tends to split the data evenly during clustering, breaking large groups into smaller clusters.
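For reference, the standard K-means baseline described above can be sketched as Lloyd's algorithm with Euclidean distance. This is a minimal illustration, not the thesis's implementation. The function names, the seeded random initialization, and the `max_iter` cap are illustrative choices. Note that the assignment step looks only at distances, so cluster sizes play no role in where a point goes.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means with Euclidean distance (illustrative sketch).

    Returns (centroids, labels). Initial centroids are k points sampled with a
    fixed seed so runs are reproducible.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the centroid nearest to it.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the two well-spaced groups of three points.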

We propose a new K-means clustering algorithm that overcomes this disadvantage by introducing a different distance metric. The new metric is inspired by Newton's law of universal gravitation, under which an object of smaller mass is drawn toward an object of larger mass. Experimental results, compared visually against Euclidean distance, show the effectiveness of the new metric. Furthermore, quantitative comparisons using the Davies-Bouldin Index also show the superiority of the new metric.
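The abstract does not give the formula for the new metric. The following is therefore a hypothetical sketch of one gravity-inspired dissimilarity, paired with the standard Davies-Bouldin Index used for the quantitative comparison. `gravity_distance` is an assumption made for illustration. It treats cluster size as "mass" and, following F = G·m₁·m₂/r², ranks clusters by r²/mass, so larger clusters attract more points. `davies_bouldin` follows the usual definition (lower is better) and assumes every cluster is non-empty and all centroids are distinct.

```python
import math

def gravity_distance(point, centroid, mass):
    """HYPOTHETICAL gravity-inspired dissimilarity (not the thesis's formula).

    A unit-mass point feels attraction proportional to mass / r**2 toward a
    cluster of size `mass`; returning r**2 / mass makes the strongest pull
    the smallest distance.
    """
    r2 = sum((x - c) ** 2 for x, c in zip(point, centroid))
    return r2 / mass

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin Index with Euclidean distance (lower is better).

    S_i  = mean distance from the members of cluster i to its centroid
    M_ij = distance between centroids i and j
    DB   = (1/k) * sum_i max_{j != i} (S_i + S_j) / M_ij
    """
    k = len(centroids)
    scatter = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(math.dist(p, centroids[i]) for p in members) / len(members))
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k
```

Under this assumed metric, a point at (5, 0) lies closer in Euclidean terms to a one-point cluster at (8, 0) than to a ten-point cluster at (0, 0). However, `gravity_distance` assigns it to the larger cluster (25/10 = 2.5 versus 9/1 = 9). Swapping such a function into the K-means assignment step and scoring each result with `davies_bouldin` is one way to reproduce the kind of comparison the abstract describes.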

Disciplines

Computer Engineering | Computer Sciences
