The Performance of K-Means and K-Modes Clustering to Identify Cluster in Numerical Data

  • Nur Atiqah Hamzah, Universiti Tun Hussein Onn Malaysia
  • Sie Long Kek, Universiti Tun Hussein Onn Malaysia
  • Sabariah Saharan, Universiti Tun Hussein Onn Malaysia
Keywords: Performance, Central tendency, K-means clustering, K-modes clustering, Numerical data

Abstract

Cluster analysis is the formal study of methods and algorithms for naturally grouping objects according to their perceived intrinsic characteristics and the measured similarities among the objects in each group. The pattern of each cluster and the relationships between clusters are identified and then related to the frequency of occurrence in the data set. The mean and the mode are measures of central tendency in a distribution. In clustering, they are applied as techniques to discover the existence of clusters in a data set. This study therefore aims to compare the performance of the K-means and K-modes clustering techniques in finding the clusters that exist in numerical data. The two methods differ in that K-modes is usually applied to categorical data, while K-means is applied to numerical data; in this study, however, both methods are used to cluster numerical data. The performance of the two clustering methods is demonstrated using output from R software, and the results are compared to determine which method gives the better output. In conclusion, the efficiency of both methods is presented.
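The contrast between the two techniques can be illustrated with a minimal sketch. This is not the authors' R code: it is a plain Python illustration under stated assumptions, and the toy data, the initial centres, and all function names (cluster, sq_euclid, mismatch, mean_center, mode_center) are hypothetical. K-means is shown with squared Euclidean distance and mean-based centres; K-modes is shown with simple matching dissimilarity and mode-based centres, treating each numerical value as a category.

```python
from collections import Counter

def sq_euclid(p, c):
    # K-means dissimilarity: squared Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(p, c))

def mismatch(p, c):
    # K-modes dissimilarity: number of attributes whose values differ.
    return sum(a != b for a, b in zip(p, c))

def mean_center(members):
    # K-means centre: attribute-wise mean of the cluster members.
    return tuple(sum(col) / len(members) for col in zip(*members))

def mode_center(members):
    # K-modes centre: attribute-wise most frequent value.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))

def assign(points, centers, dist):
    # Label each point with the index of its nearest centre.
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def cluster(points, init, dist, update, iters=20):
    # Alternate assignment and centre update until the centres stop changing.
    centers = list(init)
    for _ in range(iters):
        labels = assign(points, centers, dist)
        new = []
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centre if a cluster becomes empty.
            new.append(update(members) if members else centers[j])
        if new == centers:
            break
        centers = new
    return assign(points, centers, dist), centers

# Hypothetical toy data: two well-separated groups of integer-valued points.
points = [(1, 1), (1, 2), (2, 1), (1, 1), (8, 8), (8, 9), (9, 8), (8, 8)]
init = [points[0], points[4]]  # assumed fixed initial centres, for repeatability

km_labels, km_centers = cluster(points, init, sq_euclid, mean_center)
kmo_labels, kmo_centers = cluster(points, init, mismatch, mode_center)

print("K-means:", km_labels, km_centers)
print("K-modes:", kmo_labels, kmo_centers)
```

On this toy data both methods recover the same two groups, but the centres differ in kind: the K-means centres are averages that need not coincide with any observed point, while the K-modes centres are built from observed values. Because simple matching ignores how far apart two numbers are, K-modes on continuous measurements would typically require rounding or binning first; that preprocessing is an assumption of this sketch, not a step stated in the abstract.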
Published
26-12-2017
How to Cite
Hamzah, N. A., Kek, S. L., & Saharan, S. (2017). The Performance of K-Means and K-Modes Clustering to Identify Cluster in Numerical Data. Journal of Science and Technology, 9(3). Retrieved from https://publisher.uthm.edu.my/ojs/index.php/JST/article/view/2038