kmc - a simple tool for k-means clustering
kmc is an Excel-based tool for cluster analysis. It concentrates on a single clustering method, the simple k-means algorithm. Because of its simplicity, kmc is ideal for non-professional users who do not perform cluster analyses in their everyday work. A typical user might be a student who has to carry out a cluster analysis for a seminar paper and who has some experience with Excel and some knowledge of clustering methods, but not at expert level.
kmc is completely free of charge for private use. It has been tested with Excel 2010, but the author does not provide any warranty or service. Comments are always welcome.
Apart from the clustering method itself, the current version of kmc offers some additional features which are indispensable in a clustering study:
- The possibility to inspect the data thoroughly by means of important statistics and bivariate scatter plots
- Important data transformations (replacement of missing values, standardization/normalization)
- When several runs of the clustering algorithm have been carried out, the user can compare their results by means of a synopsis, showing not only the centers of the clusters, but also the values of the Silhouette index. In addition, bivariate scatter plots are available, where the clusters are distinguished by shape and color.
The tool is accompanied by a short manual and is published in the form of a normal Excel workbook with macros. In the current version, the code is protected by a password and thus not visible. Both the Excel file and the PDF of the manual can be downloaded at the bottom of this page.
Several screenshots are shown below to illustrate how to work with the tool. For more details, see the manual.
The main form contains five subpages. First you must connect to a population (a data set), which is done in the upper part of the form.
As soon as a connection has been established, essential statistics about the data set are displayed on a worksheet.
You don't have to close the form if you want to study the statistics. There are controls which let you leaf through the worksheets and scroll them.
Bivariate scatter plots are available, supporting your search for clusters in the population. To see them, just press the <show plot> button.
Missing values have to be replaced, and often data have to be standardized as a prerequisite for obtaining a useful solution. As these transformations do not always yield the expected result, you can make a backup copy beforehand. To undo an operation, just press the <Restore from copy> button.
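For readers who want to see what these two transformations amount to outside Excel, here is a minimal Python sketch. It is not kmc's code, which is password-protected. Replacing missing values by the column mean and standardizing to z-scores are assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def replace_missing(column):
    """Replace None entries with the mean of the observed values (assumed rule)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def standardize(column):
    """Rescale a column to mean 0 and population standard deviation 1."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

raw = [2.0, None, 4.0, 6.0]
backup = list(raw)             # copy taken before transforming, so it can be restored
filled = replace_missing(raw)  # the missing entry becomes the mean of 2, 4 and 6
scaled = standardize(filled)   # values now have mean 0 and standard deviation 1
```

Keeping `backup` untouched mirrors the idea of restoring from a copy when a transformation does not give the expected result.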
On the Cluster subpage you can start the clustering algorithm after choosing the number of clusters (segments) and the maximum number of iterations.
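To show how these two settings drive the algorithm, here is a short Python sketch of standard k-means (Lloyd's algorithm). It is not kmc's implementation, which is not visible. The function name `kmeans`, the `seed` parameter and the choice of random data points as initial centers are assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, max_iter, seed=0):
    """Plain k-means on a list of equal-length tuples.

    k        -- number of clusters (segments)
    max_iter -- upper bound on the number of assign/update rounds
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # assumed initialization: k distinct data points
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break                     # no point changed cluster: converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # an empty cluster keeps its old center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(data, k=2, max_iter=20)
```

The loop stops as soon as the assignment no longer changes, so the maximum number of iterations only matters when the algorithm is slow to converge.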
Unsuitable solutions may be deleted.
The results of a run of the clustering algorithm are displayed on two worksheets, showing the segments and their members.
Validation of clustering solutions is supported by means of scatter plots, where the clusters are distinguished by color, and by a tabular synopsis listing all runs for the population and indicating their quality by means of the Silhouette index.
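As a rough illustration of what the Silhouette index measures, the following Python sketch computes the standard mean silhouette width of a labelled data set. For each point, a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the point's score is (b - a) / max(a, b). Whether kmc uses the Euclidean distance and averages over all points in exactly this way is an assumption, and the function name is hypothetical.

```python
import math

def silhouette_index(points, labels):
    """Mean silhouette width over all points (Euclidean distance assumed)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)        # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

sample = [(0,), (1,), (10,), (11,)]
good = silhouette_index(sample, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_index(sample, [0, 1, 0, 1])   # clusters that mix the two groups
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest that many points sit between clusters or in the wrong one. In this example `good` is clearly higher than `bad`, which is the kind of comparison a synopsis of several runs supports.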