In \(k\)-means clustering, the number of desired clusters \(k\) is set in advance and the algorithm then tries to find \(k\) groups in the data. In the crisp version, each data point is assigned to its nearest cluster centre (hard membership). In fuzzy clustering (the corresponding algorithm is often called fuzzy c-means), the memberships are soft instead: every data point belongs to every cluster centre to some degree, where the membership usually decreases with the distance between the data point and the cluster centre. Here, both methods, crisp and fuzzy clustering, are analysed on an artificially generated example data set.
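To make the soft memberships concrete, here is a small sketch of the standard fuzzy c-means membership formula \(u_{ij} = 1 / \sum_k (d_{ij}/d_{ik})^{2/(m-1)}\) for fixed cluster centres; the fuzzifier \(m = 2\) and the helper name are illustrative choices, not taken from the original text:

```python
import numpy as np

def fuzzy_memberships(points, centres, m=2.0):
    """Soft memberships as in fuzzy c-means (hypothetical helper).

    u[i, j] = 1 / sum_k (d(x_i, c_j) / d(x_i, c_k)) ** (2 / (m - 1))
    """
    # Pairwise distances between points (n, dim) and centres (c, dim).
    d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=-1)
    d = np.fmax(d, np.finfo(float).eps)  # avoid division by zero at a centre
    exponent = 2.0 / (m - 1.0)
    # ratio[i, j, k] = d(x_i, c_j) / d(x_i, c_k); sum over k gives 1 / u[i, j].
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** exponent, axis=-1)
    return u
```

The memberships of each point sum to one across clusters, and the crisp assignment mentioned later (the cluster with the highest membership) is simply `u.argmax(axis=1)`, which coincides with the nearest-centre rule.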

The dataset initially contains two Gaussian blobs with clearly separated cluster regions. The idea is to add uniformly generated noise points in the bottom-right rectangle and observe how the two versions behave. Without noise, both methods lead to the same result in terms of the point assignments. Note that for fuzzy clustering, each point is assigned to the cluster with the highest membership. With increasing noise, however, some differences between the two methods become noticeable. This can be seen in the following animation.
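A setup along these lines can be reproduced as follows; the blob locations, noise rectangle, and the plain Lloyd's-algorithm implementation are assumptions for illustration, since the original data parameters are not given:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian blobs (locations and spread are illustrative assumptions).
blob1 = rng.normal(loc=[0.0, 3.0], scale=0.5, size=(100, 2))
blob2 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(100, 2))

def noise_points(n):
    # Uniform noise in an assumed bottom-right rectangle [2, 5] x [-1, 1].
    return rng.uniform(low=[2.0, -1.0], high=[5.0, 1.0], size=(n, 2))

def kmeans(points, k, iters=50):
    """Crisp k-means via plain Lloyd iterations, initialised on data points."""
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=-1)
        labels = d.argmin(axis=1)  # hard assignment to nearest centre
        new_centres = []
        for j in range(k):
            members = points[labels == j]
            # Keep the old centre if a cluster happens to become empty.
            new_centres.append(members.mean(axis=0) if len(members) else centres[j])
        centres = np.array(new_centres)
    return centres, labels

# Increasing the noise count shifts the fitted centres, as in the animation.
data = np.vstack([blob1, blob2, noise_points(50)])
centres, labels = kmeans(data, k=2)
```

Re-running this with a growing number of noise points shows the centre drift that the bottom panel of the animation visualises.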

Figure 1: Comparison between fuzzy (top) and crisp (middle) \(k\)-means clustering. Additional noise points can be added with the slider. The bottom figure discards the information of the point assignments and focuses only on the resulting cluster centres of the two methods. In this way, we can compare the movement of the cluster centres under increasing noise influence.
