K-Means (Clustering)
K-means clustering aims to partition m observations into K clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
The result of such a cluster analysis is shown below as the coloring of the squares into three clusters.
Description
Given a training set of observations:

$$x^{(1)}, x^{(2)}, \ldots, x^{(m)},$$

where each observation $x^{(i)}$ is a $d$-dimensional real vector, k-means clustering aims to partition the $m$ observations into $K$ ($K \leq m$) clusters:

$$S = \{S_1, S_2, \ldots, S_K\},$$

so as to minimize the within-cluster sum of squares (i.e. the variance):

$$\underset{S}{\arg\min} \sum_{k=1}^{K} \sum_{x \in S_k} \|x - \mu_k\|^2,$$

where $\mu_k$ is the mean of the points in $S_k$.
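As a concrete illustration (not part of the original text), the toy snippet below compares this within-cluster sum of squares for two different partitions of four 1-D points; the data and the `wcss` helper are made up for the example.

```python
import numpy as np

def wcss(clusters):
    """Within-cluster sum of squares: for every cluster, add up the squared
    distances from each point to the cluster mean."""
    total = 0.0
    for points in clusters:
        points = np.asarray(points, dtype=float)
        mu = points.mean(axis=0)              # cluster centroid (the mean)
        total += ((points - mu) ** 2).sum()
    return total

# Toy 1-D data split into K = 2 clusters in two different ways.
tight_partition = [[1.0, 2.0], [10.0, 11.0]]
loose_partition = [[1.0, 10.0], [2.0, 11.0]]

print(wcss(tight_partition))  # 1.0  -> the kind of partition k-means looks for
print(wcss(loose_partition))  # 81.0 -> a much worse partition of the same points
```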
Below is an example of the random initialization of 4 cluster centroids and the subsequent convergence of the clusters:
Another illustration of k-means convergence:
Cost Function (Distortion)
$c^{(i)}$ – index of the cluster (1, 2, …, K) to which example $x^{(i)}$ is currently assigned.
$\mu_{c^{(i)}}$ – cluster centroid of the cluster to which example $x^{(i)}$ has been assigned.
For example, if example $x^{(i)}$ is assigned to cluster 5, then $c^{(i)} = 5$ and $\mu_{c^{(i)}} = \mu_5$.

In this case the optimization objective (the distortion) looks like the following:

$$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \|x^{(i)} - \mu_{c^{(i)}}\|^2,$$

and the goal is to minimize $J$ over all assignments $c^{(1)}, \ldots, c^{(m)}$ and centroids $\mu_1, \ldots, \mu_K$.
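A minimal sketch of this cost, assuming NumPy arrays and 0-based cluster indices (the `distortion` function name and the toy data are illustrative, not from the original):

```python
import numpy as np

def distortion(X, c, mu):
    """J = (1 / m) * sum_i ||x^(i) - mu_{c^(i)}||^2.

    X  : (m, d) array of training examples
    c  : (m,)   array of cluster indices, c[i] in {0, ..., K - 1}
    mu : (K, d) array of cluster centroids
    """
    m = X.shape[0]
    diffs = X - mu[c]            # x^(i) - mu_{c^(i)} for every example i
    return (diffs ** 2).sum() / m

# Example: four 1-D points assigned to K = 2 clusters.
X = np.array([[1.0], [2.0], [10.0], [11.0]])
c = np.array([0, 0, 1, 1])
mu = np.array([[1.5], [10.5]])
print(distortion(X, c, mu))      # 0.25
```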
The Algorithm
Randomly initialize the K cluster centroids (for example, pick K training examples at random and set the centroids to those examples). Then repeat until convergence: the cluster-assignment step sets each $c^{(i)}$ to the index of the centroid closest to $x^{(i)}$, and the move-centroid step sets each $\mu_k$ to the mean of the examples currently assigned to cluster $k$, as sketched below.
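A minimal NumPy sketch of the whole loop under those assumptions (the `k_means` and `closest_centroid` names, the convergence check, and the empty-cluster handling are illustrative choices, not prescribed by the original):

```python
import numpy as np

def closest_centroid(X, centroids):
    """Cluster-assignment step: index of the nearest centroid for each example."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def k_means(X, K, max_iterations=100, seed=0):
    """Plain k-means: returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]

    # 1. Randomly pick K distinct training examples as the initial centroids.
    centroids = X[rng.choice(m, size=K, replace=False)].copy()

    for _ in range(max_iterations):
        # 2. Cluster-assignment step.
        c = closest_centroid(X, centroids)

        # 3. Move-centroid step: each centroid becomes the mean of its examples
        #    (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[c == k].mean(axis=0) if np.any(c == k) else centroids[k]
            for k in range(K)
        ])

        # 4. Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, closest_centroid(X, centroids)

# Example usage on toy 1-D data.
X = np.array([[1.0], [2.0], [10.0], [11.0]])
centroids, c = k_means(X, K=2)
print(centroids.ravel(), c)  # two centroids near 1.5 and 10.5
```

Because the result depends on the random initialization, k-means is commonly run several times with different initializations and the clustering with the lowest distortion is kept.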