Algorithms

Clustering Algorithms

The k-means algorithm goes as follows:

1) Place k-number of points as the initial group centers in the space represented by the objects being clustered

2) Assign each object to the group that has the closest center

3)When all objects have been assigned, recalculate the positions of the k-number of cluster centers

4)Repeat steps 2 and 3 until the centers no longer move. This procedure produces a separation of the objects into groups from which the metric to be minimized can be calculated

The COP-KMeans Clustering algorithm goes as follows:

Let there be a data set D, a set of must-link constraints Con=D x D, and a set of cannot-link constraints ConD x D.

1) Let C1...Ck be the initial cluster centers.

2) For each point di in D, assign it to the closest cluster Cj such that VIOLATE_CONSTRAINTS(di, Cj, Con&sube, Con&ne) is false. If no such cluster exists, fail (return{}).

3) For each cluster Ci, update its center by averageing all of the points dj that have been assigned to it.

4) Iterate between (2) and (3) until convergence.

5) Return {C1...Ck}.

VIOLATE-CONSTRAINTS(data point d, cluster C, must-link constraints Con=D x D, cannot-link constraints ConD x D)

1) For each (d, d=) ∈ Con= : If d=C, return true.

2) For each (d, d) ∈ Con : If dC, return true.

3) Otherwise, return false.