public class SphericalKMeans extends Object implements SpatialClusterer<SphericalKMeansResult,double[]>
Spherical K-Means uses the inner product as the similarity metric, and is constrained to finding centroids that lie on the surface of the unit hypersphere (i.e. their length is 1). More formally, it solves:
min_{D,s} sum_i ||D s^(i) - x^(i)||_2^2
s.t. ||s^(i)||_0 <= 1 for all i, and ||D^(j)||_2 = 1 for all j
where D is a dictionary of centroids (with unit length) and s^(i) is an indicator vector that is all zeros, except for a non-zero value in the position corresponding to the assigned centroid.
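As a rough illustration of the objective above, the following plain-Java sketch runs one assignment and update pass. It is not this class's code: the names `SphericalStepSketch`, `assign` and `iterate` are hypothetical, and the sketch assumes that each sample goes to the centroid with the largest inner product, that a centroid is rebuilt from the unweighted sum of its assigned samples, and that empty clusters are simply left as zero vectors.

```java
class SphericalStepSketch {
    /** Inner product of two equal-length vectors. */
    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    /** Scale v in place to unit L2 length; an all-zero vector is left unchanged. */
    static void normalise(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        if (norm > 0) {
            for (int i = 0; i < v.length; i++) {
                v[i] /= norm;
            }
        }
    }

    /** Index of the centroid with the largest inner product with x. */
    static int assign(double[][] centroids, double[] x) {
        int best = 0;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < centroids.length; j++) {
            double sim = dot(centroids[j], x);
            if (sim > bestSim) {
                bestSim = sim;
                best = j;
            }
        }
        return best;
    }

    /** One assignment + update pass; non-empty centroids come back with unit length. */
    static double[][] iterate(double[][] centroids, double[][] data) {
        int dims = centroids[0].length;
        double[][] next = new double[centroids.length][dims];
        for (double[] x : data) {
            int j = assign(centroids, x);
            for (int d = 0; d < dims; d++) {
                next[j][d] += x[d]; // accumulate samples assigned to centroid j
            }
        }
        for (double[] c : next) {
            normalise(c); // project back onto the unit hypersphere
        }
        return next;
    }
}
```

Calling `iterate` repeatedly until the centroids stop moving (or a maximum iteration count is reached) gives the basic loop; the projection in `normalise` is what enforces the unit-length constraint on D.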
The optional damping operation includes the previous centroid position in the update computation, which smooths the centroid updates between iterations and helps convergence.
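One way to picture the damped update is sketched below in plain Java. This is an illustration, not this class's code: `DampedUpdateSketch` and `dampedUpdate` are hypothetical names, and adding the previous centroid with a weight of 1 before renormalising is an assumption of the sketch.

```java
class DampedUpdateSketch {
    /**
     * Damped update for a single centroid: the previous centroid is added to
     * the sum of the samples assigned to it, and the result is rescaled to
     * unit length. Equal weighting of the previous centroid is an assumption.
     */
    static double[] dampedUpdate(double[] previous, double[] assignedSum) {
        double[] next = new double[previous.length];
        double sq = 0;
        for (int d = 0; d < next.length; d++) {
            next[d] = assignedSum[d] + previous[d];
            sq += next[d] * next[d];
        }
        double norm = Math.sqrt(sq);
        if (norm > 0) {
            for (int d = 0; d < next.length; d++) {
                next[d] /= norm;
            }
        }
        return next;
    }
}
```

Because the previous position contributes to the new one, a centroid with few assigned samples moves only a short distance per iteration rather than jumping to wherever those samples point.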
This implementation performs initialisation by randomly sampling centroids from a Gaussian distribution, and then normalising to unit length. Any centroids that become empty during the iterations are replaced by a new random centroid generated in the same manner.
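The initialisation described above can be sketched in a few lines of plain Java. The names `GaussianInitSketch`, `randomUnitVector` and `initCentroids` are hypothetical and are not part of this class; the sketch only assumes a `java.util.Random` source, as suggested by the `rng` field.

```java
import java.util.Random;

class GaussianInitSketch {
    /** Draw each component from a standard Gaussian, then rescale to unit length. */
    static double[] randomUnitVector(int dims, Random rng) {
        double[] c = new double[dims];
        double sq = 0;
        for (int d = 0; d < dims; d++) {
            c[d] = rng.nextGaussian();
            sq += c[d] * c[d];
        }
        double norm = Math.sqrt(sq);
        for (int d = 0; d < dims; d++) {
            c[d] /= norm;
        }
        return c;
    }

    /** Create k random unit-length centroids of the given dimensionality. */
    static double[][] initCentroids(int k, int dims, Random rng) {
        double[][] centroids = new double[k][];
        for (int j = 0; j < k; j++) {
            centroids[j] = randomUnitVector(dims, rng);
        }
        return centroids;
    }
}
```

The same `randomUnitVector` helper could be reused to replace a centroid that ends up with no assigned samples during an iteration.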
This implementation is able to deal with larger-than-memory datasets by streaming the samples from disk using an appropriate DataSource. The only requirement is that there is enough memory to hold all the centroids plus working memory for the batches of samples being assigned.
Modifier and Type | Class and Description |
---|---|
static class | SphericalKMeans.IterationResult: Object storing the result of the previous iteration of spherical kmeans. |
Modifier and Type | Field and Description |
---|---|
protected boolean | damped |
protected List<Operation<SphericalKMeans.IterationResult>> | iterationListeners |
protected int | k |
protected int | maxIters |
protected Random | rng |
protected double | terminationEps |
Constructor and Description |
---|
SphericalKMeans(int k): Construct with the given parameters. |
SphericalKMeans(int k, int maxIters): Construct with the given parameters. |
SphericalKMeans(int k, int maxIters, boolean damped): Construct with the given parameters. |
Modifier and Type | Method and Description |
---|---|
void | addIterationListener(Operation<SphericalKMeans.IterationResult> op): Add a listener that will be called before every iteration. |
SphericalKMeansResult | cluster(DataSource<double[]> data): Perform clustering with data from a data source. |
SphericalKMeansResult | cluster(double[][] data): Perform clustering on the given data. |
int[][] | performClustering(double[][] data) |
protected final boolean damped
protected final int maxIters
protected final int k
protected final double terminationEps
protected List<Operation<SphericalKMeans.IterationResult>> iterationListeners
public SphericalKMeans(int k, int maxIters)
k - number of clusters
maxIters - maximum number of iterations

public SphericalKMeans(int k, int maxIters, boolean damped)
k - number of clusters
maxIters - maximum number of iterations
damped - use damped updates

public SphericalKMeans(int k)
k - number of clusters

public int[][] performClustering(double[][] data)
Specified by: performClustering in interface Clusterer<double[][]>

public SphericalKMeansResult cluster(double[][] data)
Description copied from interface: SpatialClusterer
Perform clustering on the given data.
Specified by: cluster in interface SpatialClusterer<SphericalKMeansResult,double[]>
data - the data.

public SphericalKMeansResult cluster(DataSource<double[]> data)
Description copied from interface: SpatialClusterer
Perform clustering with data from a data source. The DataSource could potentially be backed by disk rather than in memory.
Specified by: cluster in interface SpatialClusterer<SphericalKMeansResult,double[]>
data - the data.

public void addIterationListener(Operation<SphericalKMeans.IterationResult> op)
Add a listener that will be called before every iteration.
op - the listener