@Reference(type=Inproceedings, author={"Amirthalingam Ramanan","Mahesan Niranjan"}, title="Resource-Allocating Codebook for Patch-based Face Recognition", year="2009", booktitle="IIS", url="http://eprints.ecs.soton.ac.uk/21401/")
public class IntRAC
extends Object
implements SpatialClusters<int[]>, SpatialClusterer<IntRAC,int[]>, CentroidsProvider<int[]>, HardAssigner<int[],float[],IntFloatPair>
During training, data points are selected at random. The first data point is chosen as a centroid. Each subsequent data point becomes a new centroid if it lies outside the threshold of every existing centroid. Because this scheme makes it difficult to guarantee the number of clusters, a minimisation function is provided to give a close estimate of the threshold required for a given K.
This implementation supports int[] cluster centroids.
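The training procedure described above can be sketched in plain Java. This is an illustration only, not the OpenIMAJ source: the class name RacSketch and its methods are hypothetical, the comparison against a squared Euclidean distance is an assumption suggested by the radiusSquared constructor parameter, and the sketch visits points in the order given, so the caller is assumed to have shuffled them already.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of resource-allocating codebook training; not the OpenIMAJ implementation. */
final class RacSketch {
    /** Squared Euclidean distance between two equal-length int vectors. */
    static double squaredDistance(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i]; // widen first to avoid int overflow
            sum += d * d;
        }
        return sum;
    }

    /**
     * Builds a codebook from data visited in the given order (assumed pre-shuffled).
     * A point becomes a new centroid when it is farther than radiusSquared
     * (assumed to be a squared distance) from every existing centroid.
     */
    static List<int[]> train(int[][] data, double radiusSquared) {
        List<int[]> codebook = new ArrayList<>();
        for (int[] point : data) {
            boolean covered = false;
            for (int[] centroid : codebook) {
                if (squaredDistance(point, centroid) <= radiusSquared) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                codebook.add(point.clone()); // the first point always lands here
            }
        }
        return codebook;
    }

    public static void main(String[] args) {
        int[][] data = { {0, 0}, {1, 1}, {10, 10}, {11, 10}, {0, 1} };
        List<int[]> codebook = train(data, 4.0);
        System.out.println(codebook.size() + " centroids"); // prints "2 centroids"
    }
}
```

Lowering the threshold in this sketch can only be expected to produce more centroids in a loose sense, which is why the number of clusters has to be tuned indirectly through the threshold.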
In terms of implementation, this class is at once a clusterer, an assigner and the result of the clustering. This is because the RAC algorithm never ends; that is to say, if a new point is assigned through the HardAssigner interface and that point is more than the threshold distance from every existing centroid, then a new centroid is created for the point. If this behaviour is undesirable, the results of clustering can be "frozen" by manually constructing an assigner that takes a CentroidsProvider (or the centroids returned by getCentroids()) as an argument.
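The idea of "freezing" the codebook can be illustrated with a small nearest-centroid assigner that works on a snapshot of the centroids. FrozenAssignerSketch is a hypothetical class written for this example and is not part of OpenIMAJ; it assumes squared Euclidean distance and takes the kind of int[][] array that getCentroids() returns.

```java
/** Hypothetical frozen assigner: nearest-centroid lookup over a fixed snapshot of centroids. */
final class FrozenAssignerSketch {
    private final int[][] centroids;

    /** Copies the centroids so later growth of the source codebook has no effect here. */
    FrozenAssignerSketch(int[][] centroids) {
        this.centroids = new int[centroids.length][];
        for (int i = 0; i < centroids.length; i++) {
            this.centroids[i] = centroids[i].clone();
        }
    }

    /** Returns the index of the nearest centroid; never creates a new one. Returns -1 if empty. */
    int assign(int[] point) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double distance = squaredDistance(point, centroids[i]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }

    /** The number of centroids, fixed at construction time. */
    int size() {
        return centroids.length;
    }

    /** Squared Euclidean distance (an assumption of this sketch). */
    private static double squaredDistance(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        FrozenAssignerSketch frozen = new FrozenAssignerSketch(new int[][] { {0, 0}, {10, 10} });
        // A far-away point is still mapped to an existing centroid.
        System.out.println(frozen.assign(new int[] {100, 100}) + " of " + frozen.size());
    }
}
```

In this sketch a point far beyond the threshold is still mapped to its nearest existing centroid, so size() stays constant no matter how many assignments are made.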
Modifier and Type | Field and Description |
---|---|
protected ArrayList<int[]> | codebook |
protected static int[][] | distances |
protected int | nDims |
protected double | threshold |
protected long | totalSamples |
Fields inherited from interface SpatialClusters: CLUSTER_HEADER
Constructor and Description |
---|
IntRAC(): Sets the threshold to 128. |
IntRAC(double radiusSquared): Define the threshold at which point a new cluster will be made. |
IntRAC(int[][] bKeys, int subSamples, int nClusters): Iteratively select subSamples from bKeys and try to choose a threshold which results in nClusters. |
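To illustrate how a threshold might be chosen for a target number of clusters, the following sketch searches for one by bisection. This is a swapped-in technique for illustration, not the minimisation that IntRAC performs. ThresholdSearchSketch, countCentroids and estimateThreshold are hypothetical names, distances are assumed to be squared Euclidean, and the search is a heuristic because the centroid count is not guaranteed to decrease monotonically as the threshold grows.

```java
/** Hypothetical bisection search for a RAC threshold; not the OpenIMAJ minimisation routine. */
final class ThresholdSearchSketch {
    /** Squared Euclidean distance (an assumption of this sketch). */
    static double squaredDistance(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Counts the centroids a single RAC pass over samples would create for this threshold. */
    static int countCentroids(int[][] samples, double radiusSquared) {
        java.util.List<int[]> codebook = new java.util.ArrayList<>();
        for (int[] point : samples) {
            boolean covered = false;
            for (int[] centroid : codebook) {
                if (squaredDistance(point, centroid) <= radiusSquared) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                codebook.add(point);
            }
        }
        return codebook.size();
    }

    /** Bisects between 0 and the largest pairwise distance, keeping the closest match seen. */
    static double estimateThreshold(int[][] samples, int nClusters, int maxIterations) {
        double lo = 0.0;
        double hi = 0.0;
        for (int[] a : samples) {
            for (int[] b : samples) {
                hi = Math.max(hi, squaredDistance(a, b));
            }
        }
        double best = hi;
        int bestError = Integer.MAX_VALUE;
        for (int i = 0; i < maxIterations; i++) {
            double mid = (lo + hi) / 2.0;
            int count = countCentroids(samples, mid);
            int error = Math.abs(count - nClusters);
            if (error < bestError) {
                bestError = error;
                best = mid;
            }
            if (count == nClusters) {
                break;
            }
            if (count > nClusters) {
                lo = mid; // too many centroids: the threshold is too small
            } else {
                hi = mid; // too few centroids: the threshold is too large
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] samples = { {0, 0}, {1, 0}, {100, 0}, {101, 0}, {0, 100}, {1, 100} };
        double threshold = estimateThreshold(samples, 3, 50);
        System.out.println(threshold + " -> " + countCentroids(samples, threshold) + " centroids");
    }
}
```

Running the search on a subsample rather than the full data set, as the three-argument constructor's description suggests, would keep each evaluation cheap.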
Modifier and Type | Method and Description |
---|---|
String | asciiHeader(): Header for ascii input. |
int | assign(int[] data): Assign a single point to a cluster. |
int[] | assign(int[][] data): Assign data to a cluster. |
IntFloatPair | assignDistance(int[] data): Assign a single point to a cluster. |
void | assignDistance(int[][] data, int[] indices, float[] distances): Assign data to clusters. |
byte[] | binaryHeader(): Header for binary input. |
protected static double | calculateThreshold(int[][] samples, int nClusters) |
IntRAC | cluster(DataSource<int[]> data): Perform clustering with data from a data source. |
IntRAC | cluster(int[][] data): Perform clustering on the given data. |
HardAssigner<int[],?,?> | defaultHardAssigner(): Get the default hard assigner for this clusterer. |
int[][] | getCentroids() |
int | numClusters(): Get the number of clusters. |
int | numDimensions(): Get the data dimensionality. |
int[][] | performClustering(int[][] data) |
void | readASCII(Scanner in): Read internal state from in. |
void | readBinary(DataInput dis): Read internal state from in. |
int | size(): The number of centroids; this potentially grows as assignments are made. |
void | writeASCII(PrintWriter writer): Write the content of this as ascii to out. |
void | writeBinary(DataOutput dos): Write the content of this as binary to out. |
protected double threshold
protected int nDims
protected static int[][] distances
protected long totalSamples
public IntRAC()
Sets the threshold to 128.

public IntRAC(double radiusSquared)
Define the threshold at which point a new cluster will be made.
Parameters:
radiusSquared - the threshold at which a new cluster will be made

public IntRAC(int[][] bKeys, int subSamples, int nClusters)
Iteratively select subSamples from bKeys and try to choose a threshold which results in nClusters.
Parameters:
bKeys - All keys to be trained against
subSamples - number of subsamples to select from bKeys each iteration
nClusters - number of clusters to aim for

protected static double calculateThreshold(int[][] samples, int nClusters) throws org.apache.commons.math.MaxIterationsExceededException, org.apache.commons.math.FunctionEvaluationException
Throws:
org.apache.commons.math.MaxIterationsExceededException
org.apache.commons.math.FunctionEvaluationException
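
public IntRAC cluster(int[][] data)
Description copied from interface: SpatialClusterer
Perform clustering on the given data.
Specified by:
cluster in interface SpatialClusterer<IntRAC,int[]>
Parameters:
data - the data.

public IntRAC cluster(DataSource<int[]> data)
Description copied from interface: SpatialClusterer
Perform clustering with data from a data source. The DataSource could potentially be backed by disk rather than in memory.
Specified by:
cluster in interface SpatialClusterer<IntRAC,int[]>
Parameters:
data - the data.

public int numClusters()
Description copied from interface: SpatialClusters
Get the number of clusters.
Specified by:
numClusters in interface SpatialClusters<int[]>

public int numDimensions()
Description copied from interface: SpatialClusters
Get the data dimensionality.
Specified by:
numDimensions in interface Assigner<int[]>
numDimensions in interface SpatialClusters<int[]>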
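
public int[] assign(int[][] data)
Description copied from interface: HardAssigner
Assign data to a cluster.
Specified by:
assign in interface HardAssigner<int[],float[],IntFloatPair>
Parameters:
data - the data.

public int assign(int[] data)
Description copied from interface: HardAssigner
Assign a single point to a cluster.
Specified by:
assign in interface HardAssigner<int[],float[],IntFloatPair>
Parameters:
data - datum to assign.

public String asciiHeader()
Description copied from interface: ReadableASCII
Header for ascii input.
Specified by:
asciiHeader in interface ReadableASCII
asciiHeader in interface WriteableASCII

public byte[] binaryHeader()
Description copied from interface: ReadableBinary
Header for binary input.
Specified by:
binaryHeader in interface ReadableBinary
binaryHeader in interface WriteableBinary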
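
public void readASCII(Scanner in) throws IOException
Description copied from interface: ReadableASCII
Read internal state from in.
Specified by:
readASCII in interface ReadableASCII
Parameters:
in - source to read from.
Throws:
IOException - an error reading input

public void readBinary(DataInput dis) throws IOException
Description copied from interface: ReadableBinary
Read internal state from in.
Specified by:
readBinary in interface ReadableBinary
Parameters:
dis - source to read from.
Throws:
IOException - an error reading input

public void writeASCII(PrintWriter writer) throws IOException
Description copied from interface: WriteableASCII
Write the content of this as ascii to out.
Specified by:
writeASCII in interface WriteableASCII
Parameters:
writer - sink to write to
Throws:
IOException - an error writing to out

public void writeBinary(DataOutput dos) throws IOException
Description copied from interface: WriteableBinary
Write the content of this as binary to out.
Specified by:
writeBinary in interface WriteableBinary
Parameters:
dos - sink to write to
Throws:
IOException - an error writing to out

public int[][] getCentroids()
Specified by:
getCentroids in interface CentroidsProvider<int[]>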
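
public void assignDistance(int[][] data, int[] indices, float[] distances)
Description copied from interface: HardAssigner
Assign data to clusters.
Specified by:
assignDistance in interface HardAssigner<int[],float[],IntFloatPair>
Parameters:
data - the data.
indices - the cluster index for each data point.
distances - the distance to the closest cluster for each data point.

public IntFloatPair assignDistance(int[] data)
Description copied from interface: HardAssigner
Assign a single point to a cluster.
Specified by:
assignDistance in interface HardAssigner<int[],float[],IntFloatPair>
Parameters:
data - point to assign.

public HardAssigner<int[],?,?> defaultHardAssigner()
Description copied from interface: SpatialClusters
Get the default hard assigner for this clusterer.
Specified by:
defaultHardAssigner in interface SpatialClusters<int[]>

public int size()
The number of centroids; this potentially grows as assignments are made.
Specified by:
size in interface HardAssigner<int[],float[],IntFloatPair>
See Also:
HardAssigner.size()

public int[][] performClustering(int[][] data)
Specified by:
performClustering in interface Clusterer<int[][]>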