@Reference(type=Article, author={"Jegou, Herve","Douze, Matthijs","Schmid, Cordelia"}, title="Product Quantization for Nearest Neighbor Search", year="2011", journal="IEEE Trans. Pattern Anal. Mach. Intell.", pages={"117","128"}, url="http://dx.doi.org/10.1109/TPAMI.2010.57", month="January", number="1", publisher="IEEE Computer Society", volume="33", customData={"issn","0162-8828","numpages","12","doi","10.1109/TPAMI.2010.57","acmid","1916695","address","Washington, DC, USA","keywords","High-dimensional indexing, image indexing, very large databases, approximate search"})

public class ByteProductQuantiser extends Object
A product quantiser quantises vectors into a very large number of bins by breaking the input vectors into non-overlapping sub-vectors and quantising each sub-vector individually. The number of bins (cluster centroids) per sub-vector is small (up to 256 in this implementation), but the effective number of bins over the whole vector is much larger, as it accounts for every combination of bins across the sub-vectors. Because only a small set of centroids needs to be held for each sub-vector, the memory requirements are modest.

The output of quantisation in this implementation is an array of bytes, one per sub-vector, holding the index of the matching centroid for that sub-vector (index numbers are offset by -128 so that 256 centroid indices fit in a single signed byte). The bit-pattern of this byte array could be interpreted as a numeric global cluster index, although in practice this is not useful.
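The encoding scheme described above can be illustrated with a small standalone sketch (this is not the OpenIMAJ implementation; the centroid tables, vector values, and class name are invented for the example). An 8-element vector is split into two 4-element sub-vectors, each sub-vector is matched to its nearest centroid, and the centroid index minus 128 is stored as a single byte:

```java
// Toy product-quantisation sketch: two sub-quantisers, each with a small
// table of 4-dimensional centroids. All data here is made up.
public class PQSketch {
    // Index of the centroid closest (squared Euclidean distance) to the
    // sub-vector starting at 'offset' in 'v'.
    static int nearest(byte[][] centroids, byte[] v, int offset) {
        int best = 0;
        long bestDist = Long.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            long d = 0;
            for (int j = 0; j < centroids[i].length; j++) {
                long diff = centroids[i][j] - v[offset + j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two sub-quantisers, each holding 2 centroids of 4 dimensions.
        byte[][][] subCentroids = {
            { {0, 0, 0, 0}, {10, 10, 10, 10} },
            { {-5, -5, -5, -5}, {5, 5, 5, 5} }
        };
        byte[] vector = { 9, 11, 10, 8, -4, -6, -5, -5 };

        // Quantise: one byte per sub-vector; the centroid index is offset
        // by -128 so that up to 256 indices fit in a signed byte.
        byte[] quantised = new byte[subCentroids.length];
        for (int s = 0; s < subCentroids.length; s++) {
            int idx = nearest(subCentroids[s], vector, s * 4);
            quantised[s] = (byte) (idx - 128);
        }
        System.out.println(quantised[0] + " " + quantised[1]);
    }
}
```

With 2 centroids per sub-quantiser the code stores one of 2 × 2 = 4 global bins in two bytes; with 256 centroids and m sub-vectors the same scheme addresses 256^m bins while storing only m small centroid tables.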
Typically the product quantiser is "trained" so that it adapts to the data to which it is being applied. The standard approach is to use K-Means, but this is not required: as far as this implementation is concerned, any set of compatible NearestNeighbours implementations can be provided to the constructor. Each NearestNeighbours could even have a different number of dimensions (corresponding to the sub-vector lengths).
In the standard case, where you just want to use K-Means to train the product quantiser, a set of utility methods can be found in the org.openimaj.knn.pq.ByteProductQuantiserUtilities class, which lives in the clustering sub-project (due to its dependence on the K-Means algorithm).
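The training step sketched above amounts to running K-Means independently on each sub-vector position of the training data. The following standalone sketch shows that idea (it is not the OpenIMAJ utility code; the class name, data, and parameters are invented, and doubles are used instead of bytes for clarity):

```java
import java.util.Random;

// Toy sketch of product-quantiser training: one centroid table is learned
// per sub-vector position using a few Lloyd iterations of k-means.
public class PQTrainSketch {
    // Run 'iters' Lloyd iterations over 'pts'; returns K centroids.
    static double[][] kmeans(double[][] pts, int K, int iters, Random rng) {
        int dim = pts[0].length;
        double[][] cents = new double[K][];
        for (int k = 0; k < K; k++)
            cents[k] = pts[rng.nextInt(pts.length)].clone(); // random init
        for (int it = 0; it < iters; it++) {
            double[][] sum = new double[K][dim];
            int[] count = new int[K];
            for (double[] p : pts) {
                int best = 0;
                double bd = Double.MAX_VALUE;
                for (int k = 0; k < K; k++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++) {
                        double t = p[j] - cents[k][j];
                        d += t * t;
                    }
                    if (d < bd) { bd = d; best = k; }
                }
                count[best]++;
                for (int j = 0; j < dim; j++) sum[best][j] += p[j];
            }
            for (int k = 0; k < K; k++)
                if (count[k] > 0)
                    for (int j = 0; j < dim; j++)
                        cents[k][j] = sum[k][j] / count[k];
        }
        return cents;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // 100 random 8-dimensional training vectors; train two independent
        // sub-quantisers on the first and last 4 dimensions.
        double[][] data = new double[100][8];
        for (double[] v : data)
            for (int j = 0; j < 8; j++) v[j] = rng.nextGaussian();
        for (int s = 0; s < 2; s++) {
            double[][] sub = new double[data.length][4];
            for (int i = 0; i < data.length; i++)
                System.arraycopy(data[i], s * 4, sub[i], 0, 4);
            double[][] cents = kmeans(sub, 4, 10, rng);
            System.out.println("sub-quantiser " + s + ": " + cents.length + " centroids");
        }
    }
}
```

In the real library the per-sub-vector centroid tables would be wrapped in ByteNearestNeighboursExact instances and passed to the ByteProductQuantiser constructor; as the text notes, the utilities in ByteProductQuantiserUtilities package up exactly this workflow.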
Modifier and Type | Field and Description
---|---
protected ByteNearestNeighboursExact[] | assigners
protected int | ndims
Constructor and Description
---
ByteProductQuantiser(ByteNearestNeighboursExact[] assigners) — Construct a ByteProductQuantiser with the given nearest-neighbour assigners.
Modifier and Type | Method and Description
---|---
byte[] | decompress(byte[] qdata) — Decompress the quantised data by replacing each encoded index with the actual centroid sub-vector.
byte[] | quantise(byte[] data) — Quantise the given data using this Product Quantiser.
protected ByteNearestNeighboursExact[] assigners
protected int ndims
public ByteProductQuantiser(ByteNearestNeighboursExact[] assigners)

Construct a ByteProductQuantiser with the given nearest-neighbour assigners. The number of dimensions of each assigner determines the length of the corresponding sub-vector. There is a one-to-one mapping between the order of the assigners and the order of the sub-vectors.

Parameters: assigners - the nearest-neighbour assigners

public byte[] quantise(byte[] data)

Quantise the given data using this Product Quantiser.

Parameters: data - the data to quantise

public byte[] decompress(byte[] qdata)

Decompress the quantised data by replacing each encoded index with the actual centroid sub-vector.

Parameters: qdata - the quantised data
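The decompress operation described above is the inverse of quantisation up to quantisation error: each stored byte is shifted back by +128 to recover a centroid index, and that centroid's sub-vector is copied into the output. A standalone sketch of the idea (not the OpenIMAJ code; centroid tables and the class name are invented):

```java
// Toy sketch of product-quantiser decompression. Each quantised byte is
// mapped back to a centroid index and the centroid sub-vector is copied
// into the reconstructed output.
public class PQDecompressSketch {
    static byte[] decompress(byte[][][] subCentroids, byte[] qdata) {
        int subLen = subCentroids[0][0].length;
        byte[] out = new byte[qdata.length * subLen];
        for (int s = 0; s < qdata.length; s++) {
            int idx = qdata[s] + 128; // undo the -128 storage offset
            System.arraycopy(subCentroids[s][idx], 0, out, s * subLen, subLen);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sub-quantisers with 2 centroids of 2 dimensions each.
        byte[][][] subCentroids = {
            { {0, 0}, {10, 10} },
            { {-5, -5}, {5, 5} }
        };
        // Centroid indices 1 and 0, stored offset by -128.
        byte[] qdata = { (byte) (1 - 128), (byte) (0 - 128) };
        System.out.println(java.util.Arrays.toString(decompress(subCentroids, qdata)));
    }
}
```

Note that the reconstruction is lossy: the output is the concatenation of the chosen centroid sub-vectors, not the original input vector.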