java.lang.Object
org.apache.lucene.util.hnsw.HnswGraphBuilder<T>
- Type Parameters:
T- the type of vector
Builder for HNSW graph. See
HnswGraph for a gloss on the algorithm and the meaning of the
hyperparameters.-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate final intprivate static final longDefault random seed for level generation *private final HnswGraphSearcher<T>(package private) final OnHeapHnswGraphstatic final StringA name for the HNSW component for the info-stream *private InfoStreamprivate final intprivate final doubleprivate final SplittableRandomstatic longRandom seed for level generation; public to expose for testing *private final NeighborArrayprivate final VectorSimilarityFunctionprivate final VectorEncodingprivate final RandomAccessVectorValues<T>private final RandomAccessVectorValues<T> -
Constructor Summary
ConstructorsModifierConstructorDescriptionprivateHnswGraphBuilder(RandomAccessVectorValues<T> vectors, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph. -
Method Summary
Modifier and TypeMethodDescriptionprivate voidaddDiverseNeighbors(int level, int node, NeighborQueue candidates) voidaddGraphNode(int node, RandomAccessVectorValues<T> values) voidaddGraphNode(int node, T value) Inserts a doc with vector value to the graphprivate voidaddVectors(RandomAccessVectorValues<T> vectorsToAdd) build(RandomAccessVectorValues<T> vectorsToAdd) Reads all the vectors from two copies of aRandomAccessVectorValues.static <T> HnswGraphBuilder<T>create(RandomAccessVectorValues<T> vectors, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) private booleandiversityCheck(int candidate, float score, NeighborArray neighbors) private intfindWorstNonDiverse(NeighborArray neighbors) Find first non-diverse neighbour among the list of neighbors starting from the most distant neighboursgetGraph()private static intgetRandomGraphLevel(double ml, SplittableRandom random) private booleanisDiverse(byte[] candidate, NeighborArray neighbors, float score) private booleanisDiverse(float[] candidate, NeighborArray neighbors, float score) private booleanisDiverse(int candidate, NeighborArray neighbors, float score) private booleanisWorstNonDiverse(int candidateIndex, byte[] candidateVector, NeighborArray neighbors) private booleanisWorstNonDiverse(int candidateIndex, float[] candidateVector, NeighborArray neighbors) private booleanisWorstNonDiverse(int candidateIndex, NeighborArray neighbors) private voidpopToScratch(NeighborQueue candidates) private longprintGraphBuildStatus(int node, long start, long t) private voidselectAndLinkDiverse(NeighborArray neighbors, NeighborArray candidates, int maxConnOnLevel) voidsetInfoStream(InfoStream infoStream) Set info-stream to output debugging information *
-
Field Details
-
DEFAULT_RAND_SEED
private static final long DEFAULT_RAND_SEEDDefault random seed for level generation *- See Also:
-
HNSW_COMPONENT
A name for the HNSW component for the info-stream *- See Also:
-
randSeed
public static long randSeedRandom seed for level generation; public to expose for testing * -
M
private final int M -
beamWidth
private final int beamWidth -
ml
private final double ml -
scratch
-
similarityFunction
-
vectorEncoding
-
vectors
-
random
-
graphSearcher
-
hnsw
-
infoStream
-
vectorsCopy
-
-
Constructor Details
-
HnswGraphBuilder
private HnswGraphBuilder(RandomAccessVectorValues<T> vectors, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) throws IOException Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.- Parameters:
vectors- the vectors whose relations are represented by the graph - must provide a different view over those vectors than the one used to add via addGraphNode.M- – graph fanout parameter used to calculate the maximum number of connections a node can have – M on upper layers, and M * 2 on the lowest level.beamWidth- the size of the beam search to use when finding nearest neighbors.seed- the seed for a random number generator used during graph construction. Provide this to ensure repeatable construction.- Throws:
IOException
-
-
Method Details
-
create
public static <T> HnswGraphBuilder<T> create(RandomAccessVectorValues<T> vectors, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) throws IOException - Throws:
IOException
-
build
Reads all the vectors from two copies of aRandomAccessVectorValues. Providing two copies enables efficient retrieval without extra data copying, while avoiding collision of the returned values.- Parameters:
vectorsToAdd- the vectors for which to build a nearest neighbors graph. Must be an independent accessor for the vectors- Throws:
IOException
-
addVectors
- Throws:
IOException
-
setInfoStream
Set info-stream to output debugging information * -
getGraph
-
addGraphNode
Inserts a doc with vector value to the graph- Throws:
IOException
-
addGraphNode
- Throws:
IOException
-
printGraphBuildStatus
private long printGraphBuildStatus(int node, long start, long t) -
addDiverseNeighbors
- Throws:
IOException
-
selectAndLinkDiverse
private void selectAndLinkDiverse(NeighborArray neighbors, NeighborArray candidates, int maxConnOnLevel) throws IOException - Throws:
IOException
-
popToScratch
-
diversityCheck
private boolean diversityCheck(int candidate, float score, NeighborArray neighbors) throws IOException - Parameters:
candidate- the vector of a new candidate neighbor of a node nscore- the score of the new candidate and node n, to be compared with scores of the candidate and n's neighborsneighbors- the neighbors selected so far- Returns:
- whether the candidate is diverse given the existing neighbors
- Throws:
IOException
-
isDiverse
- Throws:
IOException
-
isDiverse
private boolean isDiverse(float[] candidate, NeighborArray neighbors, float score) throws IOException - Throws:
IOException
-
isDiverse
private boolean isDiverse(byte[] candidate, NeighborArray neighbors, float score) throws IOException - Throws:
IOException
-
findWorstNonDiverse
Find first non-diverse neighbour among the list of neighbors starting from the most distant neighbours- Throws:
IOException
-
isWorstNonDiverse
- Throws:
IOException
-
isWorstNonDiverse
private boolean isWorstNonDiverse(int candidateIndex, float[] candidateVector, NeighborArray neighbors) throws IOException - Throws:
IOException
-
isWorstNonDiverse
private boolean isWorstNonDiverse(int candidateIndex, byte[] candidateVector, NeighborArray neighbors) throws IOException - Throws:
IOException
-
getRandomGraphLevel
-