Commit Graph

37 Commits

Author SHA1 Message Date
Ignacio Vera da218c8afd
Add bulk processing capabilities to ES91Int4VectorsScorer (#131202)
It uses the same approach as the one taken in ES91OSQVectorsScorer
2025-07-14 14:58:40 +01:00
Chris Hegarty b486d90ea2
Add low-level optimized Neon, AVX2, and AVX 512 float32 vector operations (#130635)
This commit adds low-level optimized Neon, AVX2, and AVX 512 float32 vector operations; cosine, dot product, and square distance.

The changes in this PR give an approximately 2x performance increase for float32 vector operations across Linux/Mac AArch64 and Linux x64 (both AVX2 and AVX 512).

The performance increase comes mostly from being able to score the vectors off-heap (rather than copying them on-heap before scoring). The low-level native scorer implementations show only a 3-5% improvement over the existing Panama Vector implementation, but they allow scoring off-heap. Using Panama Vector with MemorySegments runs into a performance bug in Hotspot, where the bounds check is not optimally hoisted out of the hot loop (this has been reported to and acknowledged by OpenJDK).

These vector ops will be used by higher-level vector scorers in #130541
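
For orientation, the three operations named above can be written as plain scalar loops. This is only a reference sketch, not the native Neon/AVX code from the PR; the class and method names are hypothetical and chosen for illustration.

```java
// Hypothetical scalar reference for the three float32 operations.
// The optimized native versions are expected to compute the same values
// (up to floating-point rounding), just faster and off-heap.
final class Float32VectorOps {

    private static void checkDims(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " != " + b.length);
        }
    }

    /** Sum of element-wise products. */
    static float dotProduct(float[] a, float[] b) {
        checkDims(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Squared Euclidean distance. */
    static float squareDistance(float[] a, float[] b) {
        checkDims(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Cosine of the angle between a and b; both must be non-zero vectors. */
    static float cosine(float[] a, float[] b) {
        checkDims(a, b);
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) normA * normB));
    }
}
```

A scalar version like this is mainly useful as a correctness baseline when testing an optimized implementation.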
2025-07-04 16:49:35 +01:00
Chris Hegarty bd220a5339
Refactor VectorScorerFactoryTests to Int7SQVectorScorerFactoryTests.java (#130620)
This commit refactors VectorScorerFactoryTests to Int7SQVectorScorerFactoryTests, in order to make space for other vector scorer benchmarks, namely float32.
2025-07-04 13:41:13 +01:00
Ignacio Vera f81d35536d
optimize OptimizedScalarQuantizer#scalarQuantize (#129874)
Optimize OptimizedScalarQuantizer#scalarQuantize when the destination can be an integer array
2025-07-02 14:57:59 +01:00
Chris Hegarty 4d3b699067
JDKVectorLibrary: update low-level bounds checks and add benchmark (#130216)
This commit updates the low-level bounds checks in JDKVectorLibrary and adds a benchmark, so that we can more easily bench the low-level operations.

Note: I added the mr-jar gradle plugin to the benchmarks so that we can compile with preview features in Java 21, namely MemorySegment.
2025-06-27 19:21:04 +01:00
Ignacio Vera ffea6ca2bf
Introduce an int4 off-heap vector scorer (#129824)
* Introduce an int4 off-heap vector scorer

* iter

* Update server/src/main/java/org/elasticsearch/index/codec/vectors/DefaultIVFVectorsReader.java

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>

---------

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2025-06-23 18:44:12 +02:00
Ignacio Vera 4ca96c199f
Introduce a vectorized soarDistance function (#129744)
This commit replaces the method #soarResidual with a method call #soarDistance, which performs better for computing soar distances.
2025-06-20 16:23:50 +02:00
Benjamin Trent 0a9f3a9630
Address when scores can be very large in osq score test (#129592)
Using a static `diff` or epsilon just doesn't work for this test as the
scores can be very large, but relatively close. 

Maybe there is a simpler way, but my mind wasn't wanting to "math" very
much. 

For example, the seed that this previously failed on had scores like
`1.726524E9` and `1.7265239E9`, which, given their size, are really
close together (within 128). But a static epsilon wouldn't capture that.
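
To make the point concrete: at this magnitude two adjacent `float` values are 128 apart, so any small fixed epsilon rejects them, while a tolerance scaled by the size of the scores accepts them. The sketch below is an illustration only, not the check used in the actual test; the class name, method names, and tolerance values are hypothetical.

```java
// Hypothetical helper contrasting a static epsilon with a relative tolerance.
final class ScoreTolerance {

    /** Fixed-epsilon comparison: breaks down once the scores get large. */
    static boolean closeAbsolute(float expected, float actual, float epsilon) {
        return Math.abs(expected - actual) <= epsilon;
    }

    /** Relative comparison: the allowed difference grows with the magnitude of the scores. */
    static boolean closeRelative(float expected, float actual, double relTol) {
        double diff = Math.abs((double) expected - actual);
        double scale = Math.max(Math.abs((double) expected), Math.abs((double) actual));
        return diff <= relTol * scale;
    }
}
```

With the two scores quoted above, `closeAbsolute(1.726524E9f, 1.7265239E9f, 1e-3f)` fails, while `closeRelative(1.726524E9f, 1.7265239E9f, 1e-6)` passes.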

closes: https://github.com/elastic/elasticsearch/issues/128485
2025-06-19 01:33:29 +10:00
Benjamin Trent 4e926ae41a
Minor ivf cleanups and fixing quantization performance (#129566)
We were accidentally using the non-vectorized quantizer when building
ivf indices. Switching to the vectorized one provides a 3-5x speed
improvement on quantizing on my mac.

This fixes that and addresses some minor fixes (removing unused code,
etc.)

Here is a small benchmark result. time spent quantizing goes down
significantly.

<img width="652" alt="image"
src="https://github.com/user-attachments/assets/9f46398c-c587-4e74-bc91-f2e07a63b406"
/>

vs.

<img width="673" alt="image"
src="https://github.com/user-attachments/assets/c4f4679f-d7a7-4486-841f-7dd3e75a11cb"
/>
2025-06-18 05:49:38 +10:00
Simon Cooper 98c1708adb
Add javadocs for BBQ dot product method (#129419) 2025-06-16 10:18:32 +01:00
Benjamin Trent 1324ee0115
Reapply "Adds new unexposed and experimental IVF format (#127528)" (#128005) (#128051)
This reverts commit 8a17a5ed5f.

reapplying ivf format, but with a fix.
2025-05-14 08:47:59 +10:00
John Wagster 8a17a5ed5f
Revert "Adds new unexposed and experimental IVF format (#127528)" (#128005)
This reverts commit ebe8ea6136.
2025-05-09 17:10:11 -05:00
Benjamin Trent ebe8ea6136
Adds new unexposed and experimental IVF format (#127528) 2025-05-07 14:59:57 -04:00
Benjamin Trent 74faf47121
New bulk scorer for binary quantized vectors via optimized scalar quantization (#127189)
* New bulk scorer for binary quantized vectors via optimized scalar quantization

* fixing headers

* fixing tests
2025-04-29 07:42:08 -04:00
Benjamin Trent 059f91c90c
Panama vector accelerated optimized scalar quantization (#127118)
* Accelerates optimized scalar quantization with vectorized functions

* Adding benchmark

* Update docs/changelog/127118.yaml

* adjusting benchmark and delta
2025-04-23 12:51:04 -04:00
Lorenzo Dematté 115062c643
Fix vec_caps to test for OS support too (on x64) (#126911)
On x64, we are testing if we support vector capabilities (1 = "basic" = AVX2, 2 = "advanced" = AVX-512) in order to enable and choose a native implementation for some vector functions, using CPUID.

However, under some circumstances, this is not sufficient: the OS on which we are running also needs to support AVX/AVX2 etc.; basically, it needs to acknowledge that it knows about the additional registers and that it is able to handle them, e.g. in context switches. To do that we need to a) test if the CPU has the xsave feature and b) use xgetbv to test if the OS set it (declaring that it supports AVX/AVX2/etc).

In most cases this is not needed, as all modern OSes do that, but for some virtualized situations (hypervisors, emulators, etc.) all the components along the chain must support it, and in some cases this is not a given.

This PR introduces a change to the x64 version of vec_caps to check for OS support too, and a warning on the Java side in case the CPU supports vector capabilities but those are not enabled at OS level.

Tested by passing noxsave to my linux box kernel boot options, and ensuring that the avx flags "disappear" from /proc/cpuinfo, and we fall back to the "no native vector" case.
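
The decision logic described above can be sketched as follows. The real check lives in the native vec_caps code and reads CPUID and XGETBV directly; this Java version only models the decision, taking the raw values as parameters. The class, method, and parameter names are hypothetical, and falling back from level 2 to level 1 when only the AVX state is enabled is an assumption, not something the PR description states. The XCR0 bit positions (bit 1 for SSE state, bit 2 for AVX state, bits 5-7 for AVX-512 state) follow the x86 architecture definition.

```java
// Hypothetical model of the "CPU supports it AND the OS enabled it" decision.
final class VecCapsSketch {
    static final long XCR0_SSE = 1L << 1;     // XMM state managed by the OS
    static final long XCR0_AVX = 1L << 2;     // YMM (upper halves) state managed by the OS
    static final long XCR0_AVX512 = (1L << 5) | (1L << 6) | (1L << 7); // opmask + ZMM state

    /**
     * @param cpuCaps what CPUID alone reports: 0 = none, 1 = AVX2, 2 = AVX-512
     * @param osXsave whether the OS has enabled XSAVE (so XGETBV may be used)
     * @param xcr0    the value of XCR0 as read via XGETBV
     * @return the capability level that is actually usable
     */
    static int effectiveCaps(int cpuCaps, boolean osXsave, long xcr0) {
        if (cpuCaps == 0 || !osXsave) {
            return 0; // no vector support, or the OS never enabled extended state
        }
        long avxState = XCR0_SSE | XCR0_AVX;
        if ((xcr0 & avxState) != avxState) {
            return 0; // OS does not save/restore YMM registers
        }
        if (cpuCaps >= 2 && (xcr0 & XCR0_AVX512) == XCR0_AVX512) {
            return 2; // OS also saves/restores opmask and ZMM registers
        }
        return 1;     // assumption: degrade to AVX2 if only the AVX state is enabled
    }
}
```

A caller could compare `cpuCaps` with the returned level and log a warning when they differ, which corresponds to the "CPU supports it but the OS does not" case mentioned above.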

Fixes #126809
2025-04-16 16:06:46 +02:00
Simon Cooper 1f249c74eb
Tweak the delta used for vector scorer tests (#126849)
New panama operations in Lucene 10.2 change the results we get from vector operations slightly
2025-04-15 15:46:23 +01:00
Ignacio Vera ffdfcec334
Upgrade to Lucene 10.2.0 (#126594)
This commit upgrades Elasticsearch to Lucene 10.2.0
2025-04-14 13:50:52 +02:00
Simon Cooper 7f1203e472
Add panama implementations of byte-bit and float-bit script operations (#124722) 2025-03-25 13:59:11 +00:00
Simon Cooper da9ed5ae41
Re-enable SIMD operations on JDK 24 (#125484) 2025-03-24 13:13:15 +00:00
Simon Cooper 2ba9e9f8ed
Panama implementation of painless float-byte vector ops (#123270) 2025-03-24 10:30:52 +00:00
Simon Cooper 82668b40f4
Add basic implementations of float-byte script comparisons (#122381)
Add implementations of `cosineSimilarity` and `dotProduct` to query byte vector fields using float vectors
2025-03-03 09:38:37 +00:00
Rene Groeschke ba61f8c7f7
Update Gradle wrapper to 8.12 (#118683)
This updates the gradle wrapper to 8.12

We addressed deprecation warnings due to the update that includes:

- Fix change in TestOutputEvent api
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
2024-12-30 15:34:24 +01:00
Benjamin Trent e10fc3c90d
Speed up bit compared with floats or bytes script operations (#117199)
Instead of doing an "if" statement, which doesn't lend itself to
vectorization, I switched to expand to the bits and multiply the 1s and
0s.

This led to a marginal speed improvement on ARM.

I expect that Panama vector could be used here to be even faster, but I
didn't want to spend any more time on this for the time being.

```
Benchmark                                              (dims)   Mode  Cnt  Score   Error   Units
IpBitVectorScorerBenchmark.dotProductByteIfStatement      768  thrpt    5  2.952 ± 0.026  ops/us
IpBitVectorScorerBenchmark.dotProductByteUnwrap           768  thrpt    5  4.017 ± 0.068  ops/us
IpBitVectorScorerBenchmark.dotProductFloatIfStatement     768  thrpt    5  2.987 ± 0.124  ops/us
IpBitVectorScorerBenchmark.dotProductFloatUnwrap          768  thrpt    5  4.726 ± 0.136  ops/us
```

Benchmark I used.
https://gist.github.com/benwtrent/b0edb3975d2f03356c1a5ea84c72abc9
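
The two approaches being compared can be sketched like this. This is not the code from the PR or the gist; the class and method names are hypothetical (they only echo the benchmark names), and the big-endian bit order (dimension 0 in the most significant bit of byte 0) is an assumption. The query must have exactly `bits.length * 8` dimensions.

```java
// Hypothetical illustration: branching vs. multiplying by the expanded 0/1 bit.
final class BitMaskDotProduct {

    private static void checkDims(byte[] bits, float[] query) {
        if (query.length != bits.length * 8) {
            throw new IllegalArgumentException("query must have " + (bits.length * 8) + " dimensions");
        }
    }

    /** Branching version: add query[i] only when bit i is set. */
    static float dotProductIfStatement(byte[] bits, float[] query) {
        checkDims(bits, query);
        float sum = 0f;
        for (int i = 0; i < query.length; i++) {
            int mask = 0x80 >> (i & 7);          // big-endian bit within the byte
            if ((bits[i >> 3] & mask) != 0) {
                sum += query[i];
            }
        }
        return sum;
    }

    /** Branch-free version: expand each bit to 0 or 1 and multiply. */
    static float dotProductUnwrap(byte[] bits, float[] query) {
        checkDims(bits, query);
        float sum = 0f;
        for (int b = 0; b < bits.length; b++) {
            int v = bits[b] & 0xFF;              // treat the byte as unsigned
            int base = b * 8;
            for (int j = 0; j < 8; j++) {
                sum += ((v >> (7 - j)) & 1) * query[base + j];
            }
        }
        return sum;
    }
}
```

Both methods return the same result for finite inputs. They differ if the query contains NaN or infinite values at positions where the bit is 0, because the multiply version still touches those values.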
2024-12-03 04:19:03 +11:00
Benjamin Trent 374c88a832
Correct bit * byte and bit * float script comparisons (#117404)
I goofed on the bit * byte and bit * float comparisons. Naturally, these
should be big-endian and compare the dimensions with the binary ones
appropriately.

Additionally, I added a test to ensure that this is handled correctly.
2024-11-26 03:38:06 +11:00
Rene Groeschke f6ac6e1c3b
[Build] Remove deprecated BuildParams (#116984) 2024-11-22 16:30:57 +01:00
Rene Groeschke 13c8aaeffa
[Gradle] Remove static use of BuildParams (#115122)
Static fields don't do well in Gradle with configuration cache enabled.

- Use buildParams extension in build scripts
- Keep BuildParams.ci for now for easy serverless migration
- Tweak testing doc
2024-11-15 17:58:57 +01:00
Benjamin Trent d33a03ce6b
Add support for bitwise inner-product in painless (#116082)
This adds bitwise inner product to painless. 

The idea here is:

 - For two bit arrays, which we determine to be a byte array whose dimensions match `dense_vector.dim/8`, we simply return bitwise `&`
 - For a stored bit array (remember, with `dense_vector.dim/8` bytes), sum up the provided byte or float array using the bit array as a mask.
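
For the first case, one plausible reading of "return bitwise `&`" is the number of bit positions set in both vectors, i.e. the population count of the AND of the two byte arrays. The sketch below illustrates that reading only; it is not the painless implementation, and the class and method names are hypothetical.

```java
// Hypothetical sketch: inner product of two bit vectors packed into byte arrays.
final class BitInnerProduct {

    /** Counts the positions where both bit vectors have a 1. */
    static int andBitCount(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " != " + b.length);
        }
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits so sign extension of negative bytes is not counted.
            count += Integer.bitCount(a[i] & b[i] & 0xFF);
        }
        return count;
    }
}
```

For a `dense_vector` with `dim` bits, both arrays would hold `dim/8` bytes, matching the description in the first bullet.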

This is effectively supporting asymmetric quantization. A prime
example of how this works is:
https://github.com/cohere-ai/BinaryVectorDB

Basically, you do your initial search against the binary space and then
rerank with a differently quantized vector allowing for more information
without additional storage space. 

closes:  https://github.com/elastic/elasticsearch/issues/111232
2024-11-06 09:22:04 +11:00
Ryan Ernst e5d5c17c99
Use directory name as project name for libs (#115720)
The libs projects are configured to all begin with `elasticsearch-`.
While this is desirable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.

This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
2024-10-29 13:02:28 -07:00
Luca Cavanna 8efd08b019
Upgrade to Lucene 10 (#114741)
The most relevant ES changes that upgrading to Lucene 10 requires are:

- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automata are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became a record
- adapt to breaking changes in the analysis area

Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
2024-10-21 13:38:23 +02:00
ChrisHegarty 2a0d5ffc02 Fix simdvec gradle runtime java check 2024-10-08 09:39:21 +01:00
Rene Groeschke d47ca34b16
Fix Gradle configuration in idea for :libs:simdvec (#114251) 2024-10-07 18:51:15 +02:00
Chris Hegarty 7decd52132
Allow incubating Panama Vector in simdvec, and add vectorized ipByteBin (#112933)
Add support for vectorized ipByteBin.

The structure of the implementation and loading framework mirrors that of Lucene, but is simplified by avoiding reflective loading since ES has support for a MRJar section for 21.

For now, we just disable warnings-as-errors in this small sourceset, since -Xlint:-incubating is only supported since JDK 22. The number of source files is small here. Will investigate how to assert that just the single incubating warning is emitted by javac, at a later point.
2024-10-07 15:08:23 +01:00
Chris Hegarty 32dde26e49
Upgrade to Lucene 9.12.0 (#113333)
This commit upgrades to Lucene 9.12.0.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Chris Hegarty <chegar999@gmail.com>
Co-authored-by: John Wagster <john.wagster@elastic.co>
Co-authored-by: Luca Cavanna <javanna@apache.org>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2024-10-01 08:39:27 +01:00
Mark Vieira a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Lorenzo Dematté 0bc2b19ead
Add AVX-512 optimised vector distance functions for int7 on x64 (#109084)
* Add vec_caps and inner implementation for AVX-512-F (without VNNI)
* select FNNI function name based on vec_caps; native templated implementation for manual unrolling
* Switched compiler to clang for x64, as gcc has a bug
2024-06-28 11:15:35 +02:00
Chris Hegarty fa364bfcaf
Rename the vec module to better reflect that it provides SIMD optimized vector scorers (#109661)
This commit renames the vector module to better reflect its intent - to provide SIMD optimized vector scorer implementations.
2024-06-17 11:10:02 +01:00