
85224 Commits

Author SHA1 Message Date
Benjamin Trent b2c1c4e0f0
New `vector_rescore` parameter as a quantized index type option (#124581)
This adds a new parameter to the quantized index mapping that lets a
field define default oversampling and rescoring.

This doesn't change any existing defaults; it only makes them configurable.
When the user provides `rescore_vector: {oversample: <number>}` in the
query, the query value overrides the one from the mapping.

For example, here is how to use it with bbq:

```
PUT rescored_bbq
{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "index_options": {
          "type": "bbq_hnsw",
          "rescore_vector": {"oversample": 3.0}
        }
      }
    }
  }
}
```

Then, when querying, `k` is automatically oversampled by 3x and the
results are reranked using the raw vectors.

```
POST _search
{
  "knn": {
    "query_vector": [...],
    "field": "vector"
  }
}
```
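The following is a minimal Java sketch of the oversample-then-rerank flow described above. It assumes dot-product similarity and an in-memory list of candidates, where each candidate carries both a quantized (approximate) score and its raw vector. The `OversampleRescoreSketch` class, the `Candidate` record and the `search` method are hypothetical illustrations, not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of "oversample, then rescore with raw vectors".
// Candidate, exactScore and search are illustrative names, not Elasticsearch APIs.
final class OversampleRescoreSketch {

    /** A hit from the quantized index: approximate score plus the raw (full-precision) vector. */
    record Candidate(int docId, float approximateScore, float[] rawVector) {}

    /** Exact similarity on raw vectors; dot product is assumed here. */
    static float exactScore(float[] query, float[] raw) {
        float sum = 0f;
        for (int i = 0; i < query.length; i++) {
            sum += query[i] * raw[i];
        }
        return sum;
    }

    /** Keeps ceil(k * oversample) candidates by quantized score, reranks them exactly, returns the top k ids. */
    static List<Integer> search(float[] query, List<Candidate> candidates, int k, double oversample) {
        int numCandidates = (int) Math.ceil(k * oversample);

        // Step 1: shortlist by the cheap, approximate (quantized) score.
        List<Candidate> shortlist = new ArrayList<>(candidates);
        shortlist.sort(Comparator.comparingDouble((Candidate c) -> c.approximateScore()).reversed());
        shortlist = new ArrayList<>(shortlist.subList(0, Math.min(numCandidates, shortlist.size())));

        // Step 2: rerank the shortlist by the exact score computed on the raw vectors.
        shortlist.sort(Comparator.comparingDouble((Candidate c) -> exactScore(query, c.rawVector())).reversed());

        List<Integer> topK = new ArrayList<>();
        for (int i = 0; i < Math.min(k, shortlist.size()); i++) {
            topK.add(shortlist.get(i).docId());
        }
        return topK;
    }
}
```

With the mapping above (`oversample` of 3.0), a query with `k` = 10 would shortlist `ceil(10 * 3.0)` = 30 candidates by quantized score. It would then rerank them and return the top 10.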
2025-03-14 00:40:08 +11:00
Yang Wang cd25958700
Refactor Metadata.toXContent by extracting methods (#124689)
The method is long and has two distinct paths for multi-project and
single-project formats. This PR extracts a separate method for each of
the code paths for readability.

See also:
https://github.com/elastic/elasticsearch/pull/124613#discussion_r1990981472
2025-03-14 00:31:20 +11:00
elasticsearchmachine 73f0693c2a Mute org.elasticsearch.common.metrics.ExponentiallyWeightedMovingRateTests testEwmr_threadSafe #124692 2025-03-14 00:29:08 +11:00
Craig Taverner d5ddb909a4
ESQL autogenerate docs v3 (#124312)
Building on the work started in https://github.com/elastic/elasticsearch/pull/123904, we now want to auto-generate most of the small subfiles from the ES|QL functions unit tests.

This work also investigates any remaining discrepancies between the original asciidoc version and the new markdown, and tries to minimize differences so the docs do not look too different.

The kibana json and markdown files are moved to a new location, and the operator docs are a little more generated than before (although still largely manual).
2025-03-13 14:16:46 +01:00
Slobodan Adamović cac356ae64
Disable queryable built-in feature in docs YAML tests (#124684)
The .security index is created asynchronously on cluster startup. This
affects some of the docs YAML tests, which need to either account for
the existence of the .security index or wait for the index to be
created and green. This PR disables the feature for docs YAML tests,
which solves the flakiness without affecting coverage.

Resolves https://github.com/elastic/elasticsearch/issues/122343
Resolves https://github.com/elastic/elasticsearch/issues/121748
Resolves https://github.com/elastic/elasticsearch/issues/121611
Resolves https://github.com/elastic/elasticsearch/issues/121345
Resolves https://github.com/elastic/elasticsearch/issues/121338
Resolves https://github.com/elastic/elasticsearch/issues/121337
Resolves https://github.com/elastic/elasticsearch/issues/121288
Resolves https://github.com/elastic/elasticsearch/issues/121287
Resolves https://github.com/elastic/elasticsearch/issues/121867
Resolves https://github.com/elastic/elasticsearch/issues/122335
Resolves https://github.com/elastic/elasticsearch/issues/122681
Resolves https://github.com/elastic/elasticsearch/issues/121976
Resolves https://github.com/elastic/elasticsearch/issues/123094
Resolves https://github.com/elastic/elasticsearch/issues/123192
Resolves https://github.com/elastic/elasticsearch/issues/122983
Resolves https://github.com/elastic/elasticsearch/issues/124671
Resolves https://github.com/elastic/elasticsearch/issues/124103
2025-03-13 23:13:45 +11:00
Martijn van Groningen 81f33e4602
Change downsample's MetricFieldProducers (#124701)
Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversions / casts.
2025-03-13 11:59:46 +00:00
Mariusz Józala b427a2bf4e
[Tests] Limit IOUtilTests on Windows (#124716)
On Windows, read-only directories (in which files cannot be stored) are
not supported, which makes this test irrelevant on that OS.
2025-03-13 21:59:23 +11:00
Jan Kuipers a503497bce
Add max.chunks to EmbeddingRequestChunker to prevent OOM (#123150)
* add max number of chunks

* wire merge function

* implement sparse merge function

* move tests to correct package/file

* float merge function

* bytes merge function

* more accurate byte average

* spotless

* Fix/improve EmbeddingRequestChunkerTests

* Remove TODO

* remove unnecessary field

* remove Chunk generic

* add TODO

* Remove specialized chunks

* add comment

* Update docs/changelog/123150.yaml

* update changelog
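The commit list above only names the pieces: a maximum number of chunks plus float, byte and sparse merge functions. The following is a hypothetical Java sketch of how capping and merging dense embeddings could work. It assumes that contiguous chunks are grouped and averaged element-wise. The class and method names are illustrative, and sparse merging is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merges per-chunk embeddings so that at most maxChunks remain.
// Contiguous grouping and element-wise averaging are assumptions, not the actual chunker logic.
final class EmbeddingMergeSketch {

    /** Element-wise mean of float embeddings of equal dimension. */
    static float[] mergeFloat(List<float[]> embeddings) {
        int dims = embeddings.get(0).length;
        float[] merged = new float[dims];
        for (float[] embedding : embeddings) {
            for (int i = 0; i < dims; i++) {
                merged[i] += embedding[i];
            }
        }
        for (int i = 0; i < dims; i++) {
            merged[i] /= embeddings.size();
        }
        return merged;
    }

    /** Element-wise mean of byte embeddings: sums in an int (no byte overflow), then rounds to nearest. */
    static byte[] mergeByte(List<byte[]> embeddings) {
        int dims = embeddings.get(0).length;
        byte[] merged = new byte[dims];
        for (int i = 0; i < dims; i++) {
            int sum = 0;
            for (byte[] embedding : embeddings) {
                sum += embedding[i];
            }
            merged[i] = (byte) Math.round((double) sum / embeddings.size());
        }
        return merged;
    }

    /** Splits chunks into maxChunks contiguous, non-empty groups and replaces each group by its mean. */
    static List<float[]> limitChunks(List<float[]> chunks, int maxChunks) {
        if (chunks.size() <= maxChunks) {
            return chunks;
        }
        List<float[]> limited = new ArrayList<>();
        for (int g = 0; g < maxChunks; g++) {
            int from = g * chunks.size() / maxChunks;
            int to = (g + 1) * chunks.size() / maxChunks;
            limited.add(mergeFloat(chunks.subList(from, to)));
        }
        return limited;
    }
}
```

Summing byte components in an `int` before dividing avoids the overflow that a byte-wise running sum would hit.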
2025-03-13 11:38:12 +01:00
David Turner c24f77f547
Fix stack trace in `ActionListener#assertOnce` (#124672)
In #112380 we changed this `assert` to yield a `String` on failure
rather than the original `ElasticsearchException`, which means we don't
see the original completion's stack trace any more. This commit
reinstates the lost stack trace.
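The following is a hypothetical Java sketch of the idea. It records the first completion as an exception so that a second completion can report it as the cause, which preserves the first completion's stack trace. A plain `Consumer` stands in for `ActionListener` here, and the class name is made up for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch: a Consumer stands in for ActionListener; names are illustrative.
final class AssertOnceSketch<T> implements Consumer<T> {
    private final Consumer<T> delegate;
    // Holds an exception created at the first completion purely to capture its stack trace.
    private final AtomicReference<Exception> firstCompletion = new AtomicReference<>();

    AssertOnceSketch(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void accept(T result) {
        Exception here = new Exception("first completion");
        if (firstCompletion.compareAndSet(null, here) == false) {
            // Attach the first completion as the cause so both stack traces are reported.
            throw new AssertionError("listener already completed", firstCompletion.get());
        }
        delegate.accept(result);
    }
}
```

If the failure were reported as a plain `String`, the stack trace captured at the first completion would be lost. Passing it as the `cause` of the `AssertionError` keeps it visible.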
2025-03-13 20:43:38 +11:00
Ievgen Degtiarenko 734dd070e7
Query hot indices first (#122928) 2025-03-13 10:28:15 +01:00
Armin Braun 4a951e752c
Make function score query rewrite a little cheaper (#124637)
Just a random thing I noticed: this was seemingly overlooked when
porting to the new rewrite API. There is no need to create a new
searcher; we already have one here.
2025-03-13 10:03:31 +01:00
Nick Tindall 9edeaae6e6
Unmute test that was fixed long ago (#124695)
The test was fixed in d48cf3f2f0
2025-03-13 18:09:37 +11:00
Martijn van Groningen ce3a778fa1
Improve downsample performance by buffering docids and do bulk processing. (#124477) 2025-03-13 07:46:08 +01:00
Andrei Stefan c48f9a9e1c
ESQL: Change the order of the optimization rules (#124335) 2025-03-13 07:45:37 +02:00
Nick Tindall 74d61a4052
Retry when the server can't be resolved (#123852) 2025-03-13 12:38:04 +11:00
Yang Wang 076f61d40c
Combine cluster and project tasks in xcontent for single project (#124613)
This PR combines both cluster and project tasks under persistent_tasks
in the XContent output of Metadata when it contains only a single project,
i.e. there will be no cluster_persistent_tasks in such output. This is
to maintain the existing output format when the cluster is not
multi-project enabled.

Relates: MP-1945
2025-03-13 10:48:26 +11:00
Nik Everett 7e746ce5c2
ESQL: Fix a test oom (#124685)
The test was generating too much data

Closes #124330
2025-03-13 10:36:32 +11:00
elasticsearchmachine 17044d7f9d Mute org.elasticsearch.multiproject.test.CoreWithMultipleProjectsClientYamlTestSuiteIT test {yaml=search.vectors/41_knn_search_byte_quantized/kNN search plus query} #124687 2025-03-13 10:23:42 +11:00
David Turner bf804fcf34
Add consistency checker for stuck snapshot debugging (#124616)
Checks the local cluster state after marking a shard snapshot as
complete, and logs a message if the completion is not reflected in this
cluster state.
2025-03-13 10:18:04 +11:00
Matt Culbreth b9ec8fd35e
Remove @UpdateForV9 annotations from Security code (#123176) 2025-03-12 18:32:37 -04:00
Mikhail Berezovskiy ecb602de7f
change payload lower bound for resumable upload test (#124674) 2025-03-12 14:10:51 -07:00
Joe Gallo d565304f4b
Fix geoip databases index access after system feature migration (take 3) (#124604) 2025-03-12 14:03:57 -04:00
Tommaso Teofili c971d79a95
Let MLTQuery throw IAE when no analyzer is set (#124662)
2025-03-12 18:37:31 +01:00
Mridula 44a3ac444f
Removed logger and also fixed the nitpick comments (#124650) 2025-03-12 18:33:03 +01:00
Pat Whelan 1f6aa1a18d
[ML] Enable DeepSeek Completion (#124665)
Enable streaming Completion as well as Chat Completion
2025-03-12 18:23:05 +01:00
Charlotte Hoblik 9e754ec8f6
[DOCS] Plugin management reference cleanup (#124578)
* add content to plugin management

* add content to Plugin Management

* Update docs/reference/elasticsearch-plugins/plugin-management.md

Co-authored-by: florent-leborgne <florent.leborgne@elastic.co>

* fix applies-to tag

* add ech to docset.yml

---------

Co-authored-by: florent-leborgne <florent.leborgne@elastic.co>
2025-03-12 17:01:10 +01:00
Valeriy Khakhutskyy 44fba7213d
[ML] Provide model_size_stats as soon as an anomaly detection job is opened (#124638)
Fixes #121168
2025-03-12 16:57:58 +01:00
Mariusz Józala 4ff1aade13
[Tests] Fix copying files for test cluster (#124628)
If a file with `.attach_pid` in its name was present in the distribution
and then deleted, the resulting exception could stop copying/linking files
without any sign of an issue. The files were then missing from the cluster
used in the test, sometimes causing tests to fail (depending on which
files had not been copied).

When using `Files.walk`, it is impossible to catch the IOException and
continue walking through files conditionally. It has been replaced with a
FileVisitor implementation that can continue if the exception is caused
by files that the JVM left temporarily and that are no longer available.
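The following is a minimal Java sketch of the `FileVisitor` approach. It assumes that only JVM attach files (names containing `.attach_pid`) may vanish mid-walk, and that such a missing file should be skipped rather than fail the copy. `TolerantCopySketch`, `copyTree` and `isTransientJvmFile` are hypothetical names. The actual build code may link files rather than copy them.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch of the Files.walk -> FileVisitor change; names are illustrative.
final class TolerantCopySketch {

    /** Assumed rule: JVM attach files (".attach_pid<pid>") may disappear while we walk. */
    static boolean isTransientJvmFile(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().contains(".attach_pid");
    }

    static void copyTree(Path source, Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                Files.createDirectories(target.resolve(source.relativize(dir).toString()));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                try {
                    Files.copy(file, target.resolve(source.relativize(file).toString()));
                } catch (NoSuchFileException e) {
                    // Deleted between listing and copying: skip only the known transient files.
                    if (isTransientJvmFile(file) == false) {
                        throw e;
                    }
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
                // With Files.walk the stream would fail here; a visitor can decide to keep going.
                if (exc instanceof NoSuchFileException && isTransientJvmFile(file)) {
                    return FileVisitResult.CONTINUE;
                }
                throw exc;
            }
        });
    }
}
```

Any other I/O failure is still rethrown, so genuine copy problems remain visible instead of being silently skipped.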
2025-03-12 16:09:55 +01:00
Pat Whelan 9f89a3b318
[ML] Integrate with DeepSeek API (#122218)
Integrating for Chat Completion and Completion task types, both calling
the chat completion API for DeepSeek.
2025-03-12 15:24:39 +01:00
Andrei Dan d553455092
Add metrics around the file extensions we request when populating the cache (#123134)
This records the file extensions of the blobs we request when populating
the cache.
There are around 50 possible values for Lucene extensions, and we use a
special "other" category as a fallback for everything else.
2025-03-12 14:12:56 +00:00
Nik Everett 50aaa1c2a6
ESQL: Pragma to load from stored fields (#122891)
This creates a `pragma` you can use to request that fields load from a
stored field rather than doc values. It implements that pragma for
`keyword` and number fields.

We expect that, for some disk configurations and some numbers of fields,
it's faster to load those fields from _source or stored fields than
it is to use doc values. Our default is doc values, and on my laptop it's
*always* faster to use doc values. But we don't ship my laptop to every
cluster.

This will let us experiment and debug slow queries by trying to load
fields a different way.

You access this pragma with:
```
curl -HContent-Type:application/json -XPOST localhost:9200/_query?pretty -d '{
    "query": "FROM foo",
    "pragma": {
        "field_extract_preference": "STORED"
    }
}'
```

On a release build you'll need to add `"accept_pragma_risks": true`.
2025-03-12 09:40:42 -04:00
elasticsearchmachine 690454000c Mute org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainerStatsTests testResumableWrite #124648 2025-03-13 00:39:31 +11:00
Yang Wang 4344a10dcc
AutoCreation waits for active shards in the right project context (#124636)
This PR replaces the default project ID with the actual project while
waiting for active shards after index auto-creation, similar to how the
wait is done for explicit index creation.
2025-03-13 00:20:28 +11:00
Simon Cooper d7864f4af6
Refactor JMH script vector distance benchmark to add panama benchmarks (#124351)
Add vector benchmarks vs scalar, and automatically pick up new implementations as they get added
2025-03-12 13:15:16 +00:00
Mridula f6538e86e2
Prevent Query Rule Creation with Invalid Numeric Match Criteria (#122823)
* SEARCH-802 - bug fixed - Query rules allows for creation of rules with invalid match criteria

* [CI] Auto commit changes from spotless

* Worked on the comments given in the PR

* [CI] Auto commit changes from spotless

* Fixed Integration tests

* [CI] Auto commit changes from spotless

* Made changes from the PR

* Update docs/changelog/122823.yaml

* [CI] Auto commit changes from spotless

* Fixed the duplicate code issue in queryRuleTests

* Refactored code to clean it up based on PR comments

* [CI] Auto commit changes from spotless

* Logger statements were removed

* Cleaned up the QueryRule tests

* [CI] Auto commit changes from spotless

* Update x-pack/plugin/ent-search/src/test/java/org/elasticsearch/xpack/application/EnterpriseSearchModuleTestUtils.java

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
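For illustration only, here is a Java sketch of this kind of validation. It assumes that the numeric criteria types are `lt`, `lte`, `gt` and `gte`, and that criteria values arrive as strings. `QueryRuleCriteriaSketch` and `validate` are hypothetical names, not the ent-search implementation.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of rejecting non-numeric values for numeric criteria; not the ent-search code.
final class QueryRuleCriteriaSketch {
    // Assumed set of criteria types that compare numerically.
    private static final Set<String> NUMERIC_TYPES = Set.of("lt", "lte", "gt", "gte");

    static void validate(String type, List<String> values) {
        if (NUMERIC_TYPES.contains(type) == false) {
            return; // other criteria types accept arbitrary strings in this sketch
        }
        for (String value : values) {
            try {
                Double.parseDouble(value);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "criteria type [" + type + "] requires numeric values, but got [" + value + "]", e);
            }
        }
    }
}
```

Failing at rule creation time surfaces the bad value to the user immediately, rather than leaving a rule that can never match.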
2025-03-12 13:56:13 +01:00
Ievgen Degtiarenko 79e776a488
Add missing capabilities (#124625) 2025-03-12 13:28:18 +01:00
Tim Grein 0b83425d17
[Inference API] Propagate product use case http header to EIS (#124025) 2025-03-12 12:48:24 +01:00
Pete Gillin 43eee87fed
Implement an exponentially weighted moving rate (#124507)
This is intended to be used to efficiently calculate a write load
metric for use by the auto-sharding algorithm which favours more
recent loads.

ES-10037 #comment Core algorithm added in https://github.com/elastic/elasticsearch/pull/124507
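The following is a hypothetical Java sketch of one way to define an exponentially weighted moving rate. Both the increments and the elapsed time are weighted by `exp(-lambda * age)`, so the result is a rate in which recent load counts more. The decay parameter `lambda` and all names are assumptions; this is not the code added in #124507.

```java
// Hypothetical sketch of an exponentially weighted moving rate; parameter names are assumptions.
// rate(t) = sum_i v_i * e^{-lambda (t - t_i)} / integral_{t0}^{t} e^{-lambda (t - s)} ds
//         = sum_i v_i * e^{-lambda (t - t_i)} / ((1 - e^{-lambda (t - t0)}) / lambda)
final class EwmRateSketch {
    private final double lambda;      // decay constant per time unit (> 0)
    private final double startTime;
    private double lastTime;
    private double decayedSum;        // sum of increments, decayed to lastTime

    EwmRateSketch(double lambda, double startTime) {
        if (lambda <= 0) {
            throw new IllegalArgumentException("lambda must be positive");
        }
        this.lambda = lambda;
        this.startTime = startTime;
        this.lastTime = startTime;
    }

    /** Records an increment at the given time; times are assumed non-decreasing. */
    synchronized void addIncrement(double value, double time) {
        decayedSum = decayedSum * Math.exp(-lambda * (time - lastTime)) + value;
        lastTime = time;
    }

    /** Rate at the given time, which is assumed not to be earlier than the last increment. */
    synchronized double rate(double time) {
        double elapsed = time - startTime;
        if (elapsed <= 0) {
            return 0.0;
        }
        double numerator = decayedSum * Math.exp(-lambda * (time - lastTime));
        // (1 - e^{-lambda * elapsed}) / lambda, computed with expm1 for accuracy when lambda is small.
        double weightedElapsed = -Math.expm1(-lambda * elapsed) / lambda;
        return numerator / weightedElapsed;
    }
}
```

As `lambda` approaches 0, the weights flatten out and the result approaches the plain total divided by the elapsed time. A larger `lambda` forgets old load faster.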
2025-03-12 10:09:53 +00:00
Ievgen Degtiarenko 8d11dd2a98
Limit concurrent node requests (#122850) 2025-03-12 10:02:38 +01:00
Moritz Mack c41caeb6cd
Enable FIPS entitlements based on `org.bouncycastle.fips.approved_only`. (#124577)
When enabling FIPS, `javax.net.ssl.trustStore` is not necessarily set.
This change adds FIPS entitlements based on
`org.bouncycastle.fips.approved_only=true`, which enforces usage of FIPS
approved functionality only.

Additionally, this PR grants read access to a custom trust store if
provided via `javax.net.ssl.trustStore`, otherwise read access to the
default JDK trust store is granted.

Relates to ES-11025.
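The following is a hypothetical Java sketch of the decision described above, reading system properties from a `Properties` object. The default path `$JAVA_HOME/lib/security/cacerts` is an assumption about the JDK layout (the JDK may also consult `jssecacerts`). The class and method names are illustrative only.

```java
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch of choosing which trust store to grant read access to; not the entitlements code.
final class TrustStorePathSketch {

    /** FIPS entitlements are keyed on this property rather than on the trust store being set. */
    static boolean fipsApprovedOnly(Properties props) {
        return Boolean.parseBoolean(props.getProperty("org.bouncycastle.fips.approved_only"));
    }

    /** Custom trust store if configured, otherwise the JDK default (assumed: $JAVA_HOME/lib/security/cacerts). */
    static Path trustStoreToGrant(Properties props) {
        String custom = props.getProperty("javax.net.ssl.trustStore");
        if (custom != null && custom.isEmpty() == false) {
            return Path.of(custom);
        }
        return Path.of(props.getProperty("java.home"), "lib", "security", "cacerts");
    }
}
```

In a running JVM this would be called as `TrustStorePathSketch.trustStoreToGrant(System.getProperties())`.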
2025-03-12 19:54:48 +11:00
Lorenzo Dematté 37a363050e
[Entitlements] Add support for IT tests of always allowed actions (take 2) (#124429)
While writing tests for #123861, it turned out that #124195 is not enough.
We really need new IT test cases for "always allowed" actions: to be sure they are allowed, we need to set up the plugin with no policy.
This PR adds test cases for that, plus support for writing test functions that accept one Environment parameter: many of the paths we test and allow/deny are relative to paths in Environment, so it's useful to have access to it (see readAccessConfigDirectory as an example).
2025-03-12 09:44:30 +01:00
Lorenzo Dematté d844c6a847
[Entitlements] Exclude `java.desktop` from system modules (#124563)
* exclude java.desktop from system modules

* add IT test
2025-03-12 08:34:52 +01:00
Tim Rühsen c0efcd01c8
[Profiling] Fix NullPointerExceptions by accepting dotted field names (#124506)
* [Profiling] Fix NullPointerExceptions by accepting dotted field names

Profiling uses synthetic source and thus expects nested field names in query responses.
Since 8.17, synthetic source is available only to Enterprise (or higher) subscriptions,
so lower subscription tiers get dotted field names in query responses.
The profiling plugin relies on nested field names and runs into NullPointerExceptions
if these are not found.

This PR fixes the NullPointerExceptions that could happen with dotted field names.

Signed-off-by: Tim Rühsen <tim.ruhsen@elastic.co>

* Evaluate source only once (cleanup)

---------

Signed-off-by: Tim Rühsen <tim.ruhsen@elastic.co>
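The following hypothetical Java helper shows how a field path can be resolved whether the source uses nested objects, flat dotted keys, or a mix of both. It illustrates the technique and is not the profiling plugin's code; the names are made up.

```java
import java.util.Map;

// Hypothetical helper: resolves "a.b.c" in a source map that may be nested,
// flat with dotted keys, or a mix of both. Names are illustrative only.
final class SourceFieldLookupSketch {

    static Object lookup(Map<String, ?> source, String path) {
        // Flat form: {"a.b.c": value}
        if (source.containsKey(path)) {
            return source.get(path);
        }
        // Nested or mixed form: try each dotted prefix as a key and recurse into object values.
        int dot = path.indexOf('.');
        while (dot > 0) {
            Object child = source.get(path.substring(0, dot));
            if (child instanceof Map<?, ?> nested) {
                @SuppressWarnings("unchecked")
                Object found = lookup((Map<String, ?>) nested, path.substring(dot + 1));
                if (found != null) {
                    return found;
                }
            }
            dot = path.indexOf('.', dot + 1);
        }
        return null; // field absent: callers must handle this instead of dereferencing it
    }
}
```

Returning `null` for a truly absent field keeps the decision explicit at the call site, which is where the original NullPointerExceptions surfaced.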
2025-03-12 08:34:15 +01:00
elasticsearchmachine 49f7cfb22f Mute org.elasticsearch.lucene.spatial.CartesianCentroidCalculatorTests testAddDifferentDimensionalType #124609 2025-03-12 12:20:01 +11:00
Mikhail Berezovskiy 053b037a9b
GCS blob store: add OperationPurpose/Operation stats counters (#122991) 2025-03-11 17:57:15 -07:00
Lee Hinman dd2a5c691a
Add .monitoring exemption to `DotPrefixValidator` (#124158)
This was something we were planning on removing, but have not yet.

Resolves #124131
2025-03-12 10:06:14 +11:00
elasticsearchmachine ae1d57f941 Mute org.elasticsearch.xpack.restart.MLModelDeploymentFullClusterRestartIT testDeploymentSurvivesRestart {cluster=OLD} #124160 2025-03-12 08:33:19 +11:00
kanoshiou deff3df9f0
ES|QL: Support `::date` in inline cast (#123460)
* Inline cast to date

* Update docs/changelog/123460.yaml

* New capability for `::date` casting

* More tests

* Update tests

---------

Co-authored-by: Fang Xing <155562079+fang-xing-esql@users.noreply.github.com>
2025-03-11 17:08:10 -04:00
Mark Vieira 74ccd4dba2
Filter module-info.class from entitlements-bridge jar in distribution (#124580) 2025-03-11 13:37:03 -07:00
Patrick Doyle 5112dbbb3b
Reduce noise from NotEntitledException logging (#124511)
* Refactor: findRequestingFrame

* INFO instead of WARN for NotEntitledException.

Some of these are expected, so an INFO seems more appropriate.

The stack trace tends to attract attention even when entitlements are not the
cause of a problem, so let's avoid the stack trace, but still include stack
frame info from the frame of interest.

* Use child loggers for Not Entitled logs

* Use warn, and include component name

* Fix ALL_UNNAMED

* Mute entitlement warnings from repositories

* PR feedback

* Common out the Not Entitled prefix.

We're alerting on this, so let's not rely on every caller of notEntitled to remember it.
2025-03-11 15:50:31 -04:00