Commit Graph

18535 Commits

Author SHA1 Message Date
Nhat Nguyen 256437902b
Add optimized path for intermediate values aggregator (#131390)
Similar to #127849, this change adds an optimized path for leveraging 
ordinal blocks of intermediate input pages in the Values aggregator.
Below are the micro-benchmark results.

Before:
```
// 1 raw input page + 1000 intermediate input pages
Benchmark                      (dataType)  (groups)  Mode  Cnt       Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    2       0.382          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    2     112.293          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    2  113182.908          ms/op
```

After:
```
// 1 raw input page + 1000 intermediate input pages
Benchmark                      (dataType)  (groups)  Mode  Cnt      Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    2      0.378          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    2     34.410          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    2  64654.830          ms/op
```
1K groups: 112 ms -> 34.4 ms
1M groups: 113 s -> 64.7 s

More to come with #130510

Relates #127849
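The general idea of leveraging ordinal (dictionary-encoded) input can be illustrated with a small sketch. It is a simplification under assumptions, not the Values aggregator's code: the hypothetical `addOrdinalPage` receives a dictionary of distinct values plus one ordinal per position, resolves each dictionary entry to an id once, and then handles every position with a cheap array lookup instead of hashing the full value again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of aggregating dictionary-encoded ("ordinal") input;
// not the Elasticsearch Values aggregator.
final class OrdinalValuesSketch {
    private final Map<String, Integer> idByValue = new HashMap<>(); // hashes full values (expensive)
    private final List<String> valueById = new ArrayList<>();
    private final Map<Integer, Set<Integer>> idsByGroup = new HashMap<>(); // small int ids (cheap)
    int valueHashes = 0; // number of full-value hash lookups performed

    /** Position i belongs to group groups[i] and holds the value dictionary[ordinals[i]]. */
    void addOrdinalPage(int[] groups, int[] ordinals, String[] dictionary) {
        // Resolve each distinct dictionary entry to an id exactly once...
        int[] idByOrdinal = new int[dictionary.length];
        for (int ord = 0; ord < dictionary.length; ord++) {
            idByOrdinal[ord] = valueId(dictionary[ord]);
        }
        // ...so each position only needs an int array lookup.
        for (int i = 0; i < groups.length; i++) {
            idsByGroup.computeIfAbsent(groups[i], g -> new TreeSet<>()).add(idByOrdinal[ordinals[i]]);
        }
    }

    private int valueId(String value) {
        valueHashes++;
        Integer id = idByValue.get(value);
        if (id == null) {
            id = valueById.size();
            valueById.add(value);
            idByValue.put(value, id);
        }
        return id;
    }

    /** Distinct values collected for a group, ordered by id (the order values were first seen). */
    List<String> values(int group) {
        List<String> out = new ArrayList<>();
        for (int id : idsByGroup.getOrDefault(group, Set.of())) {
            out.add(valueById.get(id));
        }
        return out;
    }

    public static void main(String[] args) {
        OrdinalValuesSketch agg = new OrdinalValuesSketch();
        // Six positions, but only two distinct values in the dictionary.
        agg.addOrdinalPage(new int[] {0, 0, 1, 1, 1, 0}, new int[] {0, 1, 1, 1, 0, 0}, new String[] {"a", "b"});
        System.out.println(agg.values(0) + " " + agg.values(1) + " hashes=" + agg.valueHashes);
        // prints [a, b] [a, b] hashes=2
    }
}
```

With many intermediate pages that share a small dictionary, the full-value hashing cost scales with the dictionary size rather than with the number of positions.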
2025-07-21 12:13:57 -07:00
Keith Massey 2381e5dea9
Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline (#131236) 2025-07-21 13:26:54 -04:00
Pawan Kartik c1c72186c1
Refresh potential lost connections at query start for `_search` (#130463)
CPS S2D9: Explicitly refresh connections to remotes before executing the query.

Previously, we refreshed connections to remotes only when skip_unavailable=false.
We now also do so when operating in a CPS context. To avoid listening for too long,
we only listen for a short time; the duration to wait is controlled by the
setting search.ccs.force_connect_timeout, which we will eventually inject for the CPS environment.
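A bounded wait of this kind can be sketched with `CompletableFuture`. This is illustrative only: `refreshConnection` and the future standing in for a reconnect attempt are hypothetical, the fixed `Duration` plays the role of `search.ccs.force_connect_timeout`, and what the real code does after it stops listening is not described in the commit.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of "reconnect, but only wait a short time"; not the CCS code.
final class BoundedConnectSketch {
    /** Waits at most `timeout` for a reconnect attempt; true if the remote connected in time. */
    static boolean refreshConnection(CompletableFuture<Void> connectAttempt, Duration timeout) {
        try {
            connectAttempt.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // stopped listening; the caller decides how to proceed
        } catch (ExecutionException e) {
            return false; // the connection attempt itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofMillis(100); // stands in for search.ccs.force_connect_timeout
        CompletableFuture<Void> fast = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> never = new CompletableFuture<>(); // a remote that never answers
        System.out.println(refreshConnection(fast, timeout) + " " + refreshConnection(never, timeout));
    }
}
```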
2025-07-21 18:17:53 +01:00
Julian Kiryakov 631cbac07a
Integrate LIKE/RLIKE LIST with ReplaceStringCasingWithInsensitiveRegexMatch rule (#131531)
Allow LIKE LIST and RLIKE LIST to take advantage of ReplaceStringCasingWithInsensitiveRegexMatch for optimizations.
Add unit tests to confirm the change works and the results are correct.

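The following is pushed down as just first_name RLIKE ("G.*"):

    FROM employees
    | WHERE TO_UPPER(first_name) RLIKE ("G.*", "a.*", "bE.*")

The following is pushed down as just first_name LIKE ("G.*"):

    FROM employees
    | WHERE TO_UPPER(TO_LOWER(TO_UPPER(first_name))) LIKE ("G.*", "a.*", "bE.*")

In both cases the outermost casing function is TO_UPPER, so the patterns containing lowercase letters ("a.*" and "bE.*") can never match and are dropped; the remaining pattern is matched case-insensitively against first_name.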
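Why the lowercase patterns can be dropped can be shown for the RLIKE case with a small Java sketch. It is a simplification under stated assumptions, not the optimizer rule: it uses `java.util.regex` rather than ES|QL's pattern engine, treats any pattern that changes under upper-casing as unmatchable (which would wrongly drop escapes such as `\d`), and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of rewriting TO_UPPER(field) RLIKE (patterns) into a
// case-insensitive match on the field itself.
final class CasingRewriteSketch {
    /** Keeps only patterns that can match upper-cased text (naive: ignores regex escapes). */
    static List<String> prunePatternsForUpper(List<String> patterns) {
        List<String> kept = new ArrayList<>();
        for (String p : patterns) {
            if (p.equals(p.toUpperCase(Locale.ROOT))) {
                kept.add(p);
            }
        }
        return kept;
    }

    /** Case-insensitive full match of the original value against the kept patterns. */
    static boolean matchesInsensitive(String value, List<String> kept) {
        for (String p : kept) {
            if (Pattern.compile(p, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(value).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> kept = prunePatternsForUpper(List.of("G.*", "a.*", "bE.*"));
        System.out.println(kept); // [G.*]
        for (String name : List.of("Georgi", "anneke", "Bezalel")) {
            // The original semantics and the rewritten semantics agree for these inputs.
            boolean original = name.toUpperCase(Locale.ROOT).matches("G.*|a.*|bE.*");
            System.out.println(name + " original=" + original + " rewritten=" + matchesInsensitive(name, kept));
        }
    }
}
```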
2025-07-21 12:42:58 -04:00
Aurélien FOUCRET 81606a353f
[ES|QL] Add doc for the COMPLETION command (#131010) 2025-07-21 18:05:54 +02:00
Charlotte Hoblik 0eca7032e5
Update index mapping update privileges (#130894) 2025-07-21 14:38:15 +02:00
Iván Cea Fontenla aa77c4a734
ESQL: Added Sample operator NamedWritable to plugin (#131541)
`SampleOperator.Status` wasn't declared as a NamedWriteable by the plugin, leading to serialization errors when `SAMPLE` is used with `profile: true`.

The failure is an `IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.compute.operator.Operator$Status][sample]`.

Profiles will be tested in https://github.com/elastic/elasticsearch/pull/131474, which is currently failing because of this bug.
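The failure mode can be reproduced with a toy registry keyed by category and name. This is a simplified, hypothetical stand-in for Elasticsearch's named-writeable registry: the `NamedRegistrySketch` class, its methods and the `other` entry are invented for illustration, and the sketch only shows why an undeclared `sample` entry makes the lookup fail and how declaring it fixes it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified, hypothetical stand-in for a named-writeable registry; not the Elasticsearch class.
final class NamedRegistrySketch {
    private final Map<String, Supplier<Object>> readers = new HashMap<>();

    private static String key(String category, String name) {
        return category + "/" + name;
    }

    void register(String category, String name, Supplier<Object> reader) {
        readers.put(key(category, name), reader);
    }

    Object read(String category, String name) {
        Supplier<Object> reader = readers.get(key(category, name));
        if (reader == null) {
            // Mirrors the shape of the error quoted in the commit message.
            throw new IllegalArgumentException("Unknown NamedWriteable [" + category + "][" + name + "]");
        }
        return reader.get();
    }

    public static void main(String[] args) {
        String category = "org.elasticsearch.compute.operator.Operator$Status";
        NamedRegistrySketch registry = new NamedRegistrySketch();
        registry.register(category, "other", () -> "some other status"); // hypothetical entry
        try {
            registry.read(category, "sample"); // the sample status was never declared
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        registry.register(category, "sample", () -> "sample status"); // the fix: declare it
        System.out.println(registry.read(category, "sample"));
    }
}
```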
2025-07-21 14:27:47 +02:00
David Turner b4a455d1b6
Clarify heap size configuration (#131607)
Spell out that total memory needs to account for multiple nodes and
other processes, and that the OOM killer might intervene if you ignore
this guidance.
2025-07-21 12:30:35 +01:00
Stef Nestor ac398e3b61
New Slow log troubleshooting video (#131557) 2025-07-18 15:30:41 -06:00
Ruben van Staden dc96c362f5
otel-data: enable failure store for OTEL datastreams (#131395) 2025-07-18 17:16:58 -04:00
Jan-Kazlouski-elastic beb18a87c3
Add Llama support to Inference Plugin (#130092)
* Refactor Hugging Face service settings and completion request methods for consistency

* Add Llama model support for embeddings and chat completions

* Refactor Llama request classes to improve secret settings handling

* Refactor DeltaParser in LlamaStreamingProcessor to improve argument handling

* Enhance Llama streaming processing by adding support for nullable object arrays

* [CI] Auto commit changes from spotless

* Fix error messages in LlamaActionCreator

* [CI] Auto commit changes from spotless

* Add detailed Javadoc comments to Llama classes for improved documentation

* Enhance LlamaChatCompletionResponseHandler to support mid-stream error handling and improve error response parsing

* Add Javadoc comments to Llama classes for improved documentation and clarity

* Fix checkstyle

* Update LlamaEmbeddingsRequest to use mediaTypeWithoutParameters for content type header

* Add unit tests for LlamaActionCreator and related models

* Add unit tests for LlamaChatCompletionServiceSettings to validate configuration parsing and serialization

* Add unit tests for LlamaEmbeddingsServiceSettings to validate configuration parsing and serialization

* Add unit tests for LlamaEmbeddingsServiceSettings to validate various configuration scenarios

* Add unit tests for LlamaChatCompletionResponseHandler to validate error response handling

* Refactor Llama embedding and chat completion tests for consistency and clarity

* Add unit tests for LlamaChatCompletionRequestEntity to validate message serialization

* Add unit tests for LlamaEmbeddingsRequest to validate request creation and truncation behavior

* Add unit tests for LlamaEmbeddingsRequestEntity to validate XContent serialization

* Add unit tests for LlamaErrorResponse to validate error handling from HTTP responses

* Add unit tests for LlamaChatCompletionServiceSettings to validate configuration parsing and serialization

* Add tests for LlamaService request configuration validation and error handling

* Fix error message formatting in LlamaServiceTests for better localization support

* Refactor Llama model classes to implement accept method for action visitors

* Hide Llama service from configuration API to enhance security and reduce exposure

* Refactor Llama model classes to remove modelId and update embedding request handling

* Refactor Llama request classes to use pattern matching for secret settings

* Update embeddings handler to use HuggingFace response entity

* Refactor Mistral model classes to remove modelId and update rate limit hashing

* Refactor Mistral action classes to remove taskSettings parameter and streamline action creation

* Refactor Llama and Mistral models to remove taskSettings parameter and simplify model instantiation

* Refactor Llama service tests to use Model instead of CustomModel and update similarity measure to DOT_PRODUCT

* Remove unused tests and imports from LlamaServiceTests

* Add chunking settings support to Llama embeddings model tests

* Add changelog

* Add support for version checks in Llama settings and define new transport version

* Refactor Llama model assertions and remove unused version support methods

* Refactor Llama service constructors to include ClusterService and improve error message handling

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-07-18 16:26:20 -04:00
Pat Whelan feafb3a2ef
[ML] Track inference deployments (#131442)
Record duration and errors when Inference Endpoints deploy Trained
Models. The new metric is `es.inference.trained_model.deployment.time`.

Refactored `InferenceStats` into server so it can be used in
`InferenceServiceExtension` and passed to InferenceServices rather than
remaining at the Transport layer.
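Recording both the duration and the outcome of a deployment can be sketched as below. Everything except the metric name is an assumption: `DeploymentTimingSketch`, its `Sample` record and the synchronous `deploy` method are hypothetical, whereas a real deployment would more likely complete asynchronously through a listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch of timing a deployment and recording success or failure;
// not the InferenceStats API.
final class DeploymentTimingSketch {
    /** One recorded sample: elapsed millis and the error type, or null on success. */
    record Sample(String metric, long millis, String errorType) {}

    static final String METRIC = "es.inference.trained_model.deployment.time";

    final List<Sample> samples = new ArrayList<>();
    private final LongSupplier clockMillis;

    DeploymentTimingSketch(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Runs a deployment step and records its duration whether it succeeds or fails. */
    void deploy(Runnable deployment) {
        long start = clockMillis.getAsLong();
        String errorType = null;
        try {
            deployment.run();
        } catch (RuntimeException e) {
            errorType = e.getClass().getSimpleName();
            throw e;
        } finally {
            samples.add(new Sample(METRIC, clockMillis.getAsLong() - start, errorType));
        }
    }

    public static void main(String[] args) {
        DeploymentTimingSketch stats = new DeploymentTimingSketch(System::currentTimeMillis);
        stats.deploy(() -> { /* pretend to start a trained model deployment */ });
        try {
            stats.deploy(() -> { throw new IllegalStateException("deployment failed"); });
        } catch (IllegalStateException expected) {
            // the failure is still recorded with its duration
        }
        System.out.println(stats.samples);
    }
}
```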
2025-07-18 18:35:35 +02:00
Kathleen DeRusso 90699d3cc3
Fix semantic highlighting bug on flat quantized fields (#131525)
* Fix semantic highlighting bug on flat quantized fields

* Update docs/changelog/131525.yaml
2025-07-18 17:27:34 +02:00
Julian Kiryakov e67e50b3f1
[DOCS][ESQL] Fix release version in Docs for RLIKE LIST (#131465)
RLIKE LIST did not manage to make it into 9.1.
In this PR, we modify the documentation to make it clear that it will be available in 9.2, not 9.1
2025-07-18 10:20:07 -04:00
Alexander Spies 06e39c0377
ESQL: Disallow remote enrich after lu join (#131426)
Fix https://github.com/elastic/elasticsearch/issues/129372

Due to how remote ENRICH is
[planned](32e50d0d94/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java (L93)),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (Cf. my comment
[here](https://github.com/elastic/elasticsearch/issues/129372#issuecomment-3083024230)
and my draft https://github.com/elastic/elasticsearch/pull/131286 for
enabling this.)
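The restriction amounts to a simple ordering check. The sketch below is hypothetical: it flattens a pipeline into a list of commands (the real validation works on the ES|QL plan), and both the `Command` values and the error text are invented for illustration.

```java
import java.util.List;

// Hypothetical sketch of the validation; the real check runs on the ES|QL plan.
final class RemoteEnrichCheckSketch {
    /** A flattened, hypothetical view of a pipeline: one entry per command, in order. */
    enum Command { FROM, WHERE, LIMIT, LOOKUP_JOIN, ENRICH_REMOTE, ENRICH_COORDINATOR }

    /** Rejects a remote ENRICH that appears anywhere after a LOOKUP JOIN. */
    static void validate(List<Command> pipeline) {
        boolean seenLookupJoin = false;
        for (Command command : pipeline) {
            if (command == Command.LOOKUP_JOIN) {
                seenLookupJoin = true;
            } else if (command == Command.ENRICH_REMOTE && seenLookupJoin) {
                // Illustrative message, not the actual error text.
                throw new IllegalArgumentException("remote ENRICH is not allowed after LOOKUP JOIN");
            }
        }
    }

    public static void main(String[] args) {
        validate(List.of(Command.FROM, Command.ENRICH_REMOTE, Command.LOOKUP_JOIN)); // allowed
        try {
            validate(List.of(Command.FROM, Command.LOOKUP_JOIN, Command.WHERE, Command.ENRICH_REMOTE));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```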
2025-07-18 15:30:57 +02:00
Lorenzo Dematté 27a09d8529
Upgrade apm-agent to 1.55.0 (#131510) 2025-07-18 15:04:07 +02:00
Tommaso Teofili d70093b3ad
ScoreTests capability check (#131516) 2025-07-18 14:41:37 +02:00
Nik Everett 439b8e68bb
ESQL: Split large pages on load sometimes (#131053)
This adds support for splitting `Page`s of large values when loading
from single segment, non-descending hits. This is the hottest code path, as
it's how we load data for aggregation. So! We had to make very, very, very
sure this doesn't slow down the fast path of loading doc values.

Caveat - this only defends against loading large values via the
row-by-row load mechanism that we use for stored fields and _source.
That covers the most common kinds of large values - mostly `text` and
geo fields. If we need to split further on doc values, we'll have to
invent something for them specifically. For now, just row-by-row.

This works by flipping the order in which we load row-by-row and
column-at-a-time values. Previously we loaded all column-at-a-time
values first because that was simpler. Then we loaded all of the
row-by-row values. Now we save the column-at-a-time values and instead
load row-by-row until the `Page`'s estimated size is larger than a "jumbo"
size which defaults to a megabyte.

Once we load enough rows that we estimate the page is "jumbo", we then
stop loading rows. The Page will look like this:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        | <-- after loading this row
|      |     |      |      |        |     we crossed to "jumbo" size
|      |     |      |      |        |
|      |     |      |      |        |
|      |     |      |      |        | <-- these rows are entirely empty
|      |     |      |      |        |
|      |     |      |      |        |
```

Then we chop the page to the last row:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
```

Then fill in the column-at-a-time columns:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |   1 | XXXX |   11 |    1.0 |
| XXXX |   2 | XXXX |   22 |   -2.0 |
| XXXX |   3 | XXXX |   33 |    1e9 |
| XXXX |   4 | XXXX |   44 |    913 |
| XXXX |   5 | XXXX |   55 | 0.1234 |
| XXXX |   6 | XXXX |   66 | 3.1415 |
```

And then we return *that* `Page`. On the next `Driver` iteration we
start from where we left off.
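The reordered loading can be sketched as a loop that resumes where the previous page stopped. The sketch is hypothetical: it has one text column loaded row-by-row and one long column loaded column-at-a-time, uses string length as the size estimate, and takes the jumbo threshold as a parameter instead of using the real one-megabyte default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.IntToLongFunction;

// Hypothetical sketch of "row-by-row first, stop at jumbo size, then fill columns".
final class JumboPageSketch {
    record Page(int firstDoc, List<String> text, long[] longs) {}

    private final int[] docs;
    private final IntFunction<String> rowByRowLoader;    // stands in for stored fields / _source
    private final IntToLongFunction columnAtATimeLoader; // stands in for doc values
    private final long jumboBytes;
    private int next = 0;                                // where the next page resumes

    JumboPageSketch(int[] docs, IntFunction<String> rowByRowLoader,
                    IntToLongFunction columnAtATimeLoader, long jumboBytes) {
        this.docs = docs;
        this.rowByRowLoader = rowByRowLoader;
        this.columnAtATimeLoader = columnAtATimeLoader;
        this.jumboBytes = jumboBytes;
    }

    boolean hasMore() {
        return next < docs.length;
    }

    Page nextPage() {
        int first = next;
        List<String> text = new ArrayList<>();
        long estimatedBytes = 0;
        // 1. Load row-by-row values until the estimate crosses the jumbo size.
        while (next < docs.length && estimatedBytes <= jumboBytes) {
            String value = rowByRowLoader.apply(docs[next]);
            text.add(value);
            estimatedBytes += value.length(); // crude estimate: one byte per char
            next++;
        }
        // 2. The page only covers the rows loaded so far ("chopped" to the last row).
        // 3. Fill the column-at-a-time values for exactly those rows.
        long[] longs = new long[text.size()];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = columnAtATimeLoader.applyAsLong(docs[first + i]);
        }
        return new Page(first, text, longs);
    }

    public static void main(String[] args) {
        JumboPageSketch loader = new JumboPageSketch(new int[] {0, 1, 2, 3, 4, 5}, doc -> "XXXX", doc -> doc * 11L, 10);
        while (loader.hasMore()) {
            Page page = loader.nextPage();
            System.out.println("page from doc " + page.firstDoc() + ": " + Arrays.toString(page.longs()));
        }
    }
}
```

Because the loop condition is checked before each row, every page holds at least one row, and the row that crosses the threshold is the last one kept, as in the tables above.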
2025-07-18 08:37:14 -04:00
Julian Kiryakov 40585a2199
Add checks that optimizers do not modify the layout (#130855)
Add verification that the optimizers modify neither the number of attributes nor their data types.
Lookup Join gets special handling, by checking `EsQueryExec esQueryExec && esQueryExec.indexMode() == LOOKUP`, and so does `ProjectAwayColumns.ALL_FIELDS_PROJECTED`.
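A check of this kind can be sketched by comparing an optimizer rule's output layout with its input layout. The sketch is hypothetical: `Attribute` is a plain record, attributes are compared by position, names are deliberately ignored, and the Lookup Join and `ALL_FIELDS_PROJECTED` exemptions are not modelled.

```java
import java.util.List;

// Hypothetical sketch of a layout check run around an optimizer rule.
final class LayoutCheckSketch {
    record Attribute(String name, String dataType) {}

    /** Fails if the rule changed the number of output attributes or any attribute's data type. */
    static void verifyLayoutUnchanged(List<Attribute> before, List<Attribute> after) {
        if (before.size() != after.size()) {
            throw new IllegalStateException("attribute count changed from " + before.size() + " to " + after.size());
        }
        for (int i = 0; i < before.size(); i++) {
            String was = before.get(i).dataType();
            String now = after.get(i).dataType();
            if (!was.equals(now)) {
                throw new IllegalStateException("attribute " + i + " changed type from " + was + " to " + now);
            }
        }
    }

    public static void main(String[] args) {
        List<Attribute> before = List.of(new Attribute("emp_no", "INTEGER"), new Attribute("salary", "INTEGER"));
        List<Attribute> after = List.of(new Attribute("emp_no", "INTEGER"), new Attribute("salary", "LONG"));
        try {
            verifyLayoutUnchanged(before, after);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // attribute 1 changed type from INTEGER to LONG
        }
    }
}
```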

Closes #125576
2025-07-18 08:33:58 -04:00
Nik Everett 6ed50e1bae
Explain `ignore_above` better (#129284)
This concept is complicated.

Closes #128991

Co-authored-by: Larisa Motova <larisa@motovs.org>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-07-17 15:40:17 -04:00
Liam Thompson 56477d81a9
Split retrievers docs and redirect anchors (#131385) 2025-07-17 21:21:53 +02:00
eyalkoren 221998d9fb
Revert "Support Fields API in conditional ingest processors (#121914)" (#131452)
This reverts commit a6f0f6fb4d.
2025-07-17 11:23:27 -07:00
Ruben van Staden 280793df09
apm-data: enable failure store for newly created APM datastreams (#131296) 2025-07-17 13:35:28 -04:00
Pat Whelan 6da8f92392
[ML] Block trained model updates from inference (#130940)
When the Trained Model has been deployed through the Inference Endpoint
API, it can only be updated using the Inference Endpoint API.

When the Trained Model has been deployed and then attached to an
Inference Endpoint, it can only be updated using the Trained Model API.
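The two rules above reduce to "the API that created the deployment owns its updates". The sketch below is a hypothetical illustration of that reading; the `Api` enum and `updateAllowed` method are invented names, not Elasticsearch code.

```java
// Hypothetical sketch of the ownership rule; the names are illustrative.
final class UpdateGuardSketch {
    enum Api { INFERENCE_ENDPOINT, TRAINED_MODEL }

    /**
     * createdVia: the API that originally deployed the trained model.
     * requestedVia: the API through which an update is being attempted.
     */
    static boolean updateAllowed(Api createdVia, Api requestedVia) {
        // Deployed through the Inference Endpoint API: only that API may update it.
        // Deployed through the Trained Model API, then attached: only the Trained Model API may.
        return createdVia == requestedVia;
    }

    public static void main(String[] args) {
        for (Api created : Api.values()) {
            for (Api requested : Api.values()) {
                System.out.println(created + " / " + requested + " -> " + updateAllowed(created, requested));
            }
        }
    }
}
```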

Fix #129999

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>
2025-07-17 16:57:41 +02:00
Samiul Monir e2bb47c3bb
Fix Semantic Query Rewrite Interception Drops Boosts (#129282)
* fix boosting for knn

* Fixing for match query

* fixing for match subquery

* fix for sparse vector query boost

* fix linting issues

* Update docs/changelog/129282.yaml

* update changelog

* Copy constructor with match query

* util function to create sparseVectorBuilder for sparse query

* util function for knn query to support boost

* adding unit tests for all intercepted query terms

* Adding yaml test for match, sparse, and knn

* Adding queryname support for nested query

* fix code styles

* Fix failed yaml tests

* Update docs/changelog/129282.yaml

* update yaml tests to expand test scenarios

* Updating knn to copy constructor

* adding yaml tests for multiple indices

* refactoring match query to adjust boost and queryname and move to copy constructor

* refactoring sparse query to adjust boost and queryname and move to copy constructor

* [CI] Auto commit changes from spotless

* Refactor sparse vector to adjust boost and queryname in the top level

* Refactor knn vector to adjust boost and queryname in the top level

* fix knn combined query

* fix unit tests

* fix lint issues

* remove unused code

* Update inference feature name

* Remove double boosting issue from match

* Fix double boosting in match test yaml file

* move to bool level for match semantic boost

* fix double boosting for sparse vector

* fix double boosting for sparse vector in yaml test

* fix knn combined query

* fix knn combined query

* fix sparse combined query

* fix knn yaml test for combined query

* refactoring unit tests

* linting

* fix match query unit test

* adding copy constructor for match query

* refactor copy match builder to interceptor

* [CI] Auto commit changes from spotless

* fix unit tests

* update yaml tests

* fix match yaml test

* fix yaml tests with 4 digits error margin

* unit tests are now more randomized
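The "apply the boost once, at the top level" approach described in the later commits can be sketched as follows. It is a toy model under assumptions: the `Query` record and `rewrite` method are hypothetical, and real query builders carry far more state.

```java
import java.util.List;

// Hypothetical sketch of keeping boost and _name when a query on a semantic field is rewritten.
final class BoostRewriteSketch {
    record Query(String type, float boost, String queryName, List<Query> children) {
        Query withBoostAndName(float newBoost, String newName) {
            return new Query(type, newBoost, newName, children);
        }
    }

    /**
     * Wraps the inference-based query in a bool query that carries the user's boost and _name.
     * The inner query is reset to boost 1.0 so the boost is applied once, not twice.
     */
    static Query rewrite(Query original, Query inferenceQuery) {
        Query inner = inferenceQuery.withBoostAndName(1.0f, null);
        return new Query("bool", original.boost(), original.queryName(), List.of(inner));
    }

    public static void main(String[] args) {
        Query match = new Query("match", 2.0f, "my_query", List.of());
        // Suppose the boost had been copied onto the inner query too: it would apply twice.
        Query copiedBoost = new Query("knn", 2.0f, null, List.of());
        System.out.println(rewrite(match, copiedBoost));
    }
}
```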

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-07-17 10:51:29 -04:00
Ben Chaplin f739673b48
Fix bug in point in time response (#131391)
Correct the response, which had the "skipped" and "failed" shard counts swapped.
2025-07-17 10:48:19 -04:00
Evgenii-Kazannik d06b0c8c17
Add Azure AI Rerank support (#129848)
* Add Azure AI Rerank support

* address comments

* address comments

* refactor azure ai studio service

* update rerank task settings test

* add provider for rerank
2025-07-17 09:24:02 -04:00
Jan Kuipers ec7f77becb
ES|QL categorize options (#131104)
* ES|QL categorize options

* refactor options

* fix serialization

* polish

* add verifications

* better test coverage + polish code

* better test coverage + polish code
2025-07-17 10:24:30 +02:00
kanoshiou ac0c50820a
ESQL: Fix inconsistent column order in MV_EXPAND (#129745)
The new attribute generated by MV_EXPAND should remain in the original position. The projection added by ProjectAwayColumns does not respect the original order of attributes.

Make ProjectAwayColumns respect the order of attributes to fix this.
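An order-preserving projection can be sketched by walking the child's output in order and keeping only the required attributes, instead of iterating the set of required attributes (whose iteration order is unspecified). Attribute names here are hypothetical strings, not ES|QL attribute objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; attributes are plain names here.
final class OrderedProjectionSketch {
    /** Keeps the required attributes in the child's output order, not in the set's iteration order. */
    static List<String> projectAway(List<String> childOutput, Set<String> required) {
        List<String> projected = new ArrayList<>();
        for (String attribute : childOutput) {
            if (required.contains(attribute)) {
                projected.add(attribute);
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        // MV_EXPAND replaced "tags" in place, so it must stay between "name" and "age".
        List<String> childOutput = List.of("name", "tags", "age", "unused");
        System.out.println(projectAway(childOutput, Set.of("age", "tags", "name"))); // [name, tags, age]
    }
}
```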
2025-07-17 10:23:04 +02:00
Lisa Cawley f135998b11
[DOCS] Augment self-managed connector tutorials (#131127) 2025-07-16 17:00:21 -07:00
O.K. 9226a656d5
[DOCS]: fix decimal digit filter reference (#129695) 2025-07-16 14:48:08 -04:00
Dan Rubinstein 9c6cf90456
Enable force inference endpoint deleting for invalid models and after stopping model deployment fails (#129090)
* Enable force inference endpoint deleting for invalid models and after stopping model deployment fails

* Update docs/changelog/129090.yaml

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-07-16 12:44:48 -04:00
Liam Thompson 27bba64b84
[DOCS] Remove misused applies_to tag (#131349)
* [DOCS] Remove misused applies_to tag
2025-07-16 09:37:59 -04:00
David Turner 2af0e9f5e1
Add note on o11y to architecture guide (#131291)
Adds brief overview that mostly just serves as a location for links to
other more comprehensive design documentation elsewhere in the source
tree.

Closes ES-9874
2025-07-16 11:36:27 +01:00
David Turner ebdfa9c0f5
Upgrade AWS Java SDK to 2.31.78 (#131050)
This picks up the fix for the locale bug reported at
https://github.com/aws/aws-sdk-java-v2/issues/5968.

This reverts the patching of the library added in commit
2697a3a872, except for the test
enhancement.

Fix confirmed in e.g. locale `ar-ER` with

    ./gradlew ":plugins:discovery-ec2:test" \
        --tests "org.elasticsearch.discovery.ec2.Ec2DiscoveryTests.testFilterByTags" \
        -Dtests.seed=596874EED28A2B92 -Dtests.locale=ar-ER -Dtests.timezone=NET \
        -Druntime.java=24
2025-07-16 11:48:02 +02:00
eyalkoren a6f0f6fb4d
Support Fields API in conditional ingest processors (#121914) 2025-07-16 11:46:17 +03:00
Carlos Delgado 6ffe27d030
ESQL - KNN function uses prefilters when pushed down to Lucene (#131004) 2025-07-16 10:19:18 +02:00
Luigi Dell'Aquila 7146681322
Add docs for ES|QL query logs (#131287) 2025-07-16 10:15:08 +02:00
Tim Vernum dc48b4b28b
Add attribute count to SamlAttribute toString (#131173)
Sometimes SAML IdPs send what _should_ be a list of values as a single
comma-separated string.

That is, we expect something using SAML's multi-valued attribute
feature:

    <saml:Attribute NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"
       Name="http://idp.example.org/attributes/groups" FriendlyName="groups">
       <saml:AttributeValue>engineering</saml:AttributeValue>
       <saml:AttributeValue>elasticsearch-admins</saml:AttributeValue>
       <saml:AttributeValue>employees</saml:AttributeValue>
    </saml:Attribute>

but we get

    <saml:Attribute NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"
       Name="http://idp.example.org/attributes/groups" FriendlyName="groups">
       <saml:AttributeValue>engineering,elasticsearch-admins,employees</saml:AttributeValue>
    </saml:Attribute>

To help detect these cases, this commit changes the
`toString()` on `SamlAttribute` to include the number of values (e.g. `(len=1)`)
at the end.
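The effect of appending the value count can be sketched with a stand-in class. `SamlAttributeSketch` and its output format are hypothetical; only the `(len=N)` suffix follows the commit message.

```java
import java.util.List;

// Hypothetical stand-in for SamlAttribute; the real fields and output format may differ.
final class SamlAttributeSketch {
    private final String name;
    private final String friendlyName;
    private final List<String> values;

    SamlAttributeSketch(String name, String friendlyName, List<String> values) {
        this.name = name;
        this.friendlyName = friendlyName;
        this.values = List.copyOf(values);
    }

    @Override
    public String toString() {
        // Appending the value count makes "one comma-separated value" easy to spot in logs.
        return friendlyName + "(" + name + ")=" + values + "(len=" + values.size() + ")";
    }

    public static void main(String[] args) {
        String name = "http://idp.example.org/attributes/groups";
        System.out.println(new SamlAttributeSketch(name, "groups", List.of("engineering", "elasticsearch-admins", "employees")));
        System.out.println(new SamlAttributeSketch(name, "groups", List.of("engineering,elasticsearch-admins,employees")));
    }
}
```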

Relates: #84379, #102769
2025-07-16 05:44:09 +02:00
Julian Kiryakov f0c30f272d
Add support for RLIKE (LIST) with pushdown (#129929)
Adds support for an alternative RLIKE syntax that takes a list of patterns.
Example:

    ROW message = "foobar"
    | WHERE message RLIKE ("foo.*", "bar.")

The new syntax is documented as part of the existing RLIKE function documentation. To improve mixed-cluster compatibility, the existing RLike Java implementation is still used for the old syntax and for a list with a single pattern.
The RLikeList is pushed down as a single Automaton to improve performance.
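Evaluating a list of patterns as one matcher, rather than one check per pattern, can be illustrated with `java.util.regex` alternation. This is an analogy under assumptions: ES|QL RLIKE uses Lucene's regular-expression syntax rather than Java's, and the pushdown builds a single Automaton, whereas this sketch simply joins Java regexes with `|`; `RLikeListSketch` is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: java.util.regex alternation stands in for a union of automata.
final class RLikeListSketch {
    /** One compiled pattern that fully matches a value if any of the input patterns would. */
    static Pattern union(List<String> patterns) {
        String combined = patterns.stream()
            .map(p -> "(?:" + p + ")") // group each pattern so '|' cannot leak into its neighbours
            .collect(Collectors.joining("|"));
        return Pattern.compile(combined);
    }

    public static void main(String[] args) {
        Pattern pattern = union(List.of("foo.*", "bar."));
        for (String message : List.of("foobar", "barx", "xbar")) {
            System.out.println(message + " -> " + pattern.matcher(message).matches());
        }
    }
}
```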
2025-07-15 14:39:11 -04:00
Fang Xing 04ae5275be
[ES|QL] Substitute date_trunc with round_to when the pre-calculated rounding points are available (#128639)
* consolidate min/max in SearchStats and substitute date_trunc/bucket with round_to
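The substitution can be illustrated for day-level truncation. This sketch rests on assumptions: that `round_to` rounds a value down to the nearest of a sorted list of fixed points, that the min/max come from search statistics as the commit describes, and that dates are in UTC with non-negative epoch millis; `RoundToSketch` and its methods are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;

// Hypothetical sketch: UTC day rounding via pre-computed points.
final class RoundToSketch {
    static final long DAY_MILLIS = 86_400_000L;

    /** Day boundaries covering [minMillis, maxMillis], e.g. taken from field min/max statistics. */
    static long[] dayPoints(long minMillis, long maxMillis) {
        long first = Instant.ofEpochMilli(minMillis).truncatedTo(ChronoUnit.DAYS).toEpochMilli();
        int count = (int) ((maxMillis - first) / DAY_MILLIS) + 1;
        long[] points = new long[count];
        for (int i = 0; i < count; i++) {
            points[i] = first + i * DAY_MILLIS;
        }
        return points;
    }

    /** Rounds down to the largest point <= value; points are sorted and value >= points[0]. */
    static long roundTo(long value, long[] points) {
        int idx = Arrays.binarySearch(points, value);
        // Not found: binarySearch returns -(insertionPoint) - 1; the point below is insertionPoint - 1.
        return idx >= 0 ? points[idx] : points[-idx - 2];
    }

    public static void main(String[] args) {
        long min = Instant.parse("2025-07-01T08:00:00Z").toEpochMilli();
        long max = Instant.parse("2025-07-03T17:30:00Z").toEpochMilli();
        long[] points = dayPoints(min, max); // midnight UTC on July 1, 2 and 3
        long value = Instant.parse("2025-07-02T13:45:00Z").toEpochMilli();
        System.out.println(Instant.ofEpochMilli(roundTo(value, points))); // 2025-07-02T00:00:00Z
    }
}
```

The binary search over a handful of known points replaces a general calendar computation per value, which is the kind of saving that makes the substitution attractive when min/max are known.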
2025-07-15 11:55:58 -04:00
Nhat Nguyen c2fa78fda1
Speed up reading multivalued keywords (#131061)
This change speeds up reading multi-valued keyword fields by leveraging ordinals.

Before:
```
Benchmark                              (layout)          (name)  Mode  Cnt     Score    Error  Units
ValuesSourceReaderBenchmark.benchmark  in_order      keyword_mv  avgt    7   318.332 ±  1.660  ns/op
```

After:
```
Benchmark                              (layout)          (name)  Mode  Cnt     Score    Error  Units
ValuesSourceReaderBenchmark.benchmark  in_order      keyword_mv  avgt    7    96.659 ±  0.932  ns/op
```
2025-07-15 08:50:27 -07:00
Carlos Delgado f1ddd4c312
ESQL: dense_vector cosine similarity function (#130641) 2025-07-15 14:49:25 +02:00
Tommaso Teofili fd037bd846
retain the scores of portions of an ES|QL query, via a score function (#127551) 2025-07-15 11:55:57 +02:00
elasticsearchmachine b3b83cc7e9
Update docs for v9.0.4 release (#131144) 2025-07-15 09:37:34 +02:00
David Turner 6f5579656c
Improve lost-increment message in repo analysis (#131200)
Today repository analysis may fail with a message like the following:

    [test-repo] register [test-register-contended-F_NNXHrSSDGveoeyj1skwg]
    should have value [10] but instead had value
    [OptionalBytesReference[00 00 00 00 00 00 00 09]]

This is confusing because one might interpret `should have value [10]`
as an indication that Elasticsearch definitely wrote this value to the
register, leaving you trying to work out how that particular write was
lost. In fact it can be more subtle than that, we only believe the
register blob should have this value because we know we completed 10
supposedly-atomic increment operations, and the failure could instead be
that these operations are not as atomic as they need to be and that one
or more of the increments was lost.

This commit makes the message more verbose, clarifying that this failure
could be an atomicity problem rather than a simple lost write:

    [test-repo] Successfully completed all [10] atomic increments of
    register [test-register-contended-F_NNXHrSSDGveoeyj1skwg] so its
    expected value is [OptionalBytesReference[00 00 00 00 00 00 00 0a]],
    but reading its value with [getRegister] unexpectedly yielded
    [OptionalBytesReference[00 00 00 00 00 00 00 09]]. This anomaly may
    indicate an atomicity failure amongst concurrent
    compare-and-exchange operations on registers in this repository.
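The new wording can be sketched as a message builder whose expected value is derived from the count of completed increments rather than from a single known write. The helper names are hypothetical; only the message text and the byte rendering follow the example above.

```java
import java.util.Locale;

// Hypothetical sketch of building the clarified message; not the repository-analysis code.
final class RegisterMessageSketch {
    /** Renders a long as 8 big-endian hex bytes, in the style used by the message above. */
    static String bytes(long value) {
        StringBuilder sb = new StringBuilder("OptionalBytesReference[");
        for (int shift = 56; shift >= 0; shift -= 8) {
            if (shift != 56) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "%02x", (value >>> shift) & 0xFF));
        }
        return sb.append(']').toString();
    }

    /** The expected value is the number of completed increments, not one known write. */
    static String lostIncrementMessage(String repo, String register, int completedIncrements, long observed) {
        return "[" + repo + "] Successfully completed all [" + completedIncrements + "] atomic increments of register ["
            + register + "] so its expected value is [" + bytes(completedIncrements) + "], but reading its value with "
            + "[getRegister] unexpectedly yielded [" + bytes(observed) + "]. This anomaly may indicate an atomicity "
            + "failure amongst concurrent compare-and-exchange operations on registers in this repository.";
    }

    public static void main(String[] args) {
        System.out.println(lostIncrementMessage("test-repo", "test-register-contended", 10, 9));
    }
}
```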
2025-07-15 08:32:49 +01:00
Pat Whelan 63b753f396
[ML] Sync Inference with Trained Model stats (#130544)
When the Trained Model stats are read, either during `GET _inference` or
`PUT _inference`, the Inference stats are updated to reflect the
Trained Model stats.

Fix #130339
2025-07-14 21:40:26 +02:00
David Turner 636b9bbc89
Promote headings in general arch guide (#130816)
Everything was a subheading of `REST and Transport Layers` but most of
these things should be sibling headings. This commit brings the
incorrectly-nested headings up a level to fix the document structure.

Relates ES-9874
2025-07-14 20:24:05 +01:00
Jonathan Buttner 2478b5c108
[ML] Including max_tokens through the Service API for Anthropic (#131113)
* Adding max_tokens to get services API

* Update docs/changelog/131113.yaml
2025-07-14 10:24:27 -04:00
Julian Kiryakov cbd88cdf73
[main]Prepare Index Like fix for backport to 9.1 and 8.19 (#130947)
Add transport versions to prepare for backport of #130849 to 9.1 and 8.19
2025-07-14 09:59:37 -04:00