CPS S2D9: Explicitly refresh connection(s) to remote(s) before executing query.
Previously, we'd refresh connections to remotes only when `skip_unavailable=false`.
We now do it when operating under a CPS context too. However, to avoid listening for
too long, we only listen for a short time -- the duration to wait is controlled by the
setting `search.ccs.force_connect_timeout`, which we'd eventually inject in the CPS environment.
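As a rough sketch, a node-scoped time setting with that name might be declared like this (the default value and surrounding class are illustrative, not the value injected for CPS):
```
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.core.TimeValue;

// Sketch only: the real declaration, default value, and scope may differ.
public final class ForceConnectTimeoutSketch {
    public static final Setting<TimeValue> FORCE_CONNECT_TIMEOUT = Setting.timeSetting(
        "search.ccs.force_connect_timeout",
        TimeValue.timeValueSeconds(1), // illustrative default
        Setting.Property.NodeScope
    );
}
```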
Allow LIKE LIST and RLIKE LIST to take advantage of the ReplaceStringCasingWithInsensitiveRegexMatch optimization.
Add unit tests to confirm the change works and the results are correct.
The following is pushed down as just `first_name RLIKE ("G.*")`:
```
FROM employees
| WHERE TO_UPPER(first_name) RLIKE ("G.*", "a.*", "bE.*")
```
The following is pushed down as just `first_name LIKE ("G.*")`:
```
FROM employees
| WHERE TO_UPPER(TO_LOWER(TO_UPPER(first_name))) LIKE ("G.*", "a.*", "bE.*")
```
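The rewrite is sound because an upper-cased string can never match a pattern that requires lowercase literals, so `"a.*"` and `"bE.*"` drop out of the OR-list, and the surviving pattern is matched case-insensitively against the raw field. A hedged sketch of that equivalence using Lucene's case-insensitive regex matching, assuming Lucene's `ASCII_CASE_INSENSITIVE` match flag; this is an illustration, not the optimizer's code:
```
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.CharacterRunAutomaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;

// Illustration: a case-insensitive "G.*" on the raw value behaves like
// TO_UPPER(value) RLIKE "G.*".
public class InsensitiveMatchSketch {
    public static void main(String[] args) {
        Automaton a = new RegExp("G.*", RegExp.ALL, RegExp.ASCII_CASE_INSENSITIVE).toAutomaton();
        CharacterRunAutomaton run =
            new CharacterRunAutomaton(Operations.determinize(a, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT));
        System.out.println(run.run("Georgi")); // true
        System.out.println(run.run("georgi")); // true, same as TO_UPPER(...) RLIKE "G.*"
        System.out.println(run.run("Harald")); // false
    }
}
```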
`SampleOperator.Status` wasn't declared as a NamedWriteable by the plugin, leading to serialization errors when `SAMPLE` is used with `profile: true`:
`IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.compute.operator.Operator$Status][sample]`
Profiles will be tested in https://github.com/elastic/elasticsearch/pull/131474, which is currently failing because of this bug.
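A sketch of what the missing registration looks like, assuming the usual plugin hook for named writeables; the category class and the `sample` name are taken from the exception above, and the actual fix may differ in detail:
```
// Hypothetical sketch: register SampleOperator.Status so the [sample] name can
// be resolved when serialized profiles are read back.
@Override
public List<NamedWriteableRegistry.Entry> getNamedWriteables() {
    return List.of(
        new NamedWriteableRegistry.Entry(Operator.Status.class, "sample", SampleOperator.Status::new)
    );
}
```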
* Refactor Hugging Face service settings and completion request methods for consistency
* Add Llama model support for embeddings and chat completions
* Refactor Llama request classes to improve secret settings handling
* Refactor DeltaParser in LlamaStreamingProcessor to improve argument handling
* Enhance Llama streaming processing by adding support for nullable object arrays
* [CI] Auto commit changes from spotless
* Fix error messages in LlamaActionCreator
* [CI] Auto commit changes from spotless
* Add detailed Javadoc comments to Llama classes for improved documentation
* Enhance LlamaChatCompletionResponseHandler to support mid-stream error handling and improve error response parsing
* Add Javadoc comments to Llama classes for improved documentation and clarity
* Fix checkstyle
* Update LlamaEmbeddingsRequest to use mediaTypeWithoutParameters for content type header
* Add unit tests for LlamaActionCreator and related models
* Add unit tests for LlamaChatCompletionServiceSettings to validate configuration parsing and serialization
* Add unit tests for LlamaEmbeddingsServiceSettings to validate configuration parsing and serialization
* Add unit tests for LlamaEmbeddingsServiceSettings to validate various configuration scenarios
* Add unit tests for LlamaChatCompletionResponseHandler to validate error response handling
* Refactor Llama embedding and chat completion tests for consistency and clarity
* Add unit tests for LlamaChatCompletionRequestEntity to validate message serialization
* Add unit tests for LlamaEmbeddingsRequest to validate request creation and truncation behavior
* Add unit tests for LlamaEmbeddingsRequestEntity to validate XContent serialization
* Add unit tests for LlamaErrorResponse to validate error handling from HTTP responses
* Add unit tests for LlamaChatCompletionServiceSettings to validate configuration parsing and serialization
* Add tests for LlamaService request configuration validation and error handling
* Fix error message formatting in LlamaServiceTests for better localization support
* Refactor Llama model classes to implement accept method for action visitors
* Hide Llama service from configuration API to enhance security and reduce exposure
* Refactor Llama model classes to remove modelId and update embedding request handling
* Refactor Llama request classes to use pattern matching for secret settings
* Update embeddings handler to use HuggingFace response entity
* Refactor Mistral model classes to remove modelId and update rate limit hashing
* Refactor Mistral action classes to remove taskSettings parameter and streamline action creation
* Refactor Llama and Mistral models to remove taskSettings parameter and simplify model instantiation
* Refactor Llama service tests to use Model instead of CustomModel and update similarity measure to DOT_PRODUCT
* Remove unused tests and imports from LlamaServiceTests
* Add chunking settings support to Llama embeddings model tests
* Add changelog
* Add support for version checks in Llama settings and define new transport version
* Refactor Llama model assertions and remove unused version support methods
* Refactor Llama service constructors to include ClusterService and improve error message handling
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Record duration and errors when Inference Endpoints deploy Trained
Models. The new metric is `es.inference.trained_model.deployment.time`.
Refactored `InferenceStats` into server so it can be used in
`InferenceServiceExtension` and passed to InferenceServices rather than
remaining at the Transport layer.
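A minimal sketch of how the new histogram might be registered and recorded through the telemetry `MeterRegistry`; the attribute key and surrounding class are illustrative assumptions, and only the metric name comes from the change description:
```
import java.util.Map;

import org.elasticsearch.telemetry.metric.LongHistogram;
import org.elasticsearch.telemetry.metric.MeterRegistry;

// Illustrative sketch; not the PR's actual code.
class DeploymentTimeMetric {
    private final LongHistogram deploymentTime;

    DeploymentTimeMetric(MeterRegistry registry) {
        this.deploymentTime = registry.registerLongHistogram(
            "es.inference.trained_model.deployment.time",
            "Time taken to deploy a trained model",
            "ms"
        );
    }

    void recordDeployment(long durationMillis, String modelId) {
        // "model_id" is an assumed attribute key
        deploymentTime.record(durationMillis, Map.of("model_id", modelId));
    }
}
```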
Fix https://github.com/elastic/elasticsearch/issues/129372
Due to how remote ENRICH is
[planned](32e50d0d94/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java (L93)),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.
We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.
For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.
There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (Cf. my comment
[here](https://github.com/elastic/elasticsearch/issues/129372#issuecomment-3083024230)
and my draft https://github.com/elastic/elasticsearch/pull/131286 for
enabling this.)
This adds support for splitting `Page`s of large values when loading
from single-segment, non-descending hits. This is the hottest code path, as
it's how we load data for aggregation. So! We had to make very, very, very
sure this doesn't slow down the fast path of loading doc values.
Caveat - this only defends against loading large values via the
row-by-row load mechanism that we use for stored fields and _source.
That covers the most common kinds of large values - mostly `text` and
geo fields. If we need to split further on doc values, we'll have to
invent something for them specifically. For now, just row-by-row.
This works by flipping the order in which we load row-by-row and
column-at-a-time values. Previously we loaded all column-at-a-time
values first because that was simpler. Then we loaded all of the
row-by-row values. Now we save the column-at-a-time values and instead
load row-by-row until the `Page`'s estimated size is larger than a "jumbo"
size which defaults to a megabyte.
Once we load enough rows that we estimate the page is "jumbo", we then
stop loading rows. The Page will look like this:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
| XXXX | | XXXX | | | <-- after loading this row
| | | | | | we crossed to "jumbo" size
| | | | | |
| | | | | |
| | | | | | <-- these rows are entirely empty
| | | | | |
| | | | | |
```
Then we chop the page to the last row:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
| XXXX | | XXXX | | |
```
Then fill in the column-at-a-time columns:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX | 1 | XXXX | 11 | 1.0 |
| XXXX | 2 | XXXX | 22 | -2.0 |
| XXXX | 3 | XXXX | 33 | 1e9 |
| XXXX | 4 | XXXX | 44 | 913 |
| XXXX | 5 | XXXX | 55 | 0.1234 |
| XXXX | 6 | XXXX | 66 | 3.1415 |
```
And then we return *that* `Page`. On the next `Driver` iteration we
start from where we left off.
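The following toy Java sketch (stand-ins, not the real `Page`/`Block` loaders) shows the shape of the flipped loop: load row-by-row until the estimated size turns jumbo, then fill the column-at-a-time values for just the loaded rows:
```
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the technique above; the real code works on Pages and Blocks.
public class JumboPageSketch {
    public static void main(String[] args) {
        long jumboBytes = 1024 * 1024; // default "jumbo" size: one megabyte
        int totalDocs = 12;

        // Row-by-row phase (stored fields / _source): stop once we turn jumbo.
        List<String> txt1 = new ArrayList<>();
        long estimatedBytes = 0;
        int rows = 0;
        while (rows < totalDocs && estimatedBytes < jumboBytes) {
            String value = loadStoredField(rows); // expensive row-stride load
            txt1.add(value);
            estimatedBytes += value.length();     // crude size estimate
            rows++;
        }

        // Column-at-a-time phase (doc values): fill only the rows we kept.
        int[] ints = new int[rows];
        for (int i = 0; i < rows; i++) {
            ints[i] = loadDocValue(i);
        }

        // The remaining totalDocs - rows docs are picked up on the next Driver iteration.
        System.out.println("loaded " + rows + " of " + totalDocs + " rows in this page");
    }

    static String loadStoredField(int doc) { return "X".repeat(200_000); } // stand-in large value
    static int loadDocValue(int doc) { return doc; }
}
```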
Add verification that the optimizers do not modify the number of attributes or the attribute data types.
We add special handling for LOOKUP JOIN (by checking `EsQueryExec esQueryExec && esQueryExec.indexMode() == LOOKUP`) and for `ProjectAwayColumns.ALL_FIELDS_PROJECTED`.
Closes #125576
When the Trained Model has been deployed through the Inference Endpoint
API, it can only be updated using the Inference Endpoint API.
When the Trained Model has been deployed and then attached to an
Inference Endpoint, it can only be updated using the Trained Model API.
Fix #129999
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>
* fix boosting for knn
* Fixing for match query
* fixing for match subquery
* fix for sparse vector query boost
* fix linting issues
* Update docs/changelog/129282.yaml
* update changelog
* Copy constructor with match query
* util function to create sparseVectorBuilder for sparse query
* util function for knn query to support boost
* adding unit tests for all intercepted query terms
* Adding yaml test for match,sparse, and knn
* Adding queryname support for nested query
* fix code styles
* Fix failed yaml tests
* Update docs/changelog/129282.yaml
* update yaml tests to expand test scenarios
* Updating knn to copy constructor
* adding yaml tests for multiple indices
* refactoring match query to adjust boost and queryname and move to copy constructor
* refactoring sparse query to adjust boost and queryname and move to copy constructor
* [CI] Auto commit changes from spotless
* Refactor sparse vector to adjust boost and queryname in the top level
* Refactor knn vector to adjust boost and queryname in the top level
* fix knn combined query
* fix unit tests
* fix lint issues
* remove unused code
* Update inference feature name
* Remove double boosting issue from match
* Fix double boosting in match test yaml file
* move to bool level for match semantic boost
* fix double boosting for sparse vector
* fix double boosting for sparse vector in yaml test
* fix knn combined query
* fix knn combined query
* fix sparse combined query
* fix knn yaml test for combined query
* refactoring unit tests
* linting
* fix match query unit test
* adding copy constructor for match query
* refactor copy match builder to interceptor
* [CI] Auto commit changes from spotless
* fix unit tests
* update yaml tests
* fix match yaml test
* fix yaml tests with 4 digits error margin
* unit tests are now more randomized
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* Add Azure AI Rerank support
* address comments
* address comments
* refactor azure ai studio service
* update rerank task settings test
* add provider for rerank
The new attribute generated by MV_EXPAND should remain in the original position. The projection added by ProjectAwayColumns does not respect the original order of attributes.
Make ProjectAwayColumns respect the order of attributes to fix this.
* Enable force-deleting inference endpoints for invalid models and when stopping the model deployment fails
* Update docs/changelog/129090.yaml
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This picks up the fix for the locale bug reported at
https://github.com/aws/aws-sdk-java-v2/issues/5968.
This reverts the patching of the library added in commit
2697a3a872, except for the test
enhancement.
Fix confirmed in e.g. locale `ar-ER` with
```
./gradlew ":plugins:discovery-ec2:test" \
  --tests "org.elasticsearch.discovery.ec2.Ec2DiscoveryTests.testFilterByTags" \
  -Dtests.seed=596874EED28A2B92 -Dtests.locale=ar-ER -Dtests.timezone=NET \
  -Druntime.java=24
```
Sometimes SAML IdPs send what _should_ be a list of values as a single
comma-separated string.
That is, we expect something using SAML's multi-valued attribute
feature:
```
<saml:Attribute NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"
                Name="http://idp.example.org/attributes/groups" FriendlyName="groups">
  <saml:AttributeValue>engineering</saml:AttributeValue>
  <saml:AttributeValue>elasticsearch-admins</saml:AttributeValue>
  <saml:AttributeValue>employees</saml:AttributeValue>
</saml:Attribute>
```
but we get
```
<saml:Attribute NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"
                Name="http://idp.example.org/attributes/groups" FriendlyName="groups">
  <saml:AttributeValue>engineering,elasticsearch-admins,employees</saml:AttributeValue>
</saml:Attribute>
```
In order to help detect these cases, this commit changes the
`toString()` on `SamlAttribute` to include the number of values (e.g. `(len=1)`)
at the end.
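A toy illustration of the change (not the actual `SamlAttribute` class): with the comma-joined attribute above, the log line ends in `(len=1)` rather than the expected `(len=3)`, which makes the misconfiguration easy to spot.
```
import java.util.List;

// Toy stand-in for SamlAttribute; only the toString() shape matters here.
record AttributeSketch(String name, String friendlyName, List<String> values) {
    @Override
    public String toString() {
        return friendlyName + "(" + name + ")=" + String.join("|", values)
            + " (len=" + values.size() + ")";
    }

    public static void main(String[] args) {
        System.out.println(new AttributeSketch(
            "http://idp.example.org/attributes/groups", "groups",
            List.of("engineering,elasticsearch-admins,employees"))); // ... (len=1)
    }
}
```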
Relates: #84379, #102769
Adds support for RLIKE function alternative syntax with a list of patterns.
Examples:
```
ROW message = "foobar"
| WHERE message RLIKE ("foo.*", "bar.")
```
The new syntax is documented as part of the existing RLIKE function documentation. We keep using the existing RLike Java implementation for the old syntax and for the single-pattern list case, to improve mixed-cluster compatibility.
The RLikeList is pushed down as a single Automaton to improve performance.
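A hedged sketch of the single-automaton idea using Lucene's automaton utilities (an illustration, not the RLikeList implementation): the pattern list is unioned into one automaton, so each value is checked with one automaton run instead of one regex evaluation per pattern.
```
import java.util.List;

import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.CharacterRunAutomaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;

// Illustration of unioning a pattern list into a single automaton.
public class RLikeListSketch {
    public static void main(String[] args) {
        Automaton union = Operations.union(List.of(
            new RegExp("foo.*").toAutomaton(),
            new RegExp("bar.").toAutomaton()));
        CharacterRunAutomaton run =
            new CharacterRunAutomaton(Operations.determinize(union, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT));
        System.out.println(run.run("foobar")); // true: matches "foo.*"
        System.out.println(run.run("barn"));   // true: matches "bar."
        System.out.println(run.run("baz"));    // false
    }
}
```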
Today repository analysis may fail with a message like the following:
```
[test-repo] register [test-register-contended-F_NNXHrSSDGveoeyj1skwg]
should have value [10] but instead had value
[OptionalBytesReference[00 00 00 00 00 00 00 09]]
```
This is confusing because one might interpret `should have value [10]`
as an indication that Elasticsearch definitely wrote this value to the
register, leaving you trying to work out how that particular write was
lost. In fact it can be more subtle than that, we only believe the
register blob should have this value because we know we completed 10
supposedly-atomic increment operations, and the failure could instead be
that these operations are not as atomic as they need to be and that one
or more of the increments was lost.
This commit makes the message more verbose, clarifying that this failure
could be an atomicity problem rather than a simple lost write:
```
[test-repo] Successfully completed all [10] atomic increments of
register [test-register-contended-F_NNXHrSSDGveoeyj1skwg] so its
expected value is [OptionalBytesReference[00 00 00 00 00 00 00 0a]],
but reading its value with [getRegister] unexpectedly yielded
[OptionalBytesReference[00 00 00 00 00 00 00 09]]. This anomaly may
indicate an atomicity failure amongst concurrent
compare-and-exchange operations on registers in this repository.
```
When the Trained Model stats are read, either during `GET _inference` or
`PUT _inference`, the Inference stats are updated to reflect the
Trained Model stats.
Fix #130339
* Unmute tests muted in #129550 as they were fixed in #127322
* Return no match when no dimensions have been set
* Add tests
* Update docs/changelog/131081.yaml
The default 10s TLS handshake timeout may be too short if there is some
bug causing event-loop latency, and this has more serious consequences
than the underlying performance issue (e.g. it prevents the cluster from
scaling up to work around the problem). With this commit we expose a
setting that allows the timeout to be configured, providing a workaround
in such cases.
If a user tries to apply geo distance sorting to a field of the wrong type, they'll get
a 500 error that causes shard failures, because the field data implementation gets cast to the
expected type without first checking that the field type is the expected one.
This commit addresses that by returning a 400 error instead.
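A hedged sketch of the guard's shape (types, names, and message are illustrative, not the fix's actual code): check the resolved field data before casting, and throw an `IllegalArgumentException`, which Elasticsearch surfaces as a 400, instead of letting a `ClassCastException` bubble up as a 500.
```
import org.elasticsearch.index.fielddata.IndexFieldData;
import org.elasticsearch.index.fielddata.IndexGeoPointFieldData;

// Illustrative guard; the actual types and message in the fix may differ.
static IndexGeoPointFieldData geoFieldData(IndexFieldData<?> fieldData, String fieldName) {
    if (fieldData instanceof IndexGeoPointFieldData geo) {
        return geo;
    }
    // IllegalArgumentException is surfaced as a 400 rather than a 500.
    throw new IllegalArgumentException(
        "unable to apply geo distance sort to field [" + fieldName + "]");
}
```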
Closes #129500
Remote lookup join implementation
This patch enables using LOOKUP JOIN with cross-cluster queries. Example:
```
FROM logs-*, remote:logs-* | LOOKUP JOIN clients on ip | SORT timestamp | LIMIT 100
```
Fixes `_index LIKE <pattern>` to always have normal text matching semantics.
Implements a generic ExpressionQuery and ExpressionQueryBuilder that can be serialized to the data node. The ExpressionQueryBuilder can then build an Automaton using TranslationAware.asLuceneQuery() and execute it in Lucene.
This introduces a breaking change for LIKE on `_index` fields. The old LIKE behavior was incorrect and did not follow ESQL's normal LIKE semantics. Customers upgrading from an old build to a new build might see a regression where LIKE filters on `_index` produce different results, but the new results are correct.
Behavior for ESQL:

| Querying cluster | Remote cluster | Behavior |
|------------------|----------------|----------|
| New | New | New behavior everywhere |
| Old | New | Old behavior everywhere (the `isForESQL` flag is not passed in from the old node) |
| New | Old | New behavior on the new cluster, old behavior on the old (`isForESQL` cannot be passed; the old node does not know about it) |
| Old | Old | Old behavior everywhere |
Closes #129511
This change modifies reindex behavior to always include vector fields, even if the target index omits embeddings from _source.
This prepares for scenarios where embeddings may be automatically excluded (#130382).
Add verification for the LocalLogical plan.
The verification is skipped if there is a remote enrich, similar to how it is skipped for LocalPhysical plan optimization.
The skip only happens for the LocalLogical and LocalPhysical plan optimizers.
This action solely needs the cluster state, so it can run on any node.
Since this action is invoked across clusters, we need to be able to
(de)serialize requests and responses. We introduce a new
`RemoteClusterStateRequest` that wraps the existing
`ClusterStateRequest` and implements (de)serialization.
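A hedged sketch of what implementing (de)serialization for such a wrapper request looks like; the fields shown are illustrative stand-ins, not the actual `RemoteClusterStateRequest` definition:
```
import java.io.IOException;

import org.elasticsearch.action.ActionRequest;
import org.elasticsearch.action.ActionRequestValidationException;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;

// Illustrative sketch of the wire-serializable wrapper's shape.
public class RemoteClusterStateRequestSketch extends ActionRequest {
    private final boolean metadata;
    private final String[] indices;

    public RemoteClusterStateRequestSketch(boolean metadata, String[] indices) {
        this.metadata = metadata;
        this.indices = indices;
    }

    // Deserialization: read fields in the same order they were written.
    public RemoteClusterStateRequestSketch(StreamInput in) throws IOException {
        super(in);
        this.metadata = in.readBoolean();
        this.indices = in.readStringArray();
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        super.writeTo(out);
        out.writeBoolean(metadata);
        out.writeStringArray(indices);
    }

    @Override
    public ActionRequestValidationException validate() {
        return null; // nothing to validate in this sketch
    }
}
```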