Commit Graph

16902 Commits

Author SHA1 Message Date
Albert Zaharovits 72b3343ed3
Fix ThreadPoolMergeExecutorServiceDiskSpaceTests testUnavailableBudgetBlocksNewMergeTasksFromStartingExecution (#130001)
The test might produce over-budget tasks that cannot run even if all the
other tasks that were blocked (and hold up budget) while running
complete. Rather than prevent submitting such over-budget tasks, this
fix simply sets the merge task's queue available budget to
`Long.MAX_VALUE`, in order to ensure that all merge tasks run before the
test ends.

Fixes https://github.com/elastic/elasticsearch/issues/129148
2025-06-26 04:46:43 +10:00
Albert Zaharovits 98a6354ad4
Fix ThreadPoolMergeExecutorServiceDiskSpaceTests testAbortingOrRunningMergeTaskHoldsUpBudget (#129979)
When there are multiple FS with the same available disk space but different total sizes, it's unpredictable (and irrelevant) which one the checker uses.

Fixes #129823
2025-06-25 12:06:10 +03:00
Panagiotis Bailis f095b3c592
Fix for DenseVectorFieldMapperTests to properly initialize random vector given the dimensions in mappings (#129912) 2025-06-25 19:05:11 +10:00
Ievgen Degtiarenko 56d5009924
Add query plans to profile output (#128828) 2025-06-25 10:50:04 +02:00
Tim Vernum 8b62a55f2f
Watch SSL files instead of directories (#129738)
With the introduction of entitlements (#120243) and exclusive file
access (#123087) it is no longer safe to watch a whole directory.

In a lot of deployments, the parent directory for SSL config files
will be the main config directory, which also contains exclusive files
such as SAML realm metadata or File realm users. Watching that
directory will cause entitlement warnings because it is not
permissible for core/ssl-config to read files that are exclusively
owned by the security module (or other modules)
2025-06-25 18:24:57 +10:00
Yang Wang e1c930f8c1
Make RepositoriesService project-aware (#129821)
This PR makes RepositoriesService project aware so that the basic Put,
Get, Delete and Verify repository actions are now project scoped. 

It intentionally leaves the following aspects out of scope for the
current changes:
* Repository stats reporting
* Repository clean-up, analysis and integrity verification
* Repository usages for searchable snapshots and CCR

They will be worked on separately. One main reason for leaving them out
is that they are not needed by OBS which is currently blocked by
repository/snapshot changes. They may also have their own complexities,
e.g. stats reporting.

Resolves: ES-10478
2025-06-25 10:34:34 +10:00
David Kyle 3a1551e0ef
[ML] Move to the Cohere V2 API for new inference endpoints (#129884) 2025-06-25 07:51:05 +10:00
Brendan Cully 73b0a60a77
Revert "Dispatch ingest work to coordination thread pool (#129820)" (#129949)
This reverts commit 53dae7a3a2.
2025-06-24 14:38:50 -07:00
HYUNSANG HAN (한현상, Travis) d16271b78d
Add RemoveBlock API to allow `DELETE /{index}/_block/{block}` (#129128)
Introduces a new `RemoveBlock` API that complements the existing `AddBlock` API by allowing users to remove index blocks using `DELETE /{index}/_block/{block}`.

Resolves #128966

---------

Co-authored-by: Niels Bauman <nielsbauman@gmail.com>
2025-06-25 06:16:14 +10:00
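
As a rough illustration of the route added by the RemoveBlock commit above, the following Java sketch builds (without sending) a `DELETE /{index}/_block/{block}` request using the JDK HTTP client. The host `http://localhost:9200`, the index name `my-index`, and the block name `write` are assumptions chosen for the example; they are not taken from the commit.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RemoveBlockRequestSketch {

    // Builds, but does not send, a DELETE request for /{index}/_block/{block}.
    // baseUrl, index and block are caller-supplied; the values used in main
    // are illustrative assumptions only.
    static HttpRequest removeBlock(String baseUrl, String index, String block) {
        URI uri = URI.create(baseUrl + "/" + index + "/_block/" + block);
        return HttpRequest.newBuilder(uri).DELETE().build();
    }

    public static void main(String[] args) {
        HttpRequest request = removeBlock("http://localhost:9200", "my-index", "write");
        // prints "DELETE /my-index/_block/write"
        System.out.println(request.method() + " " + request.uri().getPath());
    }
}
```

Sending the request would additionally need an `HttpClient` and whatever authentication the cluster requires, which this sketch leaves out.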
Tim Grein 3b51dd568c
[EIS] Dense Text Embedding task type integration (#129847) 2025-06-24 21:38:16 +02:00
elasticsearchmachine ba50e26252 Bump versions after 8.18.3 release 2025-06-24 18:30:29 +00:00
elasticsearchmachine 7c13a1553e Bump versions after 9.0.3 release 2025-06-24 18:12:45 +00:00
elasticsearchmachine 8f1d593119 Bump versions after 8.17.8 release 2025-06-24 17:58:58 +00:00
David Turner ba103f1c24
Reverse disordered-version warning message (#129904)
The comment in `TransportHandshaker` indicates (correctly) that we emit
a warning when talking to a chronologically-newer-yet-numerically-older
version, but the wording of the warning message is inverted and says
that the remote is chronologically-older-yet-numerically-newer. This
commit straightens out the message to match the situation it is
describing.

Relates #123397
2025-06-24 18:30:11 +01:00
Alexey Ivanov 876c456ac1
Move per-project settings out of ProjectMetadata (#129068)
To better support project restoration after deletion, this change moves project Settings from ProjectMetadata to the new custom in the ClusterState. It also introduces a new transport version for cluster state serialization. Reserved cluster state for project settings remains within ProjectMetadata.

Note: In mixed-version multiproject clusters, this may cause existing settings for projects to temporarily disappear until all nodes have been upgraded and restarted.
2025-06-24 18:06:45 +01:00
Panagiotis Bailis 07f65e978a
Fixing race condition in DynamicMappingIT when checking for updates in mappings (#129931) 2025-06-25 02:31:09 +10:00
Keith Massey 0b58a53a98
Adding the ability to unset data stream settings (#129677) 2025-06-24 10:30:15 -05:00
Panagiotis Bailis b855266bd1
Make bbq_hnsw the default index option for dense-vector fields with more than 384 dimensions (#129825) 2025-06-24 12:20:16 +03:00
Niels Bauman 5ccb772468
Remove unused `BulkProcessor` (#129875)
The `BulkProcessor` and `BulkRequestHandler` classes were unused and
could thus be removed along with their test classes.
2025-06-24 10:02:02 +10:00
Brian Rothermich 0f39ff586c
Bring over merge metrics from stateless (#128617)
Relates to an effort to combine the merge schedulers from stateless and stateful. The stateless merge scheduler has MergeMetrics that we want in both stateless and stateful. This PR copies over the merge metrics from the stateless merge scheduler into the combined merge scheduler.

Relates ES-9687
2025-06-23 19:42:01 -04:00
Mark J. Hoy a671505c8a
Update sparse_vector field mapping to include default setting for token pruning (#129089)
* Initial checkin of refactored index_options code

* [CI] Auto commit changes from spotless

* initial unit testing

* complete unit tests; add yaml tests

* [CI] Auto commit changes from spotless

* register test feature for sparse vector

* Update docs/changelog/129089.yaml

* update changelog

* add docs

* explicit set default index_options if null

* [CI] Auto commit changes from spotless

* update yaml tests; update docs

* fix yaml tests

* readd auth for teardown

* only serialize index options if not default

* [CI] Auto commit changes from spotless

* serialization refactor; pass index version around

* [CI] Auto commit changes from spotless

* fix transport versions merge

* fix up docs

* [CI] Auto commit changes from spotless

* fix docs; add include_defaults unit and yaml test

* [CI] Auto commit changes from spotless

* override getIndexReaderManager for SemanticQueryBuilderTests

* [CI] Auto commit changes from spotless

* cleanup mapper/builder/tests; index vers. in type

still need to refactor / clean YAML tests

* [CI] Auto commit changes from spotless

* cleanups to mapper tests for clarity

* [CI] Auto commit changes from spotless

* move feature into mappers; fix yaml tests

* cleanups; add comments; remove redundant test

* [CI] Auto commit changes from spotless

* escape more periods in the YAML tests

* cleanup mapper and type tests

* [CI] Auto commit changes from spotless

* rename mapping for previous index test

* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
2025-06-24 08:21:32 +10:00
Pat Whelan aeb37189af
[ML] SageMaker Elastic Payload (#129413)
Send the Elastic API Payload to a SageMaker endpoint, and parse the
response as if it were an Elastic API response.

- SageMaker now supports all task types in the Elastic API format.
- Streaming is supported using the SageMaker client/server rpc,
  rather than SSE. Payloads must be in a complete and valid JSON
  structure.
- Task Settings can be used for additional passthrough settings, but
  they will not be saved alongside the model. Elastic cannot make
  guarantees on the structure or contents of this payload, so Elastic
  will treat it like the other input payloads and only allow them during
  inference.
2025-06-24 06:43:24 +10:00
Julian Kiryakov caae426cf7
Pushdown for LIKE (LIST) (#129557)
Improved performance of LIKE (LIST) by pushing an Automaton down to Lucene to do the evaluation.
2025-06-23 14:35:09 -04:00
Ignacio Vera ffea6ca2bf
Introduce an int4 off-heap vector scorer (#129824)
* Introduce an int4 off-heap vector scorer

* iter

* Update server/src/main/java/org/elasticsearch/index/codec/vectors/DefaultIVFVectorsReader.java

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>

---------

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2025-06-23 18:44:12 +02:00
Tim Brooks 53dae7a3a2
Dispatch ingest work to coordination thread pool (#129820)
The vast majority of ingest pipelines are light CPU
operations. We don't want these to be put behind IO work on the write
executor. Instead, execute these on the coordination pool.
2025-06-23 09:31:36 -06:00
Panagiotis Bailis 7d4bbcc4bb
Fix for RescoreKnnVectorQueryIT to ensure that BBQ_IVF format is enabled (#129830) 2025-06-23 17:57:31 +03:00
Keith Massey 2f3b2b39c5
Using the STREAMS_LOGS_SUPPORT_8_19 transport version (#129796)
* Using the STREAMS_LOGS_SUPPORT_8_19 transport version

* Update StreamsMetadata.java

Returning null from getMinimalSupportedVersion

* Return minimal supported version as 8.19 for metadata object to fix test fail

---------

Co-authored-by: Luke Whiting <luke.whiting@elastic.co>
2025-06-24 00:20:20 +10:00
Jan Kuipers a3dac7434b
TransportVersion for backporting ES|QL sample (#129831) 2025-06-23 15:28:14 +02:00
Ignacio Vera 72b488cfa9
[IVF] Improve the format of the tmp file written during merging (#129828)
This commit separates vectors and docIds in the tmp file.
2025-06-23 14:44:00 +02:00
Chris Hegarty f1ea88e1e8
Port IndexVersions.UPGRADE_TO_LUCENE_9_12_2 to main (#129832)
This commit ports the IndexVersions.UPGRADE_TO_LUCENE_9_12_2 constant to the main branch.

This is required after the update of Lucene 9.12.2 in the 8.19 branch, see #129555.
2025-06-23 09:32:37 +01:00
Sam Xiao e3838a4b9c
Make GeoIp downloader multi-project aware (#128282)
This change makes the GeoIp persistent task executor/downloader multi-project aware. 
- the database downloader persistent task will be at the project level, meaning there will be a downloader instance per project
- the persistent task id is prefixed with the project id, namely `<project-id>/geoip-downloader`, for clusters in MP mode
2025-06-23 15:07:40 +08:00
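
The task id convention described in the GeoIp commit above can be sketched as a small helper. This is a hypothetical illustration, not the code from the PR: the class and method names are invented, and the unprefixed `geoip-downloader` id used outside multi-project mode is an assumption inferred from the prefixed form.

```java
public class GeoIpTaskIdSketch {

    // Hypothetical base name, inferred from the "<project-id>/geoip-downloader" form.
    static final String BASE_TASK_ID = "geoip-downloader";

    // Returns "<project-id>/geoip-downloader" in multi-project mode,
    // otherwise the plain base name (an assumption for this sketch).
    static String taskId(String projectId, boolean multiProject) {
        return multiProject ? projectId + "/" + BASE_TASK_ID : BASE_TASK_ID;
    }

    public static void main(String[] args) {
        System.out.println(taskId("project-a", true));
        System.out.println(taskId("project-a", false));
    }
}
```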
Martijn van Groningen 41f69810df
Force niofs for fdt tmp file read access when flushing stored fields (#129538)
Because of how stored fields are flushed when index sorting is active, we can encounter significant page cache faults when memory is scarce. To mitigate some of the resulting slowness, we plan to stop mmapping the fdt temp file. This is initially behind a feature flag, to check for unforeseen side effects.

Always using the mmap directory is typically better than using the niofs directory, provided there is sufficient memory available to the OS for filesystem caching. When that isn't the case, indexing performance can vary a lot (and is often very slow). This is especially true for the tmp files that stored fields create during flushing. These files exist only briefly, to sort stored fields in the order of the configured index sorting, and are then removed. If these tmp files are mmapped, there is a risk of trashing the file system cache.

This change only avoids using mmap for the fdt tmp file. This is the file that actually contains the data, and it can be large compared to the other files that get flushed. The fdm (metadata) and fdi (stored field index) files remain mmapped.
2025-06-23 07:46:00 +02:00
Ignacio Vera 5bec44ad58
Reduce data amplification in IVFVectorsWriter (#129698)
With this change, we first create the tmp file and the posting list, and once that file is deleted we
merge the vectors into the vec file. Therefore we only have two copies of the vectors at the same time.
2025-06-23 07:13:22 +02:00
Chris Hegarty 1255a64832
Upgrade to Lucene 10.2.2 (#129546)
This commit upgrades to Lucene 10.2.2.

With the release of 10.2.2, we no longer need to work around the Lucene bug mentioned in 128671.
2025-06-22 13:37:22 +01:00
Parker Timmins 245dc0775a
Make flattened synthetic source concatenate object keys on scalar/object mismatch (#129600)
There is an issue with flattened fields with synthetic source: if there is a key with a scalar value and a duplicate key with an object value, one of the values is left out of the produced synthetic source. This fixes the issue by replacing the object with paths to each of its keys. Each path is the concatenation of all keys leading down to a given scalar, joined by periods, for example foo.bar.baz. This applies recursively, so that every value within the object, no matter how deeply nested, is accessible through a fully specified path.
2025-06-20 14:20:49 -05:00
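
The path concatenation described in the flattened synthetic source commit above can be sketched as a recursive walk over nested maps. This is a minimal, self-contained illustration of the idea, not the Elasticsearch implementation; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenedPathsSketch {

    // Replaces a nested object with one entry per scalar, keyed by the
    // period-joined path of keys leading to that scalar (e.g. foo.bar.baz).
    static Map<String, Object> toPaths(Map<String, Object> object) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect("", object, out);
        return out;
    }

    private static void collect(String prefix, Map<String, Object> object, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : object.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Recurse so that arbitrarily nested values get a full path.
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                collect(path, nested, out);
            } else {
                out.put(path, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> bar = new LinkedHashMap<>();
        bar.put("baz", "x");
        Map<String, Object> foo = new LinkedHashMap<>();
        foo.put("bar", bar);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("foo", foo);
        System.out.println(toPaths(root));
    }
}
```

Because the object is replaced by entries under longer keys such as `foo.bar.baz`, those entries no longer collide with a scalar stored under the shorter key `foo`.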
Jonathan Buttner d9b34d43a5
[ML] Custom service add support for input_type, top_n, and return_documents (#129441)
* Making progress on different request parameters

* Working tests

* Adding custom service validator for rerank

* Fixing embedding bug

* Adding transport version check

* Fixing tests

* Fixing license header

* Fixing writeTo

* Moving file and removing commented code

* Fixing test

* Fixing tests

* Refactoring and tests

* Fixing test
2025-06-20 12:23:48 -04:00
Ignacio Vera 4ca96c199f
Introduce a vectorize soarDistance function (#129744)
This commit replaces the method #soarResidual with a method #soarDistance, which performs better for computing soar distances.
2025-06-20 16:23:50 +02:00
Iraklis Psaroudakis 2940f9a011
Accommodate hollow engine changes (#129535)
* Field infos calculation method inside Engine
* buildSeqNoStats as static public method

So it can be overridden in stateless if/as needed.

Relates ES-11457
2025-06-20 16:01:55 +03:00
Luke Whiting 18c1e55eb3
Reserve transport version for streams endpoint backport (#129753) 2025-06-20 20:57:35 +10:00
Moritz Mack 9b2ca99ca4
Tolerate incompatible versions with different build hash (#128589)
Tolerate incompatible versions with different build hash.
I'm keeping the serverless feature flag to not create warnings there.

Relates to ES-11869
2025-06-20 12:16:32 +02:00
Dimitris Rempapis bd4f5fee8d
ShardSearchStatsTests - add missing metrics and methods (#129311)
ShardSearchStatsTests - add missing metrics and methods - complete coverage
2025-06-20 11:32:52 +03:00
Nick Tindall f715f63137
Fix and unmute GetSnapshotsIT (#129741)
Closes: 129740
2025-06-20 18:29:47 +10:00
Ankita Kumar 9e19b85783
Metrics to account for time spent waiting for next chunk (#129469)
This PR addresses ES-12071.

We want to collect metrics for the time that is spent waiting for the next chunk of a bulk request. This can help with diagnosing high bulk latency in case the latency is attributable to external factors such as network connection.

Co-authored-by: Francisco Fernández Castaño <francisco.fernandez.castano@gmail.com>
2025-06-20 08:21:18 +02:00
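
One way to picture the metric described in the commit above is a timer that accumulates the intervals between "started waiting for the next chunk" and "chunk received". The sketch below is an assumption-laden illustration: the class, its method names, and the injectable clock are all hypothetical and do not reflect how the PR records its metrics.

```java
import java.util.function.LongSupplier;

public class ChunkWaitTimerSketch {

    private final LongSupplier nanoClock; // injectable so the timer can be tested
    private boolean waiting;
    private long waitStartedNanos;
    private long totalWaitNanos;

    ChunkWaitTimerSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Called when the current chunk is done and the next one has not arrived yet.
    void onWaitingForNextChunk() {
        waiting = true;
        waitStartedNanos = nanoClock.getAsLong();
    }

    // Called when the next chunk arrives; adds the elapsed wait to the total.
    void onChunkReceived() {
        if (waiting) {
            totalWaitNanos += nanoClock.getAsLong() - waitStartedNanos;
            waiting = false;
        }
    }

    long totalWaitNanos() {
        return totalWaitNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        ChunkWaitTimerSketch timer = new ChunkWaitTimerSketch(System::nanoTime);
        timer.onWaitingForNextChunk();
        Thread.sleep(10); // stand-in for network delay
        timer.onChunkReceived();
        System.out.println("waited nanos: " + timer.totalWaitNanos());
    }
}
```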
Yang Wang 0e83b19de7
Remove obsolete Metadata BWC for repositories (#129685)
When migrating RepositoriesMetadata from cluster custom to project
custom (#125398), we needed temporary BWC handling for clusters running
on a version that is before this change but after the initial MP change.
Such a cluster can only exist in the serverless environment which has
progressed way past any applicable versions. Therefore we no longer need
the BWC handling and this PR removes it.

Relates: #125398
2025-06-20 11:34:48 +10:00
Brendan Cully 4ce06d1aa2
Add deleteByQuery to InternalEngine (#129679) 2025-06-19 15:40:38 -07:00
Mikhail Berezovskiy eeca493860
Move HTTP content aggregation from Netty into RestController (#129302) 2025-06-19 09:05:17 -07:00
Albert Zaharovits 083326e658
Threadpool merge executor does not block aborted merges (#129613)
This PR addresses a bug where aborted merges are blocked if there's
insufficient disk space.

Previously, the merge disk space estimation did not consider if the
operation has been aborted when/while it was enqueued for execution.
Consequently, aborted merges, e.g. when closing a shard, were
blocked if their disk space estimation was exceeding the available disk
space threshold. In this case, the shard close operation would itself
block.

This fix estimates a disk space budget of `0` for aborted merges, and it
periodically checks if any enqueued merge tasks have been aborted (more
generally, it checks if the budget estimate for any merge tasks has
changed, and reorders the queue if so). This way aborted merges are
prioritized and are never blocked.

Closes https://github.com/elastic/elasticsearch/issues/129335
2025-06-20 00:51:13 +10:00
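
The behaviour described in the commit above (a budget of `0` for aborted merges, plus a periodic re-check that reorders the queue) can be sketched with a plain priority queue. This is a simplified, single-threaded illustration under stated assumptions: the class, field, and method names are hypothetical, and the real executor is concurrent and considerably more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeBudgetQueueSketch {

    static final class MergeTask {
        final String name;
        final long estimatedBytes;
        boolean aborted;

        MergeTask(String name, long estimatedBytes) {
            this.name = name;
            this.estimatedBytes = estimatedBytes;
        }

        // Aborted merges need no disk space, so their budget estimate is 0.
        long budget() {
            return aborted ? 0L : estimatedBytes;
        }
    }

    // Smallest budget first, so tasks that fit (and aborted ones) come out first.
    private final PriorityQueue<MergeTask> queue =
        new PriorityQueue<>(Comparator.comparingLong(MergeTask::budget));

    void enqueue(MergeTask task) {
        queue.add(task);
    }

    // Budgets can change after a task was enqueued (e.g. it was aborted), which
    // invalidates the heap order; re-inserting every task restores it.
    void refreshBudgets() {
        List<MergeTask> snapshot = new ArrayList<>(queue);
        queue.clear();
        queue.addAll(snapshot);
    }

    // Returns the head task if its budget fits in the available space, else null.
    MergeTask pollIfFits(long availableBytes) {
        MergeTask head = queue.peek();
        if (head == null || head.budget() > availableBytes) {
            return null;
        }
        return queue.poll();
    }

    public static void main(String[] args) {
        MergeBudgetQueueSketch sketch = new MergeBudgetQueueSketch();
        MergeTask big = new MergeTask("big", 1_000L);
        sketch.enqueue(big);
        System.out.println(sketch.pollIfFits(50L)); // nothing fits yet
        big.aborted = true;
        sketch.refreshBudgets();
        System.out.println(sketch.pollIfFits(50L).name); // the aborted task is released
    }
}
```

Rebuilding the whole queue on each re-check is the simplest way to restore ordering after budgets change; a production implementation would likely update only the affected entries.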
Ignacio Vera 22eb035a27
Clone IndexInput when creating MemorySegmentPostingsVisitor (#129690) 2025-06-19 13:13:00 +02:00
Luke Whiting 1ccf1c6806
Streams - Log's Enable, Disable and Status endpoints (#129474)
* Enable And Disable Endpoint

* Status Endpoint

* Integration Tests

* REST Spec

* REST Spec tests

* Some documentation

* Update docs/changelog/129474.yaml

* Fix failing security test

* PR Fixes

* PR Fixes - Add missing feature flag name to YAML spec

* PR Fixes - Fix support for timeout and master_timeout parameters

* PR Fixes - Make the REST handler validation happy with the new params

* Delete docs/changelog/129474.yaml

* PR Fixes - Switch to local metadata action type and improve request handling

* PR Fixes - Make enable / disable endpoint cancellable

* PR Fixes - Switch timeout param name for status endpoint

* PR Fixes - Switch timeout param name for status endpoint in spec

* PR Fixes - Enforce local only use for status action

* PR Fixes - Refactor StreamsMetadata into server

* PR Fixes - Add streams module to multi project YAML test suite

* PR Fixes - Add streams cluster module to multi project YAML test suite
2025-06-19 11:48:44 +01:00
Yang Wang 0932beb1f8
Remove obsolete Metadata.FORMAT field and usages (#129519)
The only production usage is for cleaning up all global state files. It
is replaced by directly calling the relevant method without creating the
FORMAT instance. Test-only usages are either replaced by equivalent
method calls or dropped.

Relates: #114698
2025-06-19 15:38:33 +10:00