The libs projects are configured to all begin with `elasticsearch-`.
While this is desireable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.
This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
With Lucene 10, IndexWriter requires a parent document field in order to
use index sorting with document blocks. This lead to different IAE and file
leaks in this test which are fixed by adapting the corresponding location in
the test setup.
This PR introduces a fix for the `fields` and `stored_fields` APIs and the way `_ignored_source` field is handled:
1. **Return `_ignored_source` as a top-level array metadata field**:
- The `_ignored_source` field is now returned as a top-level array in the metadata as done with other metadata fields.
2. **Return `_ignored_source` as an array of values**:
- Even when there is only a single ignored field, `_ignored_source` will now be returned as an array of values. This is done to be consistent with how the `_ignored` field is returned.
Without this fix, we would return the `_ignored_source` field twice, as a top-level field and as part of the `fields` array. Also, without this fix, we would only return a single value instead of all ignored field values.
This change introduces a new index mode, lookup, for indices intended
for lookup operations in ES|QL. Lookup indices must have a single shard
and be replicated to all data nodes by default. Aside from these
requirements, they function as standard indices. Documentation will be
added later when the lookup operator in ES|QL is implemented.
Since metadata storage was moved to Lucene in #50907 (7.16.0), we shouldn't encounter any on-disk global metadata files, so we can remove support for loading them.
The `_tier` metadata field was not used on the coordinator when
rewriting queries in order to exclude shards that don't match. This lead
to queries in the following form to continue to report failures even
though the only unavailable shards were in the tier that was excluded
from search (frozen tier in this example):
```
POST testing/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"_tier": "data_frozen"
}
}
]
}
}
}
```
This PR addresses this by having the queries that can execute on `_tier`
(term, match, query string, simple query string, prefix, wildcard)
execute a coordinator rewrite to exclude the indices that don't match
the `_tier` query **before** attempting to reach to the shards (shards,
that might not be available and raise errors).
Fixes#114910
Long GC disruption relies on Thread.resume, which is removed in JDK 23.
Tests that use it predate more modern disruption tests. This commit
removes gc disruption and the master disruption tests. Note that tests
relying on this scheme have already not been running since JDK 20 first
deprecated Thread.resume.
If source is required by a block loader then the StoredFieldsSpec that gets populated should be enhanced by SourceLoader#requiredStoredFields(...) in ValuesSourceReaderOperator. Otherwise in case of synthetic source many stored fields aren't loaded, which causes only a subset of _source to be synthesized. For example when unmapped fields exist or field values that exceed configured ignore above will not appear is _source.
This happens when field types fallback to a block loader implementation that uses _source. The required field values are then extracted from the source once loaded.
This change also reverts the production code changes introduced via #114903. That change only ensured that _ignored_source field was added to the required list of stored fields. In reality more fields could be required. This change is better fix, since it handles also other cases and the SourceLoader implementation indicates which stored fields are needed.
Closes#115076
All request handlers should be able to read `BytesTransportRequest` to
a class than can copied by re-serializing. Direct copying was only
necessary by the legacy `JOIN_VALIDATE_ACTION_NAME` request handler.
See #89926
The Inference usage API calls GET _inference/_all and because the default
configs are persisted on read it causes the creation of the .inference index.
This action is undesirable and causes test failures by leaking the system index
out of the test clean up code.
The most relevant ES changes that upgrading to Lucene 10 requires are:
- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automaton are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became a record
- adapt to breaking changes in the analysis area
Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields.
This change includes _ignored_source field as a required stored field and allowing keyword fields without doc values or stored fields to be used in case of synthetic source.
Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored field/values in a special place, named ignored source. Long term in case of synthetic source we should only load ignored source in case a field has no doc values or stored field. Like is being explored in #114886 Thereby avoiding synthesizing the complete _source in order to get only one field.
Until we can figure out where in the tests the index is being created,
temporarily ignore deleting it along with the other system indices.
Relates #114748
Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.
As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings
Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.
* Add data stream template validation
to snapshot restore
* Add data stream template validation
to data stream promotion endpoint
* Add new assertion for response headers
Add a new assertion to synchronously execute a request and check the
response contains a specific warning header
* Test for warning header on snapshot restore
When missing templates
* Test for promotion warnings
* Add documentation for the potential error states
* PR changes
* Spotless reformatting
* Add logic to look in snapshot global metadata
This checks if the snapshot contains a matching template for the DS
* Comment on test cleanup to explain it was copied
* Removed cluster service field
The ST_DISTANCE function added in #108764 was optimized for lucene pushdown in a series of followup PRs, but this did not include sorting by distance. Now this is resolved, for two key scenarios, both known to be valued by users:
* Sorting by distance:
`FROM index | EVAL distance=ST_DISTANCE(field, literal) | SORT distance`
* Sorting and filtering by distance:
`FROM index | EVAL distance=ST_DISTANCE(field, literal) | WHERE distance < literal | SORT distance`
The key changes required to make this work:
* Add to the EsQueryExec the appropriate sort->_geo_distance sort type
* Enhance PushTopNToSource to understand how to pushdown the sort even when there is an EVAL in between the FROM and the SORT (between the TopNExec and the EsQueryExec in the physical plan).
* Enhance PushFiltersToSource to understand how to pushdown the filter even when there is an EVAL in between the FROM and the WHERE (between the Filter and the EsQueryExec in the physical plan).
A useful bonus feature of this additional EVAL intelligence is that other, non-spatial cases are now also pushed down. In particular EVALs that are simple aliases are considered and pushed down, for both filtering and sorting.
Local benchmark results, very approximate, but show massive improvements for distanceSort and distanceFilterSort, which relate to the two cases listed above.
Benchmark Query DSL ESQL before this PR ESQL after this PR Comments
distanceFilter 10 5 5 Optimized in #109972
distanceEvalFilter 10 10000 1500 Still slow due to unnecessary EVAL
distanceSort 150 12000 160
distanceFilterSort 20 10000 24
NOTE: This enables pushing down sorting by any ReferenceAttribute that either refers to a sortable FieldAttribute, or to an StDistance function that itself refers to a suitable FieldAttribute of geo_point type.
---------
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
`ThreadContext#stashContext` doesn't guarantee to give a clean thread
context, but it's important we don't allow the callers' thread contexts
to leak into the cluster state update. This commit captures the desired
thread context at startup rather than using `stashContext` when forking
the processor.
Delay construction of `Warnings` until they are needed to save memory
when evaluating many many many expressions. Most expressions won't use
warnings at all and there isn't any need to make registering warnings
super duper fast. So let's make the construction lazy to save a little
memory. It's like 200 bytes per expression which isn't much, but it's
possible to have thousands of expressions in a single query. Abusive,
but possible.
This also consolidates all `Warnings` usages to a single `Warnings`
class. We had two. We don't need two.
Today when we reboot a node in a test case derived from
`AbstractCoordinatorTestCase` we lose the contents of
`blackholedRegisterOperations`, but it's important that these operations
_eventually_ run. With this commit we copy these operations over into
the new node.
Track shard snapshot progress during shutdown to identify any
bottlenecks that cause slowness that can ultimately block shard
re-allocation.
Relates ES-9086
randomInstantBetween can produce a result which is not within the [minInstant, maxInstant] range. This occurs when the epoch second picked matches the min bound and the nanos are below the min nanos, or the second picked matches the max bound seconds and nanos are above the max bound nanos. This change fixes the function by setting a bound on which nano values can be picked if the min or max epoch second value is picked.