502 Commits

Author SHA1 Message Date
Armin Braun cd609533bf
Fix duplicate strings in SearchHit serialization (#127180)
The map key is always the field name. We exploited this fact in the get results but not in
search hits, leading to a lot of duplicate strings in many heap dumps.
We could do much better here since the names generally come out of a known, limited set of names,
but as a first step let's at least align the get and search responses and save a non-trivial amount of bytes
in a number of use cases. Plus, having a single string instance is faster on lookup etc. and also saves
CPU.
2025-04-22 22:43:27 +02:00
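The deduplication described in cd609533bf can be sketched roughly as follows. This is a minimal illustration, not the actual `SearchHit` code: the `Field` record and the `shareKeyWithName` helper are hypothetical names, and the only idea taken from the commit message is that the map key always equals the field name, so one `String` instance can serve as both.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldNameDedupSketch {
    // Hypothetical stand-in for a document field (not the Elasticsearch class).
    record Field(String name, List<Object> values) {}

    // Rebuilds the map so that every Field reuses the key's String instance
    // as its name instead of holding a second, equal String.
    static Map<String, Field> shareKeyWithName(Map<String, Field> fields) {
        Map<String, Field> result = new LinkedHashMap<>();
        for (Map.Entry<String, Field> entry : fields.entrySet()) {
            String key = entry.getKey();
            result.put(key, new Field(key, entry.getValue().values()));
        }
        return result;
    }
}
```

In a real deserializer the same effect would presumably be achieved while reading, by passing the already-read key into the field constructor rather than reading the name a second time.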
Rene Groeschke ba61f8c7f7
Update Gradle wrapper to 8.12 (#118683)
This updates the gradle wrapper to 8.12

We addressed deprecation warnings due to the update that includes:

- Fix change in TestOutputEvent api
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
2024-12-30 15:34:24 +01:00
Luca Cavanna 8793248d6d
Remove unused code from PercolateQueryBuilder (#118791) 2024-12-17 11:22:20 +01:00
Carlos Delgado 59967727cf
kNN vector rescoring for quantized vectors (#116663) 2024-12-11 09:14:18 +01:00
Oleksandr Kolomiiets 2b8e4e727c
Migrate mapper-related modules to internal-*-rest-test (#117298) 2024-11-23 00:35:24 +00:00
Kostas Krikellas 4573ab8ec1
[TEST] Replace _source.mode with index.mapping.source.mode in integration tests - take 2 (#116072)
* Reapply "[TEST] Replace _source.mode with index.mapping.source.mode in integra…" (#116069)

This reverts commit e8bf344a28.

* [TEST] Replace _source.mode with index.mapping.source.mode in integration tests

* add reason

* add reason

* spotless

* revert unneeded
2024-11-04 09:39:34 +02:00
Kostas Krikellas e8bf344a28
Revert "[TEST] Replace _source.mode with index.mapping.source.mode in integra…" (#116069)
This reverts commit a360757968.
2024-11-01 10:53:08 +02:00
Kostas Krikellas a360757968
[TEST] Replace _source.mode with index.mapping.source.mode in integration tests (#115926)
* Replace _source.mode with index.mapping.source.mode in integration tests

* fix tests

* revert 40_source_mode_setting.yml
2024-11-01 09:46:06 +02:00
Luca Cavanna 8efd08b019
Upgrade to Lucene 10 (#114741)
The most relevant ES changes that upgrading to Lucene 10 requires are:

- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automata are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became a record
- adapt to breaking changes in the analysis area

Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
2024-10-21 13:38:23 +02:00
David Turner 4434f841e2
Handle remaining refs to `RestApiVersion#V_7` (#114881)
Removes several more references to the now-unused `RestApiVersion#V_7`
constant, and decorates all remaining references with an `@UpdateForV9`
annotation so that they all have clear owners.
2024-10-18 22:11:13 +11:00
David Turner ef2260130d
Remove dead branches for v7 REST API (#114850)
In v9 the `getRestApiVersion()` method on `RestRequest`,
`XContentBuilder` and `XContentParser` can never return `V_7`, so we can
replace all the expressions of the form `$x$.getRestApiVersion() == V_7`
with `false`. This commit does that, and then refactors away the
resulting dead code using (largely) automated transformations.
2024-10-16 06:46:53 +01:00
Jake Landis 888188695a
Bump compatible rest api version to 9/8 (#113151)
This commit bumps the REST API version from 8 to 9. This effectively removes all support for REST API 
compatibility with version 7 (though earlier commits already chipped away at some v7 support).

This also enables REST API compatibility support for version 8, providing support for v8 compatibility headers, 
i.e. "application/vnd.elasticsearch+json;compatible-with=8" and no-op support (no errors) to accept v9 
compatibility headers i.e. "application/vnd.elasticsearch+json;compatible-with=9".

see additional context in the GH PR #113151
2024-09-26 14:52:05 -05:00
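As a rough illustration of the header handling described in 888188695a, the sketch below extracts the `compatible-with` parameter from a media type and accepts only the current and previous major versions. The class, the regular expression, and the `isAccepted` rule are assumptions made for this example; they are not the Elasticsearch parser.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CompatibleWithSketch {
    // Assumed shape of the parameter, e.g. ";compatible-with=8".
    private static final Pattern COMPATIBLE_WITH = Pattern.compile(";\\s*compatible-with=(\\d+)");

    // Returns the requested major version, or empty if the header has no such parameter.
    static OptionalInt requestedMajor(String mediaType) {
        Matcher matcher = COMPATIBLE_WITH.matcher(mediaType);
        return matcher.find() ? OptionalInt.of(Integer.parseInt(matcher.group(1))) : OptionalInt.empty();
    }

    // With currentMajor = 9: 9 is accepted as a no-op, 8 enables compatibility mode,
    // and anything older (such as 7) is rejected.
    static boolean isAccepted(String mediaType, int currentMajor) {
        OptionalInt requested = requestedMajor(mediaType);
        if (requested.isEmpty()) {
            return true; // plain media type, no compatibility requested
        }
        int major = requested.getAsInt();
        return major == currentMajor || major == currentMajor - 1;
    }
}
```

For example, `isAccepted("application/vnd.elasticsearch+json;compatible-with=8", 9)` is true, while the same header with `compatible-with=7` is rejected.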
Mark Vieira a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Mark Vieira 4ce661cc48
Bump Elasticsearch version to 9.0.0 (#112570) 2024-09-11 09:40:11 -07:00
Kostas Krikellas f3bc281978
Refactor build params for FieldMapper, adding SourceKeepMode (#112455)
* Refactor build params for FieldMapper

* more mappers and tests

* more mappers

* more mappers

* spotless

* spotless

* stored by default

* Revert "stored by default"

This reverts commit bbd247d64b.

* restore storeIgnored

* sync

* list valid values for SourceKeepMode

* small refactoring

* spotless
2024-09-06 14:16:17 +03:00
Nhat Nguyen 1964be565c
Allow querying index_mode (#110676)
This change allows querying the `index.mode` setting via a new 
`_index_mode` metadata field, enabling APIs such as `field_caps` or
`resolve_indices` to target indices that are either time_series or logs
only. This approach avoids adding and handling a new parameter for
`index_mode` in these APIs. Both ES|QL and the `_search` API should also
work with this new field.
2024-07-10 16:45:11 -07:00
Mayya Sharipova 405e39660b
Support k parameter for knn query (#110233)
Introduce an optional k param for knn query

If k is not set, knn query has the previous behaviour:
- `num_candidates` docs are collected from each shard. These `num_candidates` docs
are used for combining results with other queries and aggregations on each shard.
- docs from all shards are merged to produce the top global `size` results

If k is set, the behaviour is instead as follows:
- `k` docs are collected from each shard. These `k` docs are used for
combining results with other queries and aggregations on each shard.
- similarly, docs from all shards are merged to produce the top global `size`
results.

Having a `k` param makes it more intuitive for users to address their needs.
They also don't need to care about the `num_candidates` param for this query and can skip it,
as it is more of an internal detail for tuning how knn search operates.

Closes #108473
2024-06-28 09:59:28 -04:00
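The two behaviours described in 405e39660b can be sketched as a toy merge over per-shard scores. This is an assumption-laden simplification: real kNN search is approximate and graph-based, and the class and method names here are hypothetical. It only shows how the per-shard document count switches from `num_candidates` to `k` before the global top `size` is taken.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class KnnKSketch {
    // Number of docs each shard contributes: k when set, otherwise num_candidates.
    static int docsPerShard(Integer k, int numCandidates) {
        return k != null ? k : numCandidates;
    }

    // Takes the best docsPerShard scores from every shard, then the global top `size`.
    static List<Float> topGlobal(List<List<Float>> shardScores, Integer k, int numCandidates, int size) {
        int perShard = docsPerShard(k, numCandidates);
        List<Float> merged = new ArrayList<>();
        for (List<Float> shard : shardScores) {
            merged.addAll(shard.stream().sorted(Comparator.reverseOrder()).limit(perShard).toList());
        }
        return merged.stream().sorted(Comparator.reverseOrder()).limit(size).toList();
    }
}
```

Note that in this sketch a small `k` also caps how many docs can reach the global merge: with two shards and `k = 1`, at most two results are available regardless of `size`.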
Luca Cavanna 915e4a50c5
Rename Mapper#name to Mapper#fullPath (#110040)
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while the MappedFieldType name does.

We have renamed Mapper.Builder#name to leafName (#109971) and Mapper#simpleName to leafName (#110030). This commit renames Mapper#name to fullPath for clarity.
This required some adjustments in FieldAliasMapper to avoid confusion between the existing path method and fullPath. I renamed path to targetPath for clarity.
ObjectMapper already had a fullPath method that returned name, and was effectively a copy of name, so it could be removed.
2024-06-21 22:47:27 +02:00
Luca Cavanna 54e7b4d93b
Rename Mapper#simpleName to Mapper#leafName (#110030)
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does. We have a method called simpleName to signal that, but leafName signals that more clearly and aligns with
the name we have recently introduced in Mapper.Builder (renamed from name to leafName).

Relates to #109971
2024-06-21 14:28:36 +02:00
Luca Cavanna 15c7abe111
Rename Mapper#name to Mapper#leafName (#109971)
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does.
2024-06-21 11:48:17 +02:00
Oleksandr Kolomiiets 1080425a65
Enable fallback synthetic source by default (#109370) 2024-06-07 09:21:22 -07:00
Panagiotis Bailis 1c3b3d8f11
Adding support for explain in rrf (#108682) 2024-06-07 11:09:06 +03:00
Oleksandr Kolomiiets 75b5efede4
Binary field enables doc values by default for index mode with synthetic source (#107739)
Binary field enables doc values by default for index mode with synthetic source
2024-04-23 08:24:47 -07:00
Mayya Sharipova 965ebab631
Percolator named queries: rewrite for matched info (#107432)
PR #103084 introduced an ability to return matched_queries during percolate
process for all percolator queries containing `_name` field.

But there was a bug with complex queries, as they were not rewritten before
obtaining their Weight function. This fixes the bug by ensuring all
queries are first rewritten.

Closes #107176
2024-04-12 13:44:50 -04:00
Moritz Mack 1f5e04b721
Migrate YAML REST tests to synthetic cluster feature check (#107068)
To simplify the migration away from version based skip checks in YAML specs, 
this PR adds a synthetic version feature `gte_vX.Y.Z` for any version at or before 8.14.0.

New test specs for 8.14 or later are expected to use respective new cluster features,
or a test-only feature supplied via ESRestTestCase#createAdditionalFeatureSpecifications
if sufficient.
2024-04-11 18:22:38 +02:00
Felix Barnsteiner ab52ef1f06
Fix merging component templates with a mix of dotted and nested object mapper definitions (#106077)
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2024-04-08 17:55:41 +02:00
Felix Barnsteiner dee0be589c
Flatten object mappings when subobjects is false (#103542) 2024-02-22 11:43:12 +01:00
Felix Barnsteiner 5920c917aa
Encapsulate Mapper.Builder#name and make it private (#105648)
This is in preparation to make the field mutable,
which is needed in the context of https://github.com/elastic/elasticsearch/pull/103542
2024-02-20 15:53:14 +01:00
Armin Braun 73a68409c2
Ref count search response bytes (#103763)
Final step in #102030 ... actually makes `SearchHit` read a releasable bytes reference.
It does still fall back to copying to unpooled buffers here and there, which can be removed in follow-ups where it's worth the effort (aggs being the most important one probably).

Hard to create very reliable benchmarks for this because all our macro-benchmarks are quite noisy. Running http logs and PMC though, there's a statistically significant reduction in GC and reduced tail latencies in most benchmarks.

The overhead for ref-counting these bytes isn't visible in profiling as far as I can tell and for large source values, no corresponding large `byte[]` are created any longer outside of the few remaining spots where we copy to pooled buffers.

closes #102657
closes #102030
2024-01-17 16:16:39 +01:00
Armin Braun 80a95087db
Fix more search response leaks (#103956)
Some more mechanical fixing of leaked SearchResponse instances.
2024-01-05 10:40:59 +01:00
Mayya Sharipova b014843078
Return matched_queries in Percolator (#103084)
Return matched_queries for named queries in Percolator.

In a response, each hit together with
a `_percolator_document_slot` field will contain
`_percolator_document_slot_<slotNumber>_matched_queries` fields that will show
which sub-queries matched each percolated document.

Closes #10163
2023-12-11 09:07:26 -05:00
Ignacio Vera 32a8c683f9
Use ElasticsearchAssertions#assertResponse for `MultiSearchResponse` (#102694) 2023-11-28 13:52:26 +01:00
Armin Braun 2bd0a709fe
Fix ref counts for MultiSearchResponse not released in tests (#102604)
Part of the effort to fix search response leaks is to fix these. Fixed
all that I could easily find in tests. Production changes incoming once
the dependencies for those are fixed.

part of #102030 but no fancy utility here like for search responses
since we don't have so many use cases and none of them are tricky.
2023-11-24 12:51:40 -05:00
Armin Braun cdc83ad29b
Add shorthand for `prepareIndex` to test infrastructure (#101187)
Same as #101175, shorten `client().prepareIndex(index)` and
`client().prepareIndex().setIndex(index)` via a test utility.
Saves lots of code now and sets up some follow-up simplifications.
2023-11-23 15:47:36 +01:00
Armin Braun a9c286b25c
Collapse verbose .execute().actionGet() calls in tests (#102502)
Cleaning this up a little even though it's still quite horrible.
`.get()` in this API actually means `actionGet()` so to speak.
I think a good first step to cleaning this up is to at least reduce
the duplication though and save 1k lines.
2023-11-23 10:10:10 +01:00
Armin Braun e03b0a5329
Add Leak Tracking to the SearchContext implementations (#102274)
Another step towards ref counting search hits. This adds leak tracking to the search context. It required two fixes in the production code so that tests do not fail. First, sub-aggregations need to be closed eventually; I found it easiest to just tie this to the parent context. Second, if we throw in the constructor of the context (we have tests for this case), we should still release/close it: it's impossible to fix the leak tracking otherwise, and it seems more correct anyway since we initialise resources in that constructor.
Other than that, just trivial test changes to make sure the contexts get closed everywhere.
2023-11-16 15:34:12 +01:00
Panagiotis Bailis 8f108ec9e9
Removing explicit SearchResponse usages in tests - v3 (#102019)
Tests covered in this PR:

* `org.elasticsearch.percolator.PercolatorQuerySearchIT`
2023-11-13 07:33:30 -05:00
Mayya Sharipova 61c7483fc9
Make knn search a query (#98916)
This introduced a new knn query:
- knn query is executed during the Query phase similar to all other queries.
- No k parameter; k defaults to size
- num_candidates is the size of the queue of candidates to consider while
  searching the graph on each shard
- For aggregations: "size" results are collected per shard, so total = size * shards.
   Aggregations will see size * shards results.
- All filters from DSL are applied as post-filters, except: 1) an alias filter,
 which is applied as a pre-filter, or 2) a filter provided as a parameter
 inside the knn query.
2023-11-01 14:21:40 -04:00
Luca Cavanna b07feb507d
Percolator to support parsing script score query with params (#101051)
While dot expansion is disabled when parsing percolator queries at index
time, as that would interfere with query parsing,  we still use a wrapper parser
that is conservative about what methods it supports, assuming that
document parsing needs nextToken and not much more. Turns out that when
parsing queries instead, we need to support all the XContentParser
methods including map, list etc.

This commit adds a test for script score query parsing through document
parsing via percolator field mapper, and removes the limitations in the
wrapper parser when dots expansion is disabled.
2023-10-24 11:03:28 +02:00
David Turner 9794c6e205
Use ESIntegTestCase#prepareSearch more (#101179)
The refactoring in #101175 only covered all the one-arg call sites. This
PR does the rest.
2023-10-20 18:33:00 +01:00
David Turner 1eda6ac74b
Extract ESIntegTestCase#prepareSearch (#101175)
Relates #101172
2023-10-20 06:18:58 -04:00
Ryan Ernst 8a1db8c6c3
Move index version constants to IndexVersions (#101094)
Similar to the TransportVersions holder class, IndexVersions is the new
place to contain all constants for IndexVersion. This commit moves all
existing constants to the new class. It is purely mechanical.
2023-10-19 20:44:51 -04:00
Armin Braun 03ea4bbe6e
Remove more explicit references to SearchResponse in tests (#101052)
Follow up to #100966 introducing new combined assertion `assertSearchHitsWithoutFailures`
to combine no-failure, count, and id assertions into one block.
2023-10-18 20:27:52 +02:00
Armin Braun bae6991fb3
Remove ~600 references to SearchResponse in tests (#100966)
We'd like to make `SearchResponse` reference counted and pooled but there are around 6k
instances of tests that create a `SearchResponse` local variable that would need to be
released manually to avoid leaks in the tests.
This does away with about 10% of these spots by adding an override for `assertHitCount`
that handles the actual execution of the search request and its release automatically
and making use of it in all spots where the `.get()` on the request builder could be inlined
semi-automatically and in a straight-forward fashion without other code changes.
2023-10-17 15:43:36 +02:00
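The pattern behind the `assertHitCount` override in bae6991fb3 can be sketched as a helper that executes the request, asserts, and always releases the response. Everything below is hypothetical scaffolding (`FakeResponse`, the `Supplier`-based signature); it only illustrates why tests no longer need a local `SearchResponse` variable that must be released by hand.

```java
import java.util.function.Supplier;

final class AssertAndReleaseSketch {
    // Hypothetical ref-counted response; starts with one reference held by the caller.
    static final class FakeResponse {
        private final long totalHits;
        private int refs = 1;

        FakeResponse(long totalHits) {
            this.totalHits = totalHits;
        }

        long totalHits() {
            return totalHits;
        }

        void decRef() {
            refs--;
        }

        boolean isReleased() {
            return refs == 0;
        }
    }

    // Executes the request, checks the hit count, and releases the response
    // even when the assertion fails.
    static void assertHitCount(Supplier<FakeResponse> request, long expectedHits) {
        FakeResponse response = request.get();
        try {
            if (response.totalHits() != expectedHits) {
                throw new AssertionError("expected " + expectedHits + " hits but got " + response.totalHits());
            }
        } finally {
            response.decRef();
        }
    }
}
```

A test then shrinks to a single call such as `assertHitCount(() -> executeSearch(), 3)`, where `executeSearch` is whatever produces the response in that test.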
Armin Braun b7eafce32c
Make some practically static methods static (#97565)
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
2023-10-06 23:37:07 +02:00
Alan Woodward 4e1fb3fca5
Automatically disable `ignore_malformed` on datastream `@timestamp` fields (#99346)
Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.

This commit detects if a set of mappings is configured for a datastream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp
field, and if it is, sets ignore_malformed to false.

Relates to #96051
2023-09-13 15:02:22 +01:00
Armin Braun f1a376c317
Remove CopyTo.Builder (#99368)
The copyTo builder is really hard to reason about when it comes to
mapper merging, because the `reset` method would actually mutate an
existing mapper. That seems dangerous and the whole thing is quite
inefficient as well. -> this PR just removes it and uses a copy
constructor for copy on write, avoiding instance creation on mapper
merges here and there and leaving no doubt about these things being
immutable.
2023-09-08 13:24:31 -04:00
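The copy-on-write approach mentioned in f1a376c317 might look roughly like the sketch below. The class and its `withAddedFields` method are illustrative assumptions, not the real `CopyTo` API. The points taken from the commit message are that instances are immutable and that a merge which changes nothing does not need to create a new instance.

```java
import java.util.ArrayList;
import java.util.List;

final class CopyToSketch {
    private final List<String> copyToFields;

    CopyToSketch(List<String> copyToFields) {
        // Defensive immutable copy: the instance can never be mutated afterwards.
        this.copyToFields = List.copyOf(copyToFields);
    }

    List<String> copyToFields() {
        return copyToFields;
    }

    // Copy on write: returns this instance when nothing changes,
    // otherwise a new instance holding the merged field list.
    CopyToSketch withAddedFields(List<String> additional) {
        if (copyToFields.containsAll(additional)) {
            return this;
        }
        List<String> merged = new ArrayList<>(copyToFields);
        for (String field : additional) {
            if (!merged.contains(field)) {
                merged.add(field);
            }
        }
        return new CopyToSketch(merged);
    }
}
```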
Ryan Ernst 19257125b1
Move transport version constants to TransportVersions (#97990)
Constants for TransportVersion currently live alongside the class
definition. This has been fine since there was only one set of
constants. However, to support serverless, some constants will need to
be defined elsewhere.

This commit moves the existing constants to a new holder class,
TransportVersions. It is almost entirely mechanical, using IntelliJ move
members. The only non mechanical part was slightly shifting how CURRENT
is found, defining a LATEST in TransportVersions that is automatically
calculated (since we already have it, no need to manually define it).
2023-09-06 15:14:41 -04:00
David Turner 1e9c7f1d95
Align collection de/serialization API naming (#99150)
The `StreamOutput` and `StreamInput` APIs are designed so that code
which serializes objects to the transport protocol aligns closely with
the corresponding deserialization code. However today
`StreamOutput#writeCollection` pairs up with a variety of methods on
`StreamInput`, including `readList`, `readSet`, and so on. These methods
are not obviously compatible with `writeCollection` unless you look at
the implementation, and that makes verifying transport protocol code
harder than it needs to be.

This commit renames these methods to `readCollectionAsList`,
`readCollectionAsSet`, and so on, to clarify that they are compatible
with `writeCollection`.

Relates
https://github.com/elastic/elasticsearch/pull/98971#issuecomment-1697289815
2023-09-04 06:46:54 -04:00
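To illustrate why the naming in 1e9c7f1d95 helps, here is a small sketch in which one `writeCollection` format is read back either as a list or as a set. It uses `java.io` data streams and fixed-width ints purely for illustration; the real `StreamOutput` and `StreamInput` classes have their own encoding, and the round-trip helpers below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CollectionWireSketch {
    // Wire format assumed here: element count, then each element.
    static void writeCollection(DataOutputStream out, Collection<String> values) throws IOException {
        out.writeInt(values.size());
        for (String value : values) {
            out.writeUTF(value);
        }
    }

    // Reads the writeCollection format into a list, preserving order and duplicates.
    static List<String> readCollectionAsList(DataInputStream in) throws IOException {
        int size = in.readInt();
        List<String> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            result.add(in.readUTF());
        }
        return result;
    }

    // Reads the same format into a set, dropping duplicates.
    static Set<String> readCollectionAsSet(DataInputStream in) throws IOException {
        return new LinkedHashSet<>(readCollectionAsList(in));
    }

    static byte[] encode(Collection<String> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeCollection(out, values);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> roundTripAsList(Collection<String> values) {
        try {
            return readCollectionAsList(new DataInputStream(new ByteArrayInputStream(encode(values))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Set<String> roundTripAsSet(Collection<String> values) {
        try {
            return readCollectionAsSet(new DataInputStream(new ByteArrayInputStream(encode(values))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because both readers carry `Collection` in their name, a reviewer can pair them with `writeCollection` without opening the implementation.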
Benjamin Trent d09cb767a9
Fix percolator query for stored queries that expand on wildcard field names (#98878)
An optimization introduced in:
https://github.com/elastic/elasticsearch/pull/81985 changed percolator
query behavior.

Users can specify a percolator query which expands fields based on a
wildcard pattern. Just one example is `simple_query_string`, which
allows field names like `"text_*"`. The user expects that this field
name will expand to relevant mapped fields (e.g. "text_foo"). However,
if there are no documents indexed in those fields at the time when the
percolator query is indexed, it doesn't expand to the relevant fields.

Additionally, at query time, we may skip expanding fields and not match
the relevant mapped fields if they are considered "empty" (e.g. they have no
values in the shard). We should instead allow expansion by indicating
that the field may exist in the shard.

closes: https://github.com/elastic/elasticsearch/issues/98819
2023-08-28 09:19:28 -04:00
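The expansion behaviour that d09cb767a9 restores can be sketched as matching the wildcard against all mapped field names, without consulting whether any documents have values for them. The glob-to-regex conversion and the names below are assumptions for this example; Elasticsearch has its own matching utilities.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WildcardExpansionSketch {
    // Converts a simple wildcard such as "text_*" into a regex; only '*' is special.
    // The -1 limit keeps trailing empty parts so a trailing '*' is not lost.
    static Pattern toRegex(String wildcard) {
        String regex = Arrays.stream(wildcard.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return Pattern.compile(regex);
    }

    // Expands against every mapped field, deliberately ignoring whether the
    // field currently has any indexed values.
    static List<String> expand(String wildcard, Collection<String> mappedFields) {
        Pattern pattern = toRegex(wildcard);
        return mappedFields.stream().filter(field -> pattern.matcher(field).matches()).toList();
    }
}
```

With mapped fields `text_foo`, `title`, and `text_bar`, expanding `text_*` yields `text_foo` and `text_bar` even if no document has populated them yet.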