This adds a new parameter to the quantized index mapping that allows
default oversampling and rescoring to occur.
This doesn't adjust any of the defaults; it only makes them configurable.
When the user provides `rescore_vector: {oversample: <number>}` in the
query, that value overrides the mapped default.
For example, here is how to use it with BBQ (`bbq_hnsw`):
```
PUT rescored_bbq
{
"mappings": {
"properties": {
"vector": {
"type": "dense_vector",
"index_options": {
"type": "bbq_hnsw",
"rescore_vector": {"oversample": 3.0}
}
}
}
}
}
```
Then, when querying, it will automatically oversample `k` by `3x` and rerank
with the raw vectors.
```
POST _search
{
"knn": {
"query_vector": [...],
"field": "vector"
}
}
```
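When a different trade-off is wanted for a single request, the mapped default can be overridden per query; the query-level shape below follows the `rescore_vector: {oversample: <number>}` syntax described above (shown as a sketch):

```
POST _search
{
  "knn": {
    "field": "vector",
    "query_vector": [...],
    "rescore_vector": {"oversample": 2.0}
  }
}
```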
Building on the work started in https://github.com/elastic/elasticsearch/pull/123904, we now auto-generate most of the small subfiles from the ES|QL functions unit tests.
This work also investigates any remaining discrepancies between the original AsciiDoc version and the new Markdown, and minimizes differences so the docs do not look too different.
The Kibana JSON and Markdown files are moved to a new location, and the operator docs are somewhat more generated than before (although still largely manual).
* add max number of chunks
* wire merge function
* implement sparse merge function
* move tests to correct package/file
* float merge function
* bytes merge function
* more accurate byte average
* spotless
* Fix/improve EmbeddingRequestChunkerTests
* Remove TODO
* remove unnecessary field
* remove Chunk generic
* add TODO
* Remove specialized chunks
* add comment
* Update docs/changelog/123150.yaml
* update changelog
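The merge functions listed above can be pictured with a small sketch. This is a hypothetical illustration of length-weighted float merging and a "more accurate byte average" (accumulate in a wide type, round once at the end); the names and the weighting scheme are assumptions, not the actual implementation:

```java
// Hypothetical sketch of chunk merging: combine per-chunk embeddings into
// one vector. Not the actual Elasticsearch implementation.
final class EmbeddingMergeSketch {

    // Length-weighted average of float embedding chunks.
    static float[] mergeFloatChunks(float[][] chunks, int[] chunkLengths) {
        int dims = chunks[0].length;
        float[] merged = new float[dims];
        long totalLength = 0;
        for (int c = 0; c < chunks.length; c++) {
            totalLength += chunkLengths[c];
            for (int d = 0; d < dims; d++) {
                merged[d] += chunks[c][d] * chunkLengths[c];
            }
        }
        for (int d = 0; d < dims; d++) {
            merged[d] /= totalLength;
        }
        return merged;
    }

    // "More accurate byte average": sum in a long and round once at the end,
    // instead of repeatedly averaging narrow byte values.
    static byte[] mergeByteChunks(byte[][] chunks) {
        int dims = chunks[0].length;
        byte[] merged = new byte[dims];
        for (int d = 0; d < dims; d++) {
            long sum = 0;
            for (byte[] chunk : chunks) {
                sum += chunk[d];
            }
            merged[d] = (byte) Math.round((double) sum / chunks.length);
        }
        return merged;
    }
}
```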
In #112380 we changed this `assert` to yield a `String` on failure
rather than the original `ElasticsearchException`, which means we don't
see the original completion's stack trace any more. This commit
reinstates the lost stack trace.
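As background, this is general Java behaviour rather than the Elasticsearch code itself: an `AssertionError` constructed from a `Throwable` detail keeps that throwable as its cause (and hence its stack trace), while a `String` detail keeps only the message text:

```java
// General Java sketch (not the actual Elasticsearch code): an AssertionError
// built from a Throwable detail keeps the original stack trace as its cause,
// while a String detail loses it.
final class AssertDetailSketch {

    static AssertionError fromThrowable(Exception original) {
        return new AssertionError(original);            // cause == original
    }

    static AssertionError fromString(Exception original) {
        return new AssertionError(original.toString()); // cause == null
    }
}
```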
Just a random thing I noticed: this was seemingly overlooked when
porting to the new rewrite API. There is no need to create a new searcher; we
already have one here.
This PR combines both cluster and project tasks under persistent_tasks
for XContent output of Metadata when it contains only a single project,
i.e. there will be no cluster_persistent_tasks in such output. This is
to maintain the existing output format when the cluster is not
multi-project enabled.
Relates: MP-1945
Checks the local cluster state after marking a shard snapshot as
complete, and logs a message if the completion is not reflected in this
cluster state.
When a file with `.attach_pid` in its name was stored in the distribution
and then deleted, the resulting exception could stop copying/linking files
without any sign of a problem. The files were then missing in the cluster
used in the test, sometimes causing tests to fail (depending on which
files had not been copied).
When using `Files.walk` it is impossible to catch the `IOException` and
continue walking through files conditionally. It has been replaced with a
`FileVisitor` implementation, which can continue when the exception is
caused by files left behind temporarily by the JVM but no longer available.
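The approach can be sketched as follows (a simplified illustration, not the actual test-fixture code): `Files.walkFileTree` with a `SimpleFileVisitor` whose `visitFileFailed` ignores files that vanished mid-walk and rethrows everything else:

```java
// Simplified sketch of the FileVisitor approach described above: copy a tree
// but keep going when a file has disappeared between listing and visiting,
// e.g. a transient JVM .attach_pid file. Not the actual test-fixture code.
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

final class TolerantCopyVisitor extends SimpleFileVisitor<Path> {
    private final Path source;
    private final Path target;

    TolerantCopyVisitor(Path source, Path target) {
        this.source = source;
        this.target = target;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        Files.createDirectories(target.resolve(source.relativize(dir)));
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        Files.copy(file, target.resolve(source.relativize(file)), StandardCopyOption.REPLACE_EXISTING);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
        // Unlike Files.walk, a visitor lets us decide per file: skip files
        // that no longer exist, rethrow anything else.
        if (exc instanceof NoSuchFileException) {
            return FileVisitResult.CONTINUE;
        }
        throw exc;
    }
}
```

Usage: `Files.walkFileTree(sourceDir, new TolerantCopyVisitor(sourceDir, targetDir));`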
This adds the file extensions for the blobs we request when populating the
cache.
There are around 50 possible Lucene extensions, and we use a special
"other" category as a fallback for everything else.
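The fallback can be pictured with a small sketch (the known-extension set below is a sample, not the full ~50-entry list, and the method name is hypothetical):

```java
// Hypothetical sketch of the fallback described above: map a blob's file
// extension to a known Lucene extension, or to "other". The set shown is a
// small sample of real Lucene extensions, not the full list.
import java.util.Set;

final class BlobExtensionSketch {
    private static final Set<String> KNOWN = Set.of(
        "cfs", "cfe", "si", "doc", "pos", "tim", "tip",
        "dvd", "dvm", "fdt", "fdx", "nvd", "nvm"
    );

    static String category(String blobName) {
        int dot = blobName.lastIndexOf('.');
        String ext = dot < 0 ? "" : blobName.substring(dot + 1);
        return KNOWN.contains(ext) ? ext : "other";
    }
}
```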
This creates a `pragma` you can use to request that fields load from a
stored field rather than doc values. It implements that pragma for
`keyword` and number fields.
We expect that, for some disk configurations and some numbers of fields,
it's faster to load those fields from `_source` or stored fields than
it is to use doc values. Our default is doc values, and on my laptop it's
*always* faster to use doc values. But we don't ship my laptop to every
cluster.
This will let us experiment and debug slow queries by trying to load
fields a different way.
You access this pragma with:
```
curl -HContent-Type:application/json -XPOST localhost:9200/_query?pretty -d '{
"query": "FROM foo",
"pragma": {
"field_extract_preference": "STORED"
}
}'
```
On a release build you'll need to add `"accept_pragma_risks": true`.
This PR replaces the default project ID with the actual project ID while
waiting for active shards after index auto-creation, similar to how the wait
is done for explicit index creation.
* SEARCH-802 - bug fixed - Query rules allowed creation of rules with invalid match criteria
* [CI] Auto commit changes from spotless
* Worked on the comments given in the PR
* [CI] Auto commit changes from spotless
* Fixed Integration tests
* [CI] Auto commit changes from spotless
* Made changes from the PR
* Update docs/changelog/122823.yaml
* [CI] Auto commit changes from spotless
* Fixed the duplicate code issue in queryRuleTests
* Refactored code to clean it up based on PR comments
* [CI] Auto commit changes from spotless
* Logger statements were removed
* Cleaned up the QueryRule tests
* [CI] Auto commit changes from spotless
* Update x-pack/plugin/ent-search/src/test/java/org/elasticsearch/xpack/application/EnterpriseSearchModuleTestUtils.java
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
This is intended to be used to efficiently calculate a write load
metric for use by the auto-sharding algorithm which favours more
recent loads.
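A metric that favours recent loads is commonly an exponentially weighted moving average; this generic sketch illustrates the idea (the class and parameter names are assumptions, not the actual algorithm from the linked PR):

```java
// Generic sketch of a recency-favouring load metric: an exponentially
// weighted moving average, where alpha controls how strongly recent samples
// outweigh older ones. Not the actual auto-sharding code.
final class WriteLoadEwma {
    private final double alpha; // 0 < alpha <= 1; higher favours recent samples
    private double average;
    private boolean seeded;

    WriteLoadEwma(double alpha) {
        this.alpha = alpha;
    }

    void addSample(double writeLoad) {
        if (seeded == false) {
            average = writeLoad; // first sample seeds the average
            seeded = true;
        } else {
            average = alpha * writeLoad + (1 - alpha) * average;
        }
    }

    double average() {
        return average;
    }
}
```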
ES-10037 #comment Core algorithm added in https://github.com/elastic/elasticsearch/pull/124507
When enabling FIPS, `javax.net.ssl.trustStore` is not necessarily set.
This change adds FIPS entitlements based on
`org.bouncycastle.fips.approved_only=true`, which enforces usage of FIPS
approved functionality only.
Additionally, this PR grants read access to a custom trust store if
provided via `javax.net.ssl.trustStore`, otherwise read access to the
default JDK trust store is granted.
Relates to ES-11025.
While writing tests for #123861, it turns out that #124195 is not enough.
We really need new IT test cases for "always allowed" actions: in order to be sure they are allowed, we need to set up the plugin with no policy.
This PR adds test cases for that, plus support for writing test functions that accept an `Environment` parameter: many of the paths we test and allow/deny are relative to paths in `Environment`, so it's useful to have access to it (see `readAccessConfigDirectory` as an example).
* [Profiling] Fix NullPointerExceptions by accepting dotted field names
Profiling uses synthetic source and thus expects nested field names in query responses.
With 8.17+, synthetic source is available only to Enterprise (or higher) subscriptions,
so smaller subscriptions get dotted field names in query responses.
The profiling plugin relies on nested field names and runs into NullPointerExceptions
if these are not found.
This PR fixes the NullPointerExceptions that could happen with dotted field names.
Signed-off-by: Tim Rühsen <tim.ruhsen@elastic.co>
* Evaluate source only once (cleanup)
---------
Signed-off-by: Tim Rühsen <tim.ruhsen@elastic.co>
* Inline cast to date
* Update docs/changelog/123460.yaml
* New capability for `::date` casting
* More tests
* Update tests
---------
Co-authored-by: Fang Xing <155562079+fang-xing-esql@users.noreply.github.com>
* Refactor: findRequestingFrame
* INFO instead of WARN for NotEntitledException.
Some of these are expected, so an INFO seems more appropriate.
The stack trace tends to attract attention even when entitlements are not the
cause of a problem, so let's avoid the stack trace, but still include stack
frame info from the frame of interest.
* Use child loggers for Not Entitled logs
* Use warn, and include component name
* Fix ALL_UNNAMED
* Mute entitlement warnings from repositories
* PR feedback
* Common out the Not Entitled prefix.
We're alerting on this, so let's not rely on every caller of notEntitled to remember it.