This erroneously claimed that the example used a `drop` processor
(which drops whole documents) when it actually uses a `remove`
processor (which removes fields).
This speeds up loading from stored fields by opting more blocks into the
"sequential" strategy. It really kicks in when loading stored fields
like `text`, and when you need fewer than 100% of the documents but more
than, say, 10%. It is most useful when you need something like 99.9% of
the documents. Here are the perf numbers:
```
%100.0 {"took": 403 -> 401,"documents_found":1000000}
%099.9 {"took":3990 -> 436,"documents_found": 999000}
%099.0 {"took":4069 -> 440,"documents_found": 990000}
%090.0 {"took":3468 -> 421,"documents_found": 900000}
%030.0 {"took":1213 -> 152,"documents_found": 300000}
%020.0 {"took": 766 -> 104,"documents_found": 200000}
%010.0 {"took": 397 -> 55,"documents_found": 100000}
%009.0 {"took": 352 -> 375,"documents_found": 90000}
%008.0 {"took": 304 -> 317,"documents_found": 80000}
%007.0 {"took": 273 -> 287,"documents_found": 70000}
%005.0 {"took": 199 -> 204,"documents_found": 50000}
%001.0 {"took": 46 -> 46,"documents_found": 10000}
```
Let's explain this with an example. First, jump to `main` and load a
million documents:
```
rm -f /tmp/bulk
for a in {1..1000}; do
echo '{"index":{}}' >> /tmp/bulk
echo '{"text":"text '$(printf %04d $a)'"}' >> /tmp/bulk
done
curl -s -uelastic:password -HContent-Type:application/json -XDELETE localhost:9200/test
for a in {1..1000}; do
echo -n $a:
curl -s -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_bulk?pretty --data-binary @/tmp/bulk | grep errors
done
curl -s -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_forcemerge?max_num_segments=1
curl -s -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_refresh
echo
```
Now query them all. Run this a few times until it's stable:
```
echo -n "%100.0 "
curl -s -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d'{
"query": "FROM test | STATS SUM(LENGTH(text))",
"pragma": {
"data_partitioning": "shard"
}
}' | jq -c '{took, documents_found}'
```
Now fetch 99.9% of documents:
```
echo -n "%099.9 "
curl -s -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d'{
"query": "FROM test | WHERE NOT text.keyword IN (\"text 0998\") | STATS SUM(LENGTH(text))",
"pragma": {
"data_partitioning": "shard"
}
}' | jq -c '{took, documents_found}'
```
This should spit out something like:
```
%100.0 {"took":403,"documents_found":1000000}
%099.9 {"took":4098,"documents_found":999000}
```
We're loading *fewer* documents but it's slower! What in the world?!
If you dig into the profile you'll see that it's value loading:
```
$ curl -s -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d'{
"query": "FROM test | STATS SUM(LENGTH(text))",
"pragma": {
"data_partitioning": "shard"
},
"profile": true
}' | jq '.profile.drivers[].operators[] | select(.operator | contains("ValuesSourceReaderOperator"))'
{
"operator": "ValuesSourceReaderOperator[fields = [text]]",
"status": {
"readers_built": {
"stored_fields[requires_source:true, fields:0, sequential: true]": 222,
"text:column_at_a_time:null": 222,
"text:row_stride:BlockSourceReader.Bytes": 1
},
"values_loaded": 1000000,
"process_nanos": 370687157,
"pages_processed": 222,
"rows_received": 1000000,
"rows_emitted": 1000000
}
}
$ curl -s -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d'{
"query": "FROM test | WHERE NOT text.keyword IN (\"text 0998\") | STATS SUM(LENGTH(text))",
"pragma": {
"data_partitioning": "shard"
},
"profile": true
}' | jq '.profile.drivers[].operators[] | select(.operator | contains("ValuesSourceReaderOperator"))'
{
"operator": "ValuesSourceReaderOperator[fields = [text]]",
"status": {
"readers_built": {
"stored_fields[requires_source:true, fields:0, sequential: false]": 222,
"text:column_at_a_time:null": 222,
"text:row_stride:BlockSourceReader.Bytes": 1
},
"values_loaded": 999000,
"process_nanos": 3965803793,
"pages_processed": 222,
"rows_received": 999000,
"rows_emitted": 999000
}
}
```
It jumps from 370ms to almost four seconds! Loading fewer values! The
second big difference is in the `stored_fields` marker. In the second one
it's `sequential: false` and in the first `sequential: true`.
`sequential: true` uses Lucene's "merge" stored fields reader instead of
the default one. It's much more optimized at decoding sequences of
documents.
Previously we only enabled this reader when loading compact sequences of
documents - when the entire block looks like
```
1, 2, 3, 4, 5, ... 1230, 1231
```
If there are any gaps we wouldn't enable it. That was a very
conservative thing we did long ago without doing any experiments. We
knew it was faster without any gaps, but not otherwise. It turns out
it's a lot faster in a lot more cases. I've measured it as faster for
99% gaps, at least on simple documents. I'm a bit worried that this is
too aggressive, so I've made it configurable and set the default to
use the "merge" loader with 10% gaps. So we'd use the merge
loader with a block like:
```
1, 11, 21, 31, ..., 1231, 1241
```
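The decision described above can be sketched roughly like this. It is a minimal illustration, not the actual Elasticsearch code: the function name, the `min_density` parameter, and the definition of density (documents requested divided by the span of doc ids they cover) are all assumptions made for the example.

```python
def use_sequential_reader(doc_ids, min_density=0.10):
    """Return True if a sorted block of doc ids looks dense enough for the "merge" reader.

    Assumption: density = (number of docs requested) / (span of doc ids covered).
    The real heuristic and its setting name may differ.
    """
    if not doc_ids:
        return False
    span = doc_ids[-1] - doc_ids[0] + 1
    return len(doc_ids) / span >= min_density


compact = list(range(1, 1232))          # 1, 2, 3, ..., 1231
every_10th = list(range(1, 1242, 10))   # 1, 11, 21, ..., 1241
every_20th = list(range(1, 1242, 20))   # 1, 21, 41, ..., 1241

print(use_sequential_reader(compact))     # no gaps at all
print(use_sequential_reader(every_10th))  # roughly one doc in ten
print(use_sequential_reader(every_20th))  # roughly one doc in twenty
```

Under these assumptions the compact block and the one-in-ten block opt into the sequential reader, and the one-in-twenty block falls back to the default reader.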
This does two things:
- It describes what the `timezone` option actually does. The existing wording is misleading.
- It recommends avoiding short abbreviations for timezones such as `PST`. This has come up at least twice recently.
* [DOCS][9.0] Improve ESQL reference docs IA
- reorganized es|ql reference documentation from flat list to logical hierarchy
- created three main sections: syntax reference, special fields, advanced operations
- renamed pages with more consistent and task-oriented titles
- aligned navigation titles with page content
- improved introductory text for each section
- used parallel phrasing for similar concepts
- clarified the relationship between reference docs and conceptual docs
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
- I trimmed the KEEP query in my final iteration in https://github.com/elastic/elasticsearch/pull/127215 but neglected to update the query itself, only the response. This fixes that so the query matches the response.
- 🚘 I also updated the table response to match other ESQL response tables
* [DOCS][ESQL] Cleanup and cross-reference LOOKUP JOIN reference and landing pages
**lookup-join.md (syntax reference)**:
- removed tip formatting for simpler direct link to landing page
- improved parameter formatting and descriptions
- fixed template variable from `{esql}` to `{{esql}}`
**esql-lookup-join.md (landing page)**:
- added "compare with enrich" section header
- simplified "how the command works" with clearer parameter explanation
- added code example in how it works section
- improved image alt text for accessibility
- organized example section with better context and SQL comparison
- added dropdown for sample tables to reduce visual clutter
- added "query" subheading for clearer organization
- included reference to additional examples in command reference
- removed excessive whitespace
* Improve example, add setup code
replaced abstract employee/language example with security monitoring use case
added setup instructions for creating test indices
included sample data loading via bulk api
new practical query example joining firewall logs with threat data
simplified results output showing threat detection scenario
added note about left-join behavior
improved code comments and structure
added required index.mode: lookup setting info
* Update elasticsearch-keystore.md
Customer needs document update for handling special characters and how we can use the echo command to enter the password.
* Update docs/reference/elasticsearch/command-line-tools/elasticsearch-keystore.md
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
* Update docs/reference/elasticsearch/command-line-tools/elasticsearch-keystore.md
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
* Update elasticsearch-keystore.md
Moving the section out of Examples as advised.
* Update docs/reference/elasticsearch/command-line-tools/elasticsearch-keystore.md
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
* Update docs/reference/elasticsearch/command-line-tools/elasticsearch-keystore.md
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
---------
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
While this change appears subtle at this point, I am using this in a later PR that adds a lot more spatial functions, where nesting them in related groups like this looks much better.
The main impact of this is that the "On this page" navigator in the right panel of the docs will show the nesting.
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
The docs about the queue in a `fixed` pool are a little awkwardly
worded, and there is no mention of the queue in a `scaling` pool at all.
This commit cleans this area up.
* updating documentation to remove duplicate and redundant wording from 9.x
* Update links to rerank model landing page
---------
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
* Updating text_similarity_reranker documentation
* Updating docs to include urls
* remove extra THE from the text
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
The current LOOKUP JOIN docs include examples that are not tested by the ES|QL tests, unlike most other examples in the documentation. This PR fixes that, changing two examples to use existing tests, and adding a new csv-spec file for the remaining four examples. These four are not required to show results, so the tests have empty data and do not require any results. This means we are testing only the syntax (parsing and semantic analysis), which is sufficient for the docs.
* ES|QL change point docs
* Move ES|QL change_point to tech preview
* Update docs/reference/query-languages/esql/esql-commands.md
Co-authored-by: Craig Taverner <craig@amanzi.com>
* different example + add it to the csv tests
* Restructure change_point docs to new structure
* Added generated test examples to change_point docs
* Fixed a few README.md text mistakes and added more details
* fix grammar
* License check
* regen parser
* Update docs/reference/query-languages/esql/_snippets/commands/layout/change_point.md
Co-authored-by: Craig Taverner <craig@amanzi.com>
---------
Co-authored-by: Craig Taverner <craig@amanzi.com>
Modifies TO_IP so it can handle leading `0`s in ipv4s. Here's how it
works now:
```
ROW ip = TO_IP("192.168.0.1") // OK!
ROW ip = TO_IP("192.168.010.1") // Fails
```
This adds
```
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "octal"})
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "decimal"})
```
We do this because there isn't a consensus on how to parse leading zeros
in ipv4s. The standard unix tools like `ping` and `ftp` interpret
leading zeros as octal. Java's built in ip parsing interprets them as
decimal. Because folks are using this for security rules we need to
support all the choices.
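To make the difference between the two interpretations concrete, here is a small sketch of a parser with the same three behaviours. The function `parse_ipv4` is hypothetical and is not how `TO_IP` is implemented; only the option values `octal` and `decimal` come from the text above, and `reject` is a name made up here for the default behaviour.

```python
def parse_ipv4(text, leading_zeros="reject"):
    """Parse dotted-quad IPv4 text into a tuple of four ints.

    leading_zeros: "reject" (fail on leading zeros), "octal", or "decimal".
    This is an illustrative sketch, not the Elasticsearch implementation.
    """
    if leading_zeros not in ("reject", "octal", "decimal"):
        raise ValueError(f"unknown leading_zeros mode: {leading_zeros!r}")
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four octets: {text!r}")
    octets = []
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"invalid octet {part!r} in {text!r}")
        has_leading_zero = len(part) > 1 and part[0] == "0"
        if has_leading_zero and leading_zeros == "reject":
            raise ValueError(f"leading zeros not allowed: {text!r}")
        base = 8 if has_leading_zero and leading_zeros == "octal" else 10
        value = int(part, base)  # raises ValueError for e.g. "09" in octal mode
        if value > 255:
            raise ValueError(f"octet out of range in {text!r}")
        octets.append(value)
    return tuple(octets)


print(parse_ipv4("192.168.0.1"))
print(parse_ipv4("192.168.010.1", leading_zeros="octal"))    # 010 read as 8
print(parse_ipv4("192.168.010.1", leading_zeros="decimal"))  # 010 read as 10
```

The same input text resolves to two different addresses depending on the mode, which is why the choice is left to the caller.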
Closes #125460
This splits the grouping functions in two: those that can be evaluated independently through the EVAL operator (`BUCKET`) and those that can't (like those that are evaluated through an agg operator, `CATEGORIZE`).
Closes #124608
While the internal structure of the docs is already split into many (over 1000) sub-pages, the final display for the `Functions and Operators` page is a single giant page, making navigation harder. This PR splits it into separate pages, one for each group of similar functions and one for the operators. Twelve new pages.
This PR also bundles a few other related changes. In total what is done is:
* Split functions/operators into 12 pages, one for each group, maintaining the existing split of each function/operator into a snippet with dynamically generated examples
* Split esql-commands.md into source-commands.md and processing-commands.md, each of which is split into individual snippets, one for each command
* Each command snippet has its examples split out into separate files, if they were examples that were dynamically generated in the older asciidoc system
* The examples files are overwritten by the ES|QL unit tests, using a similar mechanism to the examples written for functions and operators
* Some additional refinements to the Kibana definition and markdown files (nicer operator headings, and display text)
In the unexpected case that Elasticsearch dies due to a segfault or
other similar native issue, a core dump is useful in diagnosing the
problem. Yet core dumps are written to the working directory, which is
read-only for most installations of Elasticsearch. This commit changes
the working directory to the logs dir which should always be writeable.
This PR adds two new REST endpoints, for listing queries and getting information on a current query.
* Resolves #124827
* Related to #124828 (initial work)
Changes from the API specified in the above issues:
* The get API is pretty initial, as we don't have a way of fetching the memory used or number of rows processed.
List queries response:
```
GET /_query/queries
// returns for each of the running queries
// query_id, start_time, running_time, query
{ "queries" : {
"abc": {
"id": "abc",
"start_time_millis": 14585858875292,
"running_time_nanos": 762794,
"query": "FROM logs* | STATS BY hostname"
},
"4321": {
"id":"4321",
"start_time_millis": 14585858823573,
"running_time_nanos": 90231,
"query": "FROM orders | LOOKUP country_code ON country"
}
}
}
```
Get query response:
```
GET /_query/queries/abc
{
"id" : "abc",
"start_time_millis": 14585858875292,
"running_time_nanos": 762794,
"query": "FROM logs* | STATS BY hostname",
"coordinating_node": "oTUltX4IQMOUUVeiohTt8A",
"data_nodes" : [ "DwrYwfytxthse49X4", "i5msnbUyWlpe86e7"]
}
```
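As a rough illustration of how a client might consume the list response, the sketch below sorts the running queries by how long they have been running. It works on the sample payload shown above rather than a live cluster, and the helper name `longest_running` is made up for the example.

```python
import json

# Sample payload copied from the list response above.
response = json.loads("""
{ "queries" : {
    "abc": {
      "id": "abc",
      "start_time_millis": 14585858875292,
      "running_time_nanos": 762794,
      "query": "FROM logs* | STATS BY hostname"
    },
    "4321": {
      "id": "4321",
      "start_time_millis": 14585858823573,
      "running_time_nanos": 90231,
      "query": "FROM orders | LOOKUP country_code ON country"
    }
  }
}
""")


def longest_running(queries_response):
    """Return the running queries ordered from longest to shortest running time."""
    queries = queries_response["queries"].values()
    return sorted(queries, key=lambda q: q["running_time_nanos"], reverse=True)


for q in longest_running(response):
    print(f'{q["id"]}: {q["running_time_nanos"] / 1e6:.3f} ms  {q["query"]}')
```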
The `elasticsearch-certutil http` command, and security auto-configuration,
generate the HTTP certificate and CA without setting the `keyUsage` extension.
This PR fixes this by setting (by default):
- `keyCertSign` and `cRLSign` for self-signed CAs
- `digitalSignature` and `keyEncipherment` for HTTP certificates and CSRs
These defaults can be overridden when running the `elasticsearch-certutil http`
command. The user will be prompted to change them as they wish.
For `elasticsearch-certutil ca`, the default value can be overridden by passing
the `--keyusage` option, e.g.
```
elasticsearch-certutil ca --keyusage "digitalSignature,keyCertSign,cRLSign" -pem
```
Fixes #117769
With `geo_point` fields, there is a special case: values that have a syntactically valid format, but whose numerical values for `latitude` and `longitude` are out of range.
If `ignore_malformed` is `false`, an exception will be thrown as usual. But if it is `true`, the document will be indexed correctly, by normalizing the latitude and longitude values into the valid range. The special `_ignored` field will not be set. The original source document will remain as before, but indexed values, doc-values and stored fields will all be normalized.
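To give a feel for what "normalizing into the valid range" can mean, here is one plausible normalization, sketched under assumptions: longitude wraps around every 360 degrees, and a latitude that crosses a pole is reflected back while the longitude flips to the opposite side of the globe. The function `normalize_point` is hypothetical and may not match Elasticsearch's behaviour at the boundaries.

```python
def normalize_point(lat, lon):
    """Map an out-of-range (lat, lon) pair back into [-90, 90] x [-180, 180).

    Illustrative only: the exact rules used by Elasticsearch are not stated
    in this commit message.
    """
    # Wrap latitude into [-180, 180) first so any number of turns is handled.
    lat = ((lat + 180.0) % 360.0) - 180.0
    if lat > 90.0:
        # Crossed the north pole: reflect and move to the opposite meridian.
        lat = 180.0 - lat
        lon += 180.0
    elif lat < -90.0:
        # Crossed the south pole: reflect and move to the opposite meridian.
        lat = -180.0 - lat
        lon += 180.0
    # Wrap longitude into [-180, 180).
    lon = ((lon + 180.0) % 360.0) - 180.0
    return lat, lon


print(normalize_point(10.0, 190.0))  # longitude wraps around the antimeridian
print(normalize_point(95.0, 0.0))    # latitude reflects over the north pole
```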
* test
* Revert "test"
This reverts commit 9f4e2adba0.
* Refactor InferenceService to allow passing in chunking settings
* Add chunking config to inference field metadata and store in semantic_text field
* Fix test compilation errors
* Hacking around trying to get ingest to work
* Debugging
* [CI] Auto commit changes from spotless
* POC works and update TODO to fix this
* [CI] Auto commit changes from spotless
* Refactor chunking settings from model settings to field inference request
* A bit of cleanup
* Revert a bunch of changes to try to narrow down what broke CI
* test
* Revert "test"
This reverts commit 9f4e2adba0.
* Fix InferenceFieldMetadataTest
* [CI] Auto commit changes from spotless
* Add chunking settings back in
* Update builder to use new map
* Fix compilation errors after merge
* Debugging tests
* debugging
* Cleanup
* Add yaml test
* Update tests
* Add chunking to test inference service
* Trying to get tests to work
* Shard bulk inference test never specifies chunking settings
* Fix test
* Always process batches in order
* Fix chunking in test inference service and yaml tests
* [CI] Auto commit changes from spotless
* Refactor - remove convenience method with default chunking settings
* Fix ShardBulkInferenceActionFilterTests
* Fix ElasticsearchInternalServiceTests
* Fix SemanticTextFieldMapperTests
* [CI] Auto commit changes from spotless
* Fix test data to fit within bounds
* Add additional yaml test cases
* Playing with xcontent parsing
* A little cleanup
* Update docs/changelog/121041.yaml
* Fix failures introduced by merge
* [CI] Auto commit changes from spotless
* Address PR feedback
* [CI] Auto commit changes from spotless
* Fix predicate in updated test
* Better handling of null/empty ChunkingSettings
* Update parsing settings
* Fix errors post merge
* PR feedback
* [CI] Auto commit changes from spotless
* PR feedback and fix Xcontent parsing for SemanticTextField
* Remove chunking settings check to use what's passed in from sender service
* Fix some tests
* Cleanup
* Test failure whack-a-mole
* Cleanup
* Refactor to handle memory optimized bulk shard inference actions - this is ugly but at least it compiles
* [CI] Auto commit changes from spotless
* Minor cleanup
* A bit more cleanup
* Spotless
* Revert change
* Update chunking setting update logic
* Go back to serializing maps
* Revert change to model settings - source still errors on missing model_id
* Fix updating chunking settings
* Look up model if null
* Fix test
* Work around https://github.com/elastic/elasticsearch/issues/125723 in semantic text field serialization
* Add BWC tests
* Add chunking_settings to docs
* Refactor/rename
* Address minor PR feedback
* Add test case for null update
* PR feedback - adjust refactor of chunked inputs
* Refactored AbstractTestInferenceService to return offsets instead of just Strings
* [CI] Auto commit changes from spotless
* Fix tests where chunk output was of size 3
* Update mappings per PR feedback
* PR Feedback
* Fix problems related to merge
* PR optimization
* Fix test
* Delete extra file
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Originally, `DATE_TRUNC` only supported 1-month and 3-month intervals for months, and 1-year interval for years, while arbitrary intervals were supported for weeks and days. This PR adds support for `DATE_TRUNC` with arbitrary month and year intervals.
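A minimal sketch of what truncating to an arbitrary month interval involves: count months since some origin, round down to a multiple of the interval, and convert back to a date. The choice of 1970-01 as the origin for bucket alignment is an assumption made here, and `trunc_to_months` is a hypothetical helper, not the `DATE_TRUNC` implementation. A year interval of N can be treated as 12 * N months.

```python
from datetime import date


def trunc_to_months(d, interval_months):
    """Round a date down to the start of its N-month bucket.

    Assumption: buckets are aligned to January 1970.
    """
    if interval_months < 1:
        raise ValueError("interval_months must be at least 1")
    months_since_epoch = (d.year - 1970) * 12 + (d.month - 1)
    bucket_start = months_since_epoch - months_since_epoch % interval_months
    return date(1970 + bucket_start // 12, bucket_start % 12 + 1, 1)


d = date(2024, 5, 17)
print(trunc_to_months(d, 1))    # start of the month
print(trunc_to_months(d, 3))    # start of the quarter
print(trunc_to_months(d, 5))    # an arbitrary 5-month bucket
print(trunc_to_months(d, 24))   # a 2-year bucket, expressed in months
```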
Closes #120094
Hides some of the "extra" lines from ESQL's documentation. These lines
are required to make the documentation into nice tests which is
important to make sure the docs don't get out of date. But readers don't
need to see them.
In particular:
* Remove all links (both asciidoc and markdown) from the JSON definition files.
* This required a two phase edit, from asciidoc links to markdown, and then removal of markdown (replace with markdown text). This is because the asciidoc does not have the display text, and because some links were already markdown.
* Split predicates into is_null and is_not_null
* We kept the old combined version because the main docs still use that, so now we have both combined and separate versions, and Kibana can select the version they want.
This primarily splits the old preview:true warning from the newer applies_to approach. Since all of our current applies_to examples are actually just behaviour modifications of current functions, we do not use the official docs {applies_to} syntax. However there is code to make use of that in the case where we have an entirely new function which will appear in a new version.
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
This allows a `rescore_vector: {oversample: 0}` to indicate bypassing
oversampling and rescoring.
This is useful for:
- Updating a quantized mapping to turn off automatic rescoring
- Bypassing oversampling at query time in an ad-hoc manner if it's on by default in the mapping
closes: https://github.com/elastic/elasticsearch/issues/125157
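For the query-time case, a request body might be built along these lines. This is a sketch: the helper `knn_search_body`, the field name `vector`, the query vector, and the `k` / `num_candidates` values are placeholders chosen for the example, and only `rescore_vector: {oversample: 0}` comes from this change.

```python
import json


def knn_search_body(query_vector, field, k, num_candidates, oversample=None):
    """Build a kNN search body, optionally overriding the mapping's oversample."""
    knn = {
        "field": field,
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
    }
    if oversample is not None:
        # oversample = 0 asks to bypass oversampling and rescoring entirely.
        knn["rescore_vector"] = {"oversample": oversample}
    return {"knn": knn}


body = knn_search_body([0.1, 0.2, 0.3], "vector", k=10, num_candidates=100, oversample=0)
print(json.dumps(body, indent=2))
```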
This PR was originally focused on improving support for Kibana docs, in particular the missing operator docs, but it has expanded to cover a bunch of related things:
* Primarily the main work was to improve operators support. ESQL generated docs cover all functions and most operators for which there is a clear operator class and test class. However, some are built-in behaviour and need additional support. This PR adds more generated content for those operators.
* Various specific operators requested by Kibana: Cast & null-predicates, and in particular the addition of examples
* Two functions without examples: mv_append and to_date_nanos
* Many small visual document cleanups (spelling, grammar, capitalization, etc.)
* Initial support for `applies_to` for multi-version differentiation.
This last point requires more work, as it is not yet agreed on just how we want this to look. We'll probably need to do refinements in a follow-up PR. Consider the version in this PR as a first step into how this could look.
Adds a new cache and setting
TransportGetAllocationStatsAction.CACHE_TTL_SETTING
"cluster.routing.allocation.stats.cache.ttl" to configure the max age
for cached NodeAllocationStats on the master. The default
value is currently 1 minute per the suggestion in issue 110716.
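The caching behaviour amounts to a single-value cache with a maximum age. A minimal sketch, assuming a 60 second TTL and an injectable clock for testing; the class `TtlCache` and its method names are illustrative and unrelated to the actual Java implementation.

```python
import time


class TtlCache:
    """Cache one computed value and recompute it once it is older than the TTL."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


calls = []


def load_stats():
    # Stand-in for the expensive stats computation.
    calls.append(1)
    return {"loads": len(calls)}


fake_now = [0.0]
cache = TtlCache(load_stats, ttl_seconds=60.0, clock=lambda: fake_now[0])
first = cache.get()    # computed
fake_now[0] = 30.0
second = cache.get()   # served from cache, still within the TTL
fake_now[0] = 61.0
third = cache.get()    # TTL expired, recomputed
print(first, second, third)
```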
Closes #110716
Did a few things:
* Rewrite Kibana docs asciidoc links to be MD links
* Make kibana docs links absolute to planned publication path
* Clarify which operators are generated and which are static
* Removed the trailing .md from kibana docs links
This commit adds a conversion function from numerics (and aggregate
metric doubles) to aggregate metric doubles.
It is most useful when you have multiple indices, where one index uses
aggregate metric double (e.g. a downsampled index) and another uses a
normal numeric type like long or double (e.g. an index prior to
downsampling).
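Conceptually, a single numeric value can be viewed as an aggregate of exactly one sample, which is what lets the two kinds of index be combined. The sketch below illustrates that idea with plain dictionaries; the helper names are hypothetical and the real conversion function's semantics may differ in details.

```python
def to_aggregate_metric_double(value):
    """Represent a plain number as a one-sample aggregate; pass aggregates through."""
    if isinstance(value, dict):
        return dict(value)
    x = float(value)
    return {"min": x, "max": x, "sum": x, "value_count": 1}


def combine(a, b):
    """Merge two values (numbers or aggregates) into a single aggregate."""
    a = to_aggregate_metric_double(a)
    b = to_aggregate_metric_double(b)
    return {
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
        "sum": a["sum"] + b["sum"],
        "value_count": a["value_count"] + b["value_count"],
    }


downsampled = {"min": 1.0, "max": 9.0, "sum": 20.0, "value_count": 4}  # from a downsampled index
raw = 12                                                               # from a regular numeric field
print(combine(downsampled, raw))
```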
Earlier work on the ES|QL port of docs to V3 introduced an issue in the build.gradle file making it fail with --configuration-cache. This fixes that, as well as one other broken link and removes some unused files.
In addition we bring back partial support for deleting unused files. It is tricky to have full support for this due to the mix of static and generated content, particularly in the operators snippets.
In a few previous PRs we restructured the ES|QL docs to make it possible to generate them dynamically.
This PR just moves a few files around to make the query languages docs easier to work with, and a little more organized like the ES|QL docs.
A big part of this was setting up redirects to the new locations, so other repos could correctly link to the elasticsearch docs.
Clarify that it is expected sometimes to see inter-node connections
sending zero-window advertisements as part of the usual TCP backpressure
mechanism.
This adds a new parameter to the quantized index mapping that allows
default oversampling and rescoring to occur.
This doesn't adjust any of the defaults. It allows it to be configured.
When the user provides `rescore_vector: {oversample: <number>}` in the
query it will overwrite it.
For example, here is how to use it with bbq:
```
PUT rescored_bbq
{
"mappings": {
"properties": {
"vector": {
"type": "dense_vector",
"index_options": {
"type": "bbq_hnsw",
"rescore_vector": {"oversample": 3.0}
}
}
}
}
}
```
Then, when querying, it will auto oversample the `k` by `3x` and rerank
with the raw vectors.
```
POST _search
{
"knn": {
"query_vector": [...],
"field": "vector"
}
}
```
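The oversample-then-rerank idea can be sketched independently of Elasticsearch: gather `k * oversample` candidates using a cheap approximate score, then re-sort those candidates with the exact score and keep the top `k`. Everything in this example (the function names, the one-dimensional "vectors", and rounding as a stand-in for quantization) is an assumption made for illustration.

```python
import math


def rescored_knn(query, docs, k, oversample, approx_score, exact_score):
    """Pick ceil(k * oversample) candidates by approximate score, then rerank exactly."""
    n = math.ceil(k * oversample)
    candidates = sorted(docs, key=lambda d: approx_score(query, d), reverse=True)[:n]
    return sorted(candidates, key=lambda d: exact_score(query, d), reverse=True)[:k]


def approx(q, d):
    # Stand-in for a quantized comparison: both sides are rounded first.
    return -abs(round(q) - round(d))


def exact(q, d):
    # Stand-in for comparing against the raw vectors.
    return -abs(q - d)


docs = [0.9, 1.4, 2.6, 3.1]
print(rescored_knn(1.2, docs, k=1, oversample=1, approx_score=approx, exact_score=exact))
print(rescored_knn(1.2, docs, k=1, oversample=3, approx_score=approx, exact_score=exact))
```

In this toy data the approximate score cannot tell 0.9 and 1.4 apart, so without oversampling the first one wins by position. With `oversample=3` both make it into the candidate set and the exact score picks the true nearest neighbour.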
Building on the work started in https://github.com/elastic/elasticsearch/pull/123904, we now want to auto-generate most of the small subfiles from the ES|QL functions unit tests.
This work also investigates any remaining discrepancies between the original asciidoc version and the new markdown, and tries to minimize differences so the docs do not look too different.
The kibana json and markdown files are moved to a new location, and the operator docs are a little more generated than before (although still largely manual).
* Inline cast to date
* Update docs/changelog/123460.yaml
* New capability for `::date` casting
* More tests
* Update tests
---------
Co-authored-by: Fang Xing <155562079+fang-xing-esql@users.noreply.github.com>
Resolves #123053
This adds the thread name to the driver sleep profile output.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Adds options to QSTR function.
#118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL.
Closes #120933
The original work at https://github.com/elastic/elasticsearch/pull/106065 did not support geospatial types with this comment:
> I made this work for everything but geo_point and cartesian_point because I'm not 100% sure how to integrate with those. We can grab those in a follow up.
The geospatial types should be possible to collect using the VALUES aggregation with similar behavior to the `ST_COLLECT` OGC function, based on the Elasticsearch convention that treats multi-value geospatial fields as behaving similarly to any geometry collection. So this implementation is a trivial addition to the existing values types support.
* Allow data stream reindex tasks to be re-run after completion
* Docs update
* Update docs/reference/migration/apis/data-stream-reindex.asciidoc
Co-authored-by: Keith Massey <keith.massey@elastic.co>
---------
Co-authored-by: Keith Massey <keith.massey@elastic.co>
* Allow setting the `type` in the reroute processor
This allows configuring the `type` from within the ingest `reroute` processor. Similar to `dataset`
and `namespace`, the type defaults to the value extracted from the index name. This means that
documents sent to `logs-mysql.access.default` will have a default value of `logs` for the type.
Resolves #121553
* Update docs/changelog/122409.yaml
This updates the kibana signature json files in two ways:
* Renames `eval` to `scalar` - that's the name we use inside of ESQL and
we may as well make the name the same.
* Calls the `CATEGORIZE` and `BUCKET` function `grouping` because they
can only be used in the "grouping" positions of the `STATS` command.
Closes #113411
The semantic text format was updated in #119183. This commit removes the last remaining reference to the old format from the documentation to ensure consistency.
This commit introduces the `MappedFieldType#getDefaultHighlighter`, allowing a specific highlighter to be enforced for a field.
The semantic field mapper utilizes this new functionality to set the `semantic` highlighter as the default.
All other fields will continue to use the `unified` highlighter by default.
Today, Elasticsearch supports two models to establish secure connections
and trust between two Elasticsearch clusters:
- API key based security model
- Certificate based security model
This PR deprecates the _Certificate based security model_ in favour of the _API key based security model_.
The _API key based security model_ is the preferred way to configure remote clusters,
as it allows users to follow security best practices when setting up remote cluster connections
and defining fine-grained access control.
Users are encouraged to migrate remote clusters from certificate to API key authentication.
* (Doc+) Expand watermark resolution
Relaunch https://github.com/elastic/elasticsearch/pull/116892 since the original one seems to be outdated and hard to update branch.
* Apply suggestions from code review
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
---------
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
API keys are high-entropy secure random strings. This means that the
additional work factor of functions like PBKDF or bcrypt is not
necessary, and a faster hash function like salted SHA-256 provides
adequate security against offline attacks (hash collision, brute force,
etc.).
This PR adds `SSHA-256` to the list of supported stored hash algorithms
for API key secrets, and makes it the default algorithm. Additionally,
this PR changes the format of API key secrets, moving from an encoded
UUID to a random string which increases the entropy of API keys from 122
bits to 128 bits, without changing overall secret length.
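The reasoning can be illustrated with a small sketch: when the secret itself carries 128 bits of randomness, a single salted SHA-256 is enough, because guessing the secret is already infeasible. The salt length, the order of concatenation, and the helper names below are assumptions for the example and do not describe Elasticsearch's stored hash format.

```python
import hashlib
import hmac
import secrets


def new_api_key_secret():
    """Generate a secret from 16 random bytes, i.e. 128 bits of entropy."""
    return secrets.token_urlsafe(16)


def hash_secret(secret, salt=None):
    """Return (salt, digest) for a salted SHA-256 of the secret."""
    if salt is None:
        salt = secrets.token_bytes(8)  # salt length is an arbitrary choice here
    digest = hashlib.sha256(secret.encode("utf-8") + salt).digest()
    return salt, digest


def verify_secret(secret, salt, expected_digest):
    """Recompute the hash and compare in constant time."""
    _, digest = hash_secret(secret, salt)
    return hmac.compare_digest(digest, expected_digest)


secret = new_api_key_secret()
salt, stored = hash_secret(secret)
print(verify_secret(secret, salt, stored))
print(verify_secret("not-the-secret", salt, stored))
```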
Relates: ES-9504
With the introduction of our new backing algorithm and making rescoring
easier with the `rescore_vector` API, let's mark bbq as GA.
Additionally, this commit adds rolling upgrade tests to ensure
stability.
Building off of `stats` and multi-value aggregations, including the
limitation:
- all values of extended_stats will be mapped to `double` if mapping
deduction is used
Relates #51925
Enable logsdb by default if logsdb.prior_logs_usage has not been set to true.
Meaning that if no data streams matching the `logs-*-*` pattern were created in 8.x, then logsdb will be enabled by default for data streams matching the `logs-*-*` pattern.
Also removes LogsPatternUsageService as with version 9.0 and beyond, this component is no longer necessary.
Follow-up from #120708
Closes #106489
* (Doc+) Clarify dimension field requirements for time_series aggregation
👋 howdy, team!
This PR adds a note explaining that time series indices require:
- index.mode set to "time_series"
- at least one dimension field with time_series_dimension: true
- a routing_path array listing those dimension fields
Without these settings, the time_series aggregation may return empty buckets or behave unexpectedly. By emphasizing the dimension field requirement, we help users configure their time series indices correctly and see meaningful aggregation results.
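As a rough illustration of the three requirements, the sketch below builds an index body and checks it against them. The field names (`host`, `cpu`) and the helper `check_tsdb_requirements` are made up for the example, and the check is simplified: it expects every `routing_path` entry to name a dimension field literally.

```python
index_body = {
    "settings": {"index": {"mode": "time_series", "routing_path": ["host"]}},
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "host": {"type": "keyword", "time_series_dimension": True},
            "cpu": {"type": "double", "time_series_metric": "gauge"},
        }
    },
}


def check_tsdb_requirements(body):
    """Return a list of problems; an empty list means the three requirements hold."""
    problems = []
    index_settings = body.get("settings", {}).get("index", {})
    properties = body.get("mappings", {}).get("properties", {})
    dimensions = {name for name, m in properties.items() if m.get("time_series_dimension")}
    if index_settings.get("mode") != "time_series":
        problems.append('index.mode is not "time_series"')
    if not dimensions:
        problems.append("no field has time_series_dimension: true")
    routing_path = index_settings.get("routing_path", [])
    if not routing_path:
        problems.append("routing_path is missing or empty")
    for field in routing_path:
        if field not in dimensions:
            problems.append(f"routing_path entry {field!r} is not a dimension field")
    return problems


print(check_tsdb_requirements(index_body))
```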
* Apply suggestions from code review
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
---------
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Add documentation for new REST endpoints related to data stream upgrade.
Endpoints:
- /_migration/reindex
- /_migration/reindex/{index}/_status
- /_migration/reindex/{index}/_cancel
- /_create_from/{source}/{dest}
There are many features of the Elasticsearch ecosystem that may malfunction, or fail to work entirely, if these templates are not installed. This commit adds documentation cautioning against disabling the installation of templates.
* [DOCS] Update documentation for index sorting and routing for logsdb
* update
* Apply suggestions from code review
Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>
* Update logs.asciidoc
* Update docs/reference/data-streams/logs.asciidoc
Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>
* Update logs.asciidoc
---------
Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>
Resolves #109999
This adds support for date nanos in the date diff function, as well as mixed nanos/millis use cases.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Add capability to stop async query on demand
The theory:
- User initiates async search request
- User sends the stop request (POST _query/async/<ID>/stop)
- If the async is finished by that time, it's like regular async get
- If it's not finished, the sinks are closed and the request is forcefully finished
* ESQL: Signatures for `NOT IN` et al
This generates signatures for `NOT IN`, `NOT LIKE`, and `NOT RLIKE`
using a small hack on top of the process used to generate the signatures
for `IN`, `LIKE`, and `RLIKE`. This is a very perl-worthy hack, replacing
`LIKE` with `NOT LIKE` in the description. But it's useful for our
kibana friends and if we need to make it nicer we can do so later.
* Zap
Resolve/cluster allows querying for cluster-info-only (no index expression required)
This enhancement provides users with the ability to query the _resolve/cluster API endpoint without specifying
an index expression to match against. This allows users to quickly test what remote clusters are configured on
a cluster and whether they are available for querying.
The new endpoint takes no index expression:
```
GET _resolve/cluster
```
and returns the same information as before except for the "matching_indices" field. Example response:
```
{
"remote1": {
"connected": false,
"skip_unavailable": true
},
"remote2": {
"connected": true,
"skip_unavailable": false,
"version": {
"number": "8.17.0",
"build_flavor": "default",
"minimum_wire_compatibility_version": "7.17.0",
"minimum_index_compatibility_version": "7.0.0"
}
}
}
```
For backwards compatibility, this new endpoint works with clusters from older versions by querying with the index expression `dummy*` on those older clusters and ignoring the matching_indices value in the response they return.
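As a rough illustration of how a client might use this response, the sketch below parses the example response shown above and lists which remotes are currently connected. The `connected_remotes` helper is hypothetical, and the example body is abbreviated to the fields it needs.

```python
import json

# Abbreviated copy of the example response above.
EXAMPLE_RESPONSE = """
{
  "remote1": {"connected": false, "skip_unavailable": true},
  "remote2": {
    "connected": true,
    "skip_unavailable": false,
    "version": {"number": "8.17.0"}
  }
}
"""


def connected_remotes(response):
    """Return the names of remote clusters reported as connected, sorted."""
    return sorted(name for name, info in response.items() if info.get("connected"))


remotes = connected_remotes(json.loads(EXAMPLE_RESPONSE))
print(remotes)
```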
This deprecated feature is being removed in 9.0, so the telemetry is
no longer needed.
The usage action is retained to support mixed v8/v9 clusters, with
annotations to remove in V10. But it is no longer registered in
`XPackUsageFeatureAction.ALL` and so the usage data is no longer
reported by `GET _xpack/usage`, and if invoked it always returns a
count of 0.
ES-9736 # comment Removed the telemetry in https://github.com/elastic/elasticsearch/pull/119890
Semantic text fields now support multi-fields, either as part of a multi-field structure or containing multi-fields internally.
This enhancement aligns with the semantic text field's current behavior as a standard text field.
Note: Multi-field support is only available for the new index format. Attempting to set a multi-field on an index created with the older format will still result in a failure.
There were different error cases with `ROUND(number, decimals)`:
- Decimals accepted unsigned longs, but threw a 500 with a `can't process [unsigned_long -> long]` in the cast evaluator
- Fixed by improving the `resolveType()`
- If the number was a BigInteger unsigned long, there were 3 cases throwing an exception:
1. Negative decimals outside the range of integer: Error
2. Negative decimals inside the range of integer, but "big enough" for `BigInteger.TEN.pow(...)` to throw a `BigInteger would overflow supported range`
3. -19 decimals with big unsigned longs like `18446744073709551615` was throwing an `unsigned_long overflow`
Also, when the number is a BigInteger and the decimals is a big negative (but not big enough to throw), it may be **very** slow, taking _many_ seconds for a single computation (it tries to calculate a `10^(big number)`). I didn't do anything here, but I wonder if we should limit it.
To solve most of the cases, a warnExceptions was added for the overflow case, and a guard clause to return 0 for <-19 decimals on unsigned longs.
Another issue is that rounding a number like 7 to -1 decimals returns 0 instead of 10, which may be considered an error. But it's consistent, so I'm leaving it to another PR.
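To see why the guard clause can safely return 0 below -19 decimals, and why -19 itself can overflow, here is a small arithmetic sketch. It is not the real evaluator: `round_unsigned_long` is a hypothetical helper, returning `None` stands in for "null plus a warning", and the half-up rounding is an assumption made for illustration (it does not reproduce the 7 to -1 behaviour noted above).

```python
UNSIGNED_LONG_MAX = 2**64 - 1  # 18446744073709551615


def round_unsigned_long(value, decimals):
    """Round an unsigned long to a non-positive number of decimals (sketch)."""
    if decimals >= 0:
        return value  # an integer has no fractional digits to round
    if decimals < -19:
        # Half of 10**20 is already larger than UNSIGNED_LONG_MAX,
        # so every unsigned long rounds down to 0. No power is computed.
        return 0
    unit = 10 ** -decimals
    rounded = (value + unit // 2) // unit * unit  # half-up: an assumption
    if rounded > UNSIGNED_LONG_MAX:
        return None  # stands in for the overflow warning case
    return rounded
```

With these assumptions, `round_unsigned_long(18446744073709551615, -19)` hits the overflow branch because the nearest multiple of `10**19` is `2 * 10**19`, which does not fit in an unsigned long.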
This reduces the number of test cases in ESQL a little more ala #119678.
It migrates a few random tests and all of the multivalue functions:
```
92775 -> 43760
3m45 -> 4m04
```
This adds a few more error test cases that were missing to make sure it all
lines up well. And it fixes a few error messages in a few functions. That's
*likely* where the extra time goes.
* Added additional entries for troubleshooting unhealthy cluster
Reordered "Re-enable shard allocation" because it is not as common as other causes
Added additional causes of yellow statuses
Changed watermark command to include high and low watermark so users can make their cluster operate once again.
* Drive-by copyedit with suggestions for concision and some formatting fixes.
* Concision and some formatting fixes.
* Colon added
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Title change
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Spelling fix
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
---------
Co-authored-by: Kofi B <seanziee@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
This adds support for passing Date Nanos into the Date Format function. It works for both the single argument and two argument versions. Format strings are unchanged, as the same formatting logic works for both resolutions.
Resolves #109994
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
I forgot to link the ToDateNanos docs when I merged that function.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This adds a sentence to `redirects.asciidoc` explaining what frozen
indices were - otherwise, everything will point to the message about
the unfreeze API having gone away, which is not very helpful. Some
cross-references are updated to point to this rather than to the
notice about the removal of the unfreeze API.
ES-9736 #comment Removed `_unfreeze` REST endpoint in https://github.com/elastic/elasticsearch/pull/119227
`fold` can be surprisingly heavy! The maximally efficient/paranoid thing
would be to fold each expression one time, in the constant folding rule,
and then store the result as a `Literal`. But this PR doesn't do that
because it's a big change. Instead, it creates the infrastructure for
tracking memory usage for folding and plugs it into as many places as
possible. That's not perfect, but it's better.
This infrastructure limits the allocations of fold similar to the
`CircuitBreaker` infrastructure we use for values, but it's different
in a critical way: you don't manually free any of the values. This is
important because the plan itself isn't `Releasable`, which is required
when using a real CircuitBreaker. We could have tried to make the plan
releasable, but that'd be a huge change.
Right now there's a single limit of 5% of heap per query. We create the
limit at the start of query planning and use it throughout planning.
There are about 40 places that don't yet use it. We should get them
plugged in as quickly as we can manage. After that, we should look to the
maximally efficient/paranoid thing that I mentioned about waiting for
constant folding. That's an even bigger change, one I'm not equipped
to make on my own.
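A minimal sketch of the idea, assuming a simple byte counter: one budget is created per query at 5% of heap, every fold charges against it, and nothing is ever released. The names `FoldBudget`, `FoldBudgetExceeded`, and `track` are hypothetical and are not the classes added by this change.

```python
class FoldBudgetExceeded(Exception):
    """Raised when folding would use more memory than the per-query limit."""


class FoldBudget:
    """Add-only memory budget for constant folding (hypothetical sketch)."""

    def __init__(self, heap_bytes, percent=5):
        # Integer arithmetic keeps the limit exact.
        self.limit = heap_bytes * percent // 100
        self.used = 0

    def track(self, nbytes):
        # Unlike a circuit breaker there is no matching "release" call:
        # usage only grows, and the whole budget is dropped with the query.
        if self.used + nbytes > self.limit:
            raise FoldBudgetExceeded(
                f"folding needs {self.used + nbytes} bytes, limit is {self.limit}"
            )
        self.used += nbytes


budget = FoldBudget(heap_bytes=1000)  # created once at the start of planning
budget.track(30)
budget.track(20)
```

Because the budget is add-only, callers never have to pair each charge with a release, which is what makes it usable from a plan that is not releasable.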
* Including examples
* Using js instead of json
* Adding unified docs to main page
* Adding missing description text
* Refactoring to remove unified route
* Adding back references to the _unified route
* Update docs/reference/inference/chat-completion-inference.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Address feedback
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
This wires up the randomized testing for DateFormat. Prior to this PR, none of the randomized testing was hitting the one-parameter version of the function, so I wired that up as well. This required some compromises on the type signatures; see comments inline.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>