This introduces an optimization to push down to Lucene those language constructs that perform case-insensitive regular expression matching with the `LIKE` and `RLIKE` operators, such as:
* `| WHERE TO_LOWER(field) LIKE "abc*"`
* `| WHERE TO_UPPER(field) RLIKE "ABC.*"`
These are now pushed down to Lucene as case-insensitive `wildcard` and `regexp` queries, respectively.
Closes #127479
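For illustration, a sketch of the kind of Lucene-level query this pushdown maps to, written with the query-builder API (the builder calls below are an assumption for illustration, not code from this change):
```
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RegexpQueryBuilder;
import org.elasticsearch.index.query.WildcardQueryBuilder;

public class CaseInsensitivePushdownSketch {
    public static void main(String[] args) {
        // TO_LOWER(field) LIKE "abc*"  ->  case-insensitive wildcard query on `field`
        WildcardQueryBuilder wildcard = QueryBuilders.wildcardQuery("field", "abc*").caseInsensitive(true);

        // TO_UPPER(field) RLIKE "ABC.*"  ->  case-insensitive regexp query on `field`
        RegexpQueryBuilder regexp = QueryBuilders.regexpQuery("field", "ABC.*").caseInsensitive(true);

        System.out.println(wildcard);
        System.out.println(regexp);
    }
}
```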
* Fix scoring and sort handling in pinned retriever
* Remove books.es from version control and add to .gitignore
* Remove books.es entries from .gitignore
* Fixed the mess
* Update x-pack/plugin/search-business-rules/src/test/java/org/elasticsearch/xpack/searchbusinessrules/retriever/PinnedRetrieverBuilderTests.java
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update x-pack/plugin/search-business-rules/src/main/java/org/elasticsearch/xpack/searchbusinessrules/retriever/PinnedRetrieverBuilder.java
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Currently, the Limit operator does not combine small pages into larger
ones; it simply passes them along, except for chunking pages larger than
the limit. This change integrates EstimatesRowSize into Limit and
adjusts it to emit larger pages. As a result, pages up to twice the
pageSize may be emitted, which is preferable to emitting undersized
pages. This should reduce the number of transport requests and responses between clusters, or between the coordinator and data nodes, for queries without TopN or STATS when target shards produce small pages due to their size or highly selective filters.
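As a rough sketch of the combining behavior (the `Page` and combiner types below are illustrative stand-ins, not the actual compute-engine classes):
```
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative only: a stand-in for the compute engine's Page, tracking row counts.
record Page(int rowCount) {}

class LimitCombiner {
    private final int pageSize;          // target page size
    private final Deque<Page> pending = new ArrayDeque<>();
    private int pendingRows = 0;

    LimitCombiner(int pageSize) {
        this.pageSize = pageSize;
    }

    // Buffer small pages and emit once the target size is reached. Because we only
    // check after appending, an emitted page can hold up to 2 * pageSize rows,
    // which is preferred over emitting many undersized pages.
    List<Page> add(Page page) {
        pending.add(page);
        pendingRows += page.rowCount();
        if (pendingRows >= pageSize) {
            Page combined = new Page(pendingRows);
            pending.clear();
            pendingRows = 0;
            return List.of(combined);
        }
        return List.of();
    }
}
```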
* Inference changes
* Custom service fixes
* Update docs/changelog/127939.yaml
* Cleaning up from failed merge
* Fixing changelog
* [CI] Auto commit changes from spotless
* Fixing test
* Adding feature flag
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* Implemented ChatCompletion task for Google VertexAI with Gemini Models
* changelog
* System Instruction bugfix
* Mapping role assistant -> model in vertex ai chat completion request for compatibility
* GoogleVertexAI chat completion using SSE events. Removed JsonArrayEventParser
* Removed buffer from GoogleVertexAiUnifiedStreamingProcessor
* Casting inference inputs with `castTo`
* Registered GoogleVertexAiChatCompletionServiceSettings in InferenceNamedWriteablesProvider. Added InferenceSettingsTests
* Changed transport version to 8_19 for vertexai chatcompletion
* Fix to transport version. Moved ML_INFERENCE_VERTEXAI_CHATCOMPLETION_ADDED to the right location
* VertexAI Chat completion request entity jsonStringToMap using `ensureExpectedToken`
* Fixed TransportVersions. Left vertexAi chat completion 8_19 and added new one for ML_INFERENCE_VERTEXAI_CHATCOMPLETION_ADDDED
* Replaced switch statements with if-else for older Java compatibility. Improved indentation via `{}`
* Removed GoogleVertexAiChatCompletionResponseEntity and refactored code around it.
* Removed redundant test `testUnifiedCompletionInfer_WithGoogleVertexAiModel`
* Returning whole body when fail to parse response from VertexAI
* Refactored to use GenericRequestManager instead of GoogleVertexAiCompletionRequestManager
* Refactored to constructorArg for mandatory args in GoogleVertexAiUnifiedStreamingProcessor
* Changed transport version in GoogleVertexAiChatCompletionServiceSettings
* Bugfix in tool calling with role tool
* GoogleVertexAiModel added documentation info on rateLimitGroupingHash
* [CI] Auto commit changes from spotless
* Fix: using Locale.ROOT when calling toLowerCase
* Fix: Renamed test class to match convention & modified use of forbidden api
* Fix: Failing test in InferenceServicesIT
---------
Co-authored-by: lhoet <lhoet@google.com>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Fix and test off-heap stats when using direct IO for accessing the raw vectors. The direct IO reader does not use off-heap memory, so it returns an empty map to indicate that there are no off-heap requirements. I added some overloaded tests with different directories to verify this.
Note: for 9.1 we're still using reflection to access the internals of non-ES readers, but DirectIO is an ES reader, so we can use our internal OffHeapStats interface (rather than reflection). This is all replaced when we eventually get Lucene 10.3.
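A minimal sketch of what returning an empty map looks like (the `OffHeapStats` shape below is an assumption for illustration, not the actual reader code):
```
import java.util.Map;

// Assumed shape of the internal OffHeapStats contract: a reader reports how many
// bytes per vector-data category it keeps off heap.
interface OffHeapStats {
    Map<String, Long> getOffHeapByteSize();
}

// A direct IO reader streams raw vectors straight from disk, so it has nothing to
// report: an empty map signals "no off-heap requirement" rather than zero-valued entries.
class DirectIOVectorsReaderSketch implements OffHeapStats {
    @Override
    public Map<String, Long> getOffHeapByteSize() {
        return Map.of();
    }
}
```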
Begins adding support for running "tagged queries" to the compute
engine. Here, it's just the `LuceneSourceOperator` because that's
useful and contained.
Example time! Say you are running:
```
FROM foo
| STATS MAX(v) BY ROUND_TO(g, 0, 100, 1000, 100000)
```
It's *often* faster to run this as four queries:
* The docs that round to `0`
* The docs that round to `100`
* The docs that round to `1000`
* The docs that round to `100000`
This creates an ESQL operator that can run these queries one after the other, attaching those tags.
Aggs uses this trick and it's *way* faster when it can push down count
queries, but it's still faster when it pushes doc loading things. This
implementation in `LuceneSourceOperator` is quite similar to the doc
loading version in _search.
I don't have performance measurements yet because I haven't plugged this
into the language. In _search we call this `filter-by-filter` and enable it when each group averages more than 5000 documents and when there isn't a `_doc_count` field; outside those conditions it's faster not to push. I expect we'll be pretty similar.
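A loose sketch of the idea in plain Lucene, with the tag plumbing into blocks elided (the `TaggedQuery` type and the bucket boundaries are illustrative assumptions, not the operator's code):
```
import java.io.IOException;
import java.util.List;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Illustrative pairing of a per-bucket Lucene query with the tag that should be
// attached to every row it produces.
record TaggedQuery(Query query, long tag) {}

class TaggedQueriesSketch {
    // For ROUND_TO(g, 0, 100, 1000, 100000), each bucket becomes a range query on `g`,
    // tagged with the bucket's rounding point. Boundaries here are illustrative.
    static List<TaggedQuery> buckets() {
        return List.of(
            new TaggedQuery(LongPoint.newRangeQuery("g", Long.MIN_VALUE, 99), 0L),
            new TaggedQuery(LongPoint.newRangeQuery("g", 100, 999), 100L),
            new TaggedQuery(LongPoint.newRangeQuery("g", 1000, 99_999), 1000L),
            new TaggedQuery(LongPoint.newRangeQuery("g", 100_000, Long.MAX_VALUE), 100_000L)
        );
    }

    // Run the queries one after the other; a real operator would emit doc blocks
    // carrying the tag, while here we only count matches per bucket.
    static void run(IndexSearcher searcher) throws IOException {
        for (TaggedQuery tq : buckets()) {
            int hits = searcher.count(tq.query());
            System.out.println("tag=" + tq.tag() + " docs=" + hits);
        }
    }
}
```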
Most providers write the UnifiedCompletionRequest JSON as we received it, with some exceptions:
- the modelId can be null and/or overwritten from various locations
- `max_completion_tokens` replaced `max_tokens`, but some providers still use the deprecated field name
We will handle the variations using Params; otherwise, all of the XContent building code has moved into UnifiedCompletionRequest so it can be reused across providers.
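A sketch of how a Params flag can steer the shared XContent code (the parameter key and class below are invented for illustration; the real logic lives in UnifiedCompletionRequest):
```
import java.io.IOException;
import org.elasticsearch.xcontent.ToXContent;
import org.elasticsearch.xcontent.XContentBuilder;

// Illustrative fragment: the shared builder consults a Params flag to decide which
// field name a given provider expects for the token limit.
class MaxTokensFragment implements ToXContent {
    // Hypothetical parameter key, named here only for the sketch.
    static final String USE_DEPRECATED_MAX_TOKENS = "use_deprecated_max_tokens";

    private final long maxCompletionTokens;

    MaxTokensFragment(long maxCompletionTokens) {
        this.maxCompletionTokens = maxCompletionTokens;
    }

    @Override
    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
        // Providers that still use the deprecated name get `max_tokens`,
        // everyone else gets `max_completion_tokens`.
        String fieldName = params.paramAsBoolean(USE_DEPRECATED_MAX_TOKENS, false)
            ? "max_tokens"
            : "max_completion_tokens";
        return builder.field(fieldName, maxCompletionTokens);
    }
}
```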
This PR addresses the bug reported in
[#127496](https://github.com/elastic/elasticsearch/issues/127496)
**Changes:**
- Added validation logic in `ConfigurableClusterPrivileges` to ensure privileges defined for a global cluster manage role privilege are valid
- Added unit test to `ManageRolePrivilegesTest` to ensure an invalid privilege is caught during role creation
- Updated `BulkPutRoleRestIT` to assert that an error is thrown and that the role is not created.
Both existing and new unit/integration tests passed locally.
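Roughly the shape of the added check, heavily simplified (the privilege set and exception below are stand-ins, not the actual registry used by `ConfigurableClusterPrivileges`):
```
import java.util.Locale;
import java.util.Set;

// Illustrative only: the real check consults the privilege registry; this sketch
// just shows rejecting unknown names at role-creation time.
class ManageRolesPrivilegeValidationSketch {
    // Hypothetical subset of known index privilege names.
    private static final Set<String> KNOWN_INDEX_PRIVILEGES = Set.of("read", "write", "view_index_metadata", "all");

    static void validate(Iterable<String> requestedPrivileges) {
        for (String privilege : requestedPrivileges) {
            if (KNOWN_INDEX_PRIVILEGES.contains(privilege.toLowerCase(Locale.ROOT)) == false) {
                // Fail role creation instead of silently storing an unusable role.
                throw new IllegalArgumentException(
                    String.format(Locale.ROOT, "unknown index privilege [%s]", privilege)
                );
            }
        }
    }
}
```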
`writerWithOffset` uses a lambda to create a `RangeMissingHandler`; however, the `RangeMissingHandler` interface has a default implementation for `sharedInputStreamFactory`. As a result, `writerWithOffset` delegates to the received writer only for the `fillCacheRange` method, so the underlying writer may never have had its `sharedInputStreamFactory` method invoked (always invoking `sharedInputStreamFactory` before `fillCacheRange` is part of the contract of the `RangeMissingHandler` interface).
This PR makes `writerWithOffset` delegate `sharedInputStreamFactory` to the underlying writer.
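A simplified sketch of the delegation fix (the interface shape and signatures below are assumptions for illustration; the real ones live in `SharedBlobCacheService`):
```
import java.io.IOException;

// Illustrative shape only: shows why a naive wrapper silently inherits the default
// sharedInputStreamFactory instead of forwarding it to the wrapped writer.
interface RangeMissingHandlerSketch {
    // Default implementation in the real interface.
    default SharedInputStreamFactorySketch sharedInputStreamFactory() {
        return null;
    }

    void fillCacheRange(long position, int length) throws IOException;

    interface SharedInputStreamFactorySketch {}
}

final class WriterWithOffsetSketch implements RangeMissingHandlerSketch {
    private final RangeMissingHandlerSketch delegate;
    private final int offset;

    WriterWithOffsetSketch(RangeMissingHandlerSketch delegate, int offset) {
        this.delegate = delegate;
        this.offset = offset;
    }

    @Override
    public SharedInputStreamFactorySketch sharedInputStreamFactory() {
        // Forward to the wrapped writer instead of inheriting the default, so the
        // writer sees sharedInputStreamFactory before any fillCacheRange call,
        // as the interface contract requires.
        return delegate.sharedInputStreamFactory();
    }

    @Override
    public void fillCacheRange(long position, int length) throws IOException {
        delegate.fillCacheRange(position + offset, length);
    }
}
```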
This moves half of the remaining centralized expression mapper logic
into the individual expressions. This is how all but 3 of the remaining
expressions work. Let's try and be fully consistent.
"elser" is an alias for "elasticsearch", and "sagemaker" is an alias for
"amazon_sagemaker".
Users can continue to create and use providers by their alias.
Elasticsearch will continue to support the alias when it reads the
configuration from the internal index.
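A sketch of the alias handling (the map and method below are illustrative, not the service registry's actual code):
```
import java.util.Map;

class ProviderAliasSketch {
    // "elser" and "sagemaker" remain accepted names that resolve to the canonical services.
    private static final Map<String, String> ALIASES = Map.of(
        "elser", "elasticsearch",
        "sagemaker", "amazon_sagemaker"
    );

    // Used both for new requests and when reading stored configurations back
    // from the internal index, so existing endpoints keep working.
    static String resolveService(String name) {
        return ALIASES.getOrDefault(name, name);
    }
}
```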
Speeds up the semantic_text tests by using a test inference plugin. That
skips downloading the normal inference setup.
Closes #128511 Closes #128513 Closes #128571 Closes #128572 Closes #128573 Closes #128574
This change skips indexing points for the seq_no field in tsdb and
logsdb indices to reduce storage requirements and improve indexing
throughput. Although this optimization could be applied to all new
indices, it is limited to tsdb and logsdb, where seq_no usage is
expected to be limited and storage requirements are more critical.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
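At the Lucene level the difference is roughly this (a simplified sketch, not the actual `SeqNoFieldMapper` code):
```
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;

class SeqNoFieldSketch {
    // Simplified: tsdb/logsdb indices keep only doc values for _seq_no, while other
    // indices also index a point, which supports range queries but costs storage
    // and indexing throughput.
    static void addSeqNoFields(Document doc, long seqNo, boolean indexPoints) {
        doc.add(new NumericDocValuesField("_seq_no", seqNo));
        if (indexPoints) {
            doc.add(new LongPoint("_seq_no", seqNo));
        }
    }
}
```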
SageMaker now supports Completion and Chat Completion using the OpenAI
interfaces.
Additionally:
- Fixed a bug related to timeouts being nullable; now defaults to a 30s timeout
- Exposed existing OpenAi request/response parsing logic for reuse
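For the timeout fix, the gist is a null-safe default (a sketch; the 30s value comes from the description above, while the names are illustrative):
```
import org.elasticsearch.core.TimeValue;

class RequestTimeoutSketch {
    // Fall back to 30 seconds whenever no explicit timeout was provided.
    static final TimeValue DEFAULT_TIMEOUT = TimeValue.timeValueSeconds(30);

    static TimeValue resolveTimeout(TimeValue requestTimeout) {
        return requestTimeout != null ? requestTimeout : DEFAULT_TIMEOUT;
    }
}
```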