elasticsearch

Commit Graph

Author	SHA1	Message	Date
Nik Everett	c5e76847ad	ESQL: Keep ordinals in conversion functions (#125357 ) Make the conversion functions that process `BytesRef`s into `BytesRefs` keep the `OrdinalBytesRefVector`s when processing. Let's use `TO_LOWER` as an example. First, the performance numbers: ``` (operation) Mode Score Error -> Score Error Units to_lower 30.662 ± 6.163 -> 30.048 ± 0.479 ns/op to_lower_ords 30.773 ± 0.370 -> 0.025 ± 0.001 ns/op to_upper 33.552 ± 0.529 -> 35.775 ± 1.799 ns/op to_upper_ords 35.791 ± 0.658 -> 0.027 ± 0.001 ns/op ``` The test has 8192 positions containing alternating `foo` and `bar`. Running `TO_LOWER` via ordinals is super duper faster. No longer `O(positions)` and now `O(unique_values)`. Let's paint some pictures! `OrdinalBytesRefVector` is a lookup table. Like this: ``` +-------+----------+ \| bytes \| ordinals \| \| ----- \| -------- \| \| FOO \| 0 \| \| BAR \| 1 \| \| BAZ \| 2 \| +-------+ 1 \| \| 1 \| \| 0 \| +----------+ ``` That lookup table is one block. When you read it you look up the `ordinal` and match it to the `bytes`. Previously `TO_LOWER` would process each value one at a time and make: ``` bytes ----- foo bar baz bar bar foo ``` So it'd run `TO_LOWER` once per `ordinal` and it'd make an ordinal non-lookup table. With this change `TO_LOWER` will now make: ``` +-------+----------+ \| bytes \| ordinals \| \| ----- \| -------- \| \| foo \| 0 \| \| bar \| 1 \| \| baz \| 2 \| +-------+ 1 \| \| 1 \| \| 0 \| +----------+ ``` We don't even have to copy the `ordinals` - we can reuse those from the input and just bump the reference count. That's why this goes from `O(positions)` to `O(unique_values)`.	2025-03-21 20:00:15 +02:00
Nik Everett	7ac6e5fd3c	ESQL: Fix EvalBenchmark (#124736 ) Fix the benchmark for `EVAL` which was failing because of a strange logging error. The benchmarks really didn't want to run when we use commons-logging. That's fine - we can use the ES logging facade thing. I also added a test to the benchmarks which should run the self-tests for `EVAL` on `gradle check`.	2025-03-14 20:19:20 +00:00
Simon Cooper	d7864f4af6	Refactor JMH script vector distance benchmark to add panama benchmarks (#124351 ) Add vector benchmarks vs scalar, and automatically pick up new implementations as they get added	2025-03-12 13:15:16 +00:00
Tim Vernum	f7e80e7fd2	Merge branch 'main' into feature/multi-project	2025-02-27 12:09:08 +11:00
Salvatore Campagna	86a6c93bd6	Benchmark date field range query with doc values sparse index (#123251 )	2025-02-26 16:50:36 +01:00
Niels Bauman	116b045139	Merge main into multi-project	2025-02-24 17:43:47 +01:00
Nik Everett	319e53a350	ESQL: Benchmark TO_LOWER and TO_UPPER (#123268 ) This adds a microbenchmark for TO_LOWER and TO_UPPER. They are quite common and probably could use some optimizing.	2025-02-24 11:18:06 -05:00
Tim Vernum	21a16acbd4	Merge main into multi-project	2025-02-24 14:23:18 +11:00
Nik Everett	67293ba8f4	ESQL: Speed up VALUES for many buckets (#123073 ) Speeds up the VALUES agg when collecting from many buckets. Specifically, this speeds up the algorithm used to `finish` the aggregation. Most specifically, this makes the algorithm more tolerant to large numbers of groups being collected. The old algorithm was `O(n^2)` with the number of groups. The new one is `O(n)` ``` (groups) 1 219.683 ± 1.069 -> 223.477 ± 1.990 ms/op 1000 426.323 ± 75.963 -> 463.670 ± 7.275 ms/op 100000 36690.871 ± 4656.350 -> 7800.332 ± 2775.869 ms/op 200000 89422.113 ± 2972.606 -> 21920.288 ± 3427.962 ms/op 400000 timed out at 10 minutes -> 40051.524 ± 2011.706 ms/op ``` The `1` group version was not changed at all. That's just noise in the measurement. The small bump in the `1000` case is almost certainly worth it and real. The huge drop in the `100000` case is quite real.	2025-02-23 18:29:55 +00:00
Tim Vernum	680e7a6979	Merge revision `5c00341c2b` into multi-project	2025-02-14 17:17:41 +11:00
Oleksandr Kolomiiets	b8d7e99cb9	Use FallbackSyntheticSourceBlockLoader for number fields (#122280 )	2025-02-12 16:12:19 -08:00
Yang Wang	04d459009b	Merge main into multi-project	2025-02-12 09:57:09 +11:00
Iván Cea Fontenla	7bea3a5610	ESQL: Remove AggregateMapper reflection, and delegate intermediate state to suppliers (#122023 ) To avoid having AggregateMapper find aggregators based on their names with reflection, I'm doing some changes: - Make the suppliers have methods returning the intermediate states - To allow this, the suppliers' constructor won't receive the channels as params. Instead, its methods will ask for them - Most changes in this PR are because of this - After those changes, I'm leaving AggregateMapper still there, as it still converts AggregateFunctions to its NamedExpressions	2025-02-10 13:01:59 +01:00
Niels Bauman	621a18d947	Merge main into multi-project	2025-01-30 17:26:28 +10:00
Armin Braun	453db3fd71	Optimize InternalAggregations construction a little (#120868 ) We can streamline and optimize this logic a little to see less copying and more compact results.	2025-01-28 11:50:47 +01:00
Lorenzo Dematté	81a9348431	[Entitlements] Enable native access based on policies (#120638 )	2025-01-24 08:29:38 +01:00
Niels Bauman	6495dcbb40	Merge main into multi-project	2025-01-24 15:48:39 +10:00
Nik Everett	dc4fa26174	Speed up COALESCE significantly (#120139 ) ``` before after (operation) Score Error Score Error Units coalesce_2_noop 75.949 ± 3.961 -> 0.010 ± 0.001 ns/op 99.9% coalesce_2_eager 99.299 ± 6.959 -> 4.292 ± 0.227 ns/op 95.7% coalesce_2_lazy 113.118 ± 5.747 -> 26.746 ± 0.954 ns/op 76.4% ``` We tend to advise folks that "COALESCE is faster than CASE", but, as of 8.16.0/https://github.com/elastic/elasticsearch/pull/112295 that wasn't true. I was working with someone a few days ago to port a scripted_metric aggregation to ESQL and we saw COALESCE taking ~60% of the time. That won't do. The trouble is that CASE and COALESCE have to be lazy, meaning that operations like: ``` COALESCE(a, 1 / b) ``` should never emit a warning if `a` is not `null`, even if `b` is `0`. In 8.16/https://github.com/elastic/elasticsearch/pull/112295 CASE grew an optimization where it could operate non-lazily if it was flagged as "safe". This brings a similar optimization to COALESCE, see it above as "coalesce_2_eager", a 95.7% improvement. It also brings an arguably more important optimization - entire-block execution for COALESCE. The short version is that, if the first parameter of COALESCE returns no nulls we can return it without doing anything lazily. There are a few more cases, but the upshot is that COALESCE is pretty much free in cases where long strings of results are `null` or not `null`. That's the `coalesce_2_noop` line. Finally, when there are mixed null and non-null values we were using a single builder with some fairly inefficient paths. This specializes them per type and skips some slow null-checking where possible. That's the `coalesce_2_lazy` result, a more modest 76.4%. NOTE: These %s are improvements on COALESCE itself, or COALESCE with some low-overhead operators like `+`. If COALESCE isn't taking a ton of time in your query don't get particularly excited about this. It's fun though. Closes #119953	2025-01-23 17:40:09 +00:00
Niels Bauman	682cf0a18f	Merge remote-tracking branch 'public/main' into merge-main	2025-01-23 13:27:52 +10:00
Lorenzo Dematté	d18b6790f4	[Entitlements] Refactor: create/parse entitlement policies earlier during bootstrap (#120611 )	2025-01-22 14:29:57 +01:00
Tim Vernum	552cec7ff0	Merge revision `34059c9dbd` into multi-project	2025-01-17 16:32:15 +11:00
Patrick Doyle	34059c9dbd	Limit ByteSizeUnit to 2 decimals (#120142 ) * Exhaustive testParseFractionalNumber * Refactor: encapsulate ByteSizeUnit constructor * Refactor: store size in bytes * Support up to 2 decimals in parsed ByteSizeValue * Fix test for rounding up with no warnings * ByteSizeUnit transport changes * Update docs/changelog/120142.yaml * Changelog details and impact * Fix change log breaking.area * Address PR comments	2025-01-16 19:30:23 +00:00
Simon Cooper	5a70623d8d	Merge remote-tracking branch 'upstream-main/main' into merge-main-16-01-25	2025-01-16 09:23:46 +00:00
Iván Cea Fontenla	b7ab8f8bb7	ESQL: Add row counts to profile results (#120134 ) Closes https://github.com/elastic/elasticsearch/issues/119969 - Rename "pages_in/out" to "pages_received/emitted", to standardize the name along most operators - There are still "pages_processed" operators, maybe it would make sense to also rename those? - Add "pages_received/emitted" to TopN operator, as it was missing that - Added "rows_received/emitted" to most operators - Added a test to ensure all operators with status provide those metrics	2025-01-15 15:30:41 +00:00
Nik Everett	c990377c95	ESQL: Limit memory usage of `fold` (#118602 ) `fold` can be surprisingly heavy! The maximally efficient/paranoid thing would be to fold each expression one time, in the constant folding rule, and then store the result as a `Literal`. But this PR doesn't do that because it's a big change. Instead, it creates the infrastructure for tracking memory usage for folding and plugs it into as many places as possible. That's not perfect, but it's better. This infrastructure limits the allocations of fold similar to the `CircuitBreaker` infrastructure we use for values, but it's different in a critical way: you don't manually free any of the values. This is important because the plan itself isn't `Releasable`, which is required when using a real CircuitBreaker. We could have tried to make the plan releasable, but that'd be a huge change. Right now there's a single limit of 5% of heap per query. We create the limit at the start of query planning and use it throughout planning. There are about 40 places that don't yet use it. We should get them plugged in as quick as we can manage. After that, we should look to the maximally efficient/paranoid thing that I mentioned about waiting for constant folding. That's an even bigger change, one I'm not equipped to make on my own.	2025-01-13 15:04:27 +00:00
Yang Wang	e1151ef1ba	Merge main into multi-project	2025-01-06 13:30:02 +11:00
Oleksandr Kolomiiets	8ca74fb956	Revert "Extract synthetic source logic from DocumentParser (#116049 )" (#119530 ) This reverts commit `e8d32afdf4`.	2025-01-03 12:03:40 -08:00
Tim Vernum	4ff691f066	Merge revision `7fb6ca447a` into multi-project	2024-12-31 15:41:02 +11:00
Oleksandr Kolomiiets	e8d32afdf4	Extract synthetic source logic from DocumentParser (#116049 )	2024-12-24 11:41:44 -08:00
Niels Bauman	3738202979	Merge main into multi-project	2024-12-24 18:26:13 +01:00
Chris Hegarty	3a2f8f62c4	Add square distance query variants to the vector distance benchmark (#119219 ) This commit adds square distance query variants to the vector distance benchmark.	2024-12-23 17:18:23 +00:00
Tim Vernum	e5a0739005	Merge main into multi-project	2024-12-12 17:23:24 +11:00
Jim Ferenczi	b40a52035f	Add Optional Source Filtering to Source Loaders (#113827 ) This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter. This update also modifies the get document API to apply source filters earlier—directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.	2024-12-11 13:17:19 +00:00
Yang Wang	92867cdf50	Merge main into multi-project	2024-11-29 08:50:54 +11:00
Jack Conradson	656b5f9480	Refactor PluginsLoader to better support tests (#117522 ) This refactors the way PluginsLoader is created to better support various types of testing.	2024-11-27 14:31:30 -08:00
Tim Vernum	192ed6c5a4	Merge main into multi-project	2024-11-21 11:25:11 +11:00
Jack Conradson	4f46924f36	Split plugin loading into two different phases to support entitlements (#116998 ) This change loads all the modules and creates the module layers for plugins prior to entitlement checking during the 2nd phase of bootstrap initialization. This will allow us to know what modules exist for both validation and checking prior to actually loading any plugin classes (in a follow up change). There are now two classes: PluginsLoader, which does the module loading and layer creation, and PluginsService, which uses a PluginsLoader to create the main plugin classes and start the plugins	2024-11-20 15:05:42 -08:00
Niels Bauman	0edb9fa778	Merge remote-tracking branch 'public/main' into merge-main # Conflicts: # server/src/main/java/org/elasticsearch/action/search/TransportSearchShardsAction.java # server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationStatsService.java # server/src/main/java/org/elasticsearch/gateway/GatewayMetaState.java # server/src/main/java/org/elasticsearch/plugins/Plugin.java # server/src/test/java/org/elasticsearch/gateway/GatewayMetaStateTests.java # server/src/test/java/org/elasticsearch/ingest/IngestMetadataTests.java	2024-11-18 10:53:12 +01:00
Rene Groeschke	13c8aaeffa	[Gradle] Remove static use of BuildParams (#115122 ) Static fields don't do well in Gradle with configuration cache enabled. - Use buildParams extension in build scripts - Keep BuildParams.ci for now for easy serverless migration - Tweak testing doc	2024-11-15 17:58:57 +01:00
Armin Braun	77a7c9c2e2	Add singleton for noop BitSetFilterCache.Listener (#116753 ) Noticed during a code review that added yet another one of these: We have quite a few instances of duplicate noop implementations, let's make tests a little less verbose here. Technically the constant is test-only but it felt right to just leave it on the interface.	2024-11-13 21:55:14 +01:00
Tim Vernum	17c27bc42b	Merge main into multi-project	2024-11-11 16:28:45 +11:00
Mary Gouseti	6c959b7e75	Add benchmark for IndexNameExpressionResolver (#115982 ) * Add benchmark for IndexNameExpressionResolver * Extract IndicesRequest in a local class * Added one more benchmark to capture a mixed request --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-11-08 11:26:11 +02:00
Tim Vernum	2ba2d2a995	Merge main into multi-project	2024-10-31 11:55:04 +11:00
Ryan Ernst	e5d5c17c99	Use directory name as project name for libs (#115720 ) The libs projects are configured to all begin with `elasticsearch-`. While this is desirable for the artifacts to contain this consistent prefix, it means the project names don't match up with their directories. Additionally, it creates complexities for subproject naming that must be manually adjusted. This commit adjusts the project names for those under libs to be their directory names. The resulting artifacts for these libs are kept the same, all beginning with `elasticsearch-`.	2024-10-29 13:02:28 -07:00
Tim Vernum	d4e4b5abb0	Merge main into multi-project	2024-10-22 13:03:12 +11:00
Luca Cavanna	8efd08b019	Upgrade to Lucene 10 (#114741 ) The most relevant ES changes that upgrading to Lucene 10 requires are: - use the appropriate IOContext - Scorer / ScorerSupplier breaking changes - Regex automaton are no longer determinized by default - minimize moved to test classes - introduce Elasticsearch900Codec - adjust slicing code according to the added support for intra-segment concurrency - disable intra-segment concurrency in tests - adjust accessor methods for many Lucene classes that became a record - adapt to breaking changes in the analysis area Co-authored-by: Christoph Büscher <christophbuescher@posteo.de> Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co> Co-authored-by: ChrisHegarty <chegar999@gmail.com> Co-authored-by: Brian Seeders <brian.seeders@elastic.co> Co-authored-by: Armin Braun <me@obrown.io> Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com> Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>	2024-10-21 13:38:23 +02:00
Tim Vernum	883471e3b2	Merge main into multi-project	2024-10-21 16:35:42 +11:00
Nhat Nguyen	d0c8ff5932	Refactor TSDB doc_values util allow introduce new codec (#115042 ) This PR refactors the doc_values utils used in the TSDB codec to allow sharing between the current codec and the new codec.	2024-10-18 08:01:04 -07:00
Tim Vernum	1c62e4f533	Merge main into multi-project	2024-10-14 16:30:28 +11:00
Nik Everett	e304c1d5c1	ESQL: Speed up grouping by bytes (#114021 ) This speeds up grouping by bytes valued fields (keyword, text, ip, and wildcard) when the input is an ordinal block: ``` bytes_refs 22.213 ± 0.322 -> 19.848 ± 0.205 ns/op (maybe real, maybe noise. still good) ordinal didn't exist -> 2.988 ± 0.011 ns/op ``` I see this as 20ns -> 3ns, an 85% speed up. We never had the ordinals branch before so I'm expecting the same performance there - about 20ns per op. This also speeds up grouping by a pair of byte valued fields: ``` two_bytes_refs 83.112 ± 42.348 -> 46.521 ± 0.386 ns/op two_ordinals 83.531 ± 23.473 -> 8.617 ± 0.105 ns/op ``` The speed up is much better when the fields are ordinals because hashing bytes is comparatively slow. I believe the ordinals case is quite common. I've run into it in quite a few profiles.	2024-10-11 21:13:11 +02:00
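
Several of the ESQL commits listed above (#125357, #114021) rely on ordinal-encoded blocks: a small dictionary of unique values plus one integer ordinal per position. The sketch below is only an illustration of that idea in plain Java. `OrdinalSketch`, `OrdinalColumn`, and `mapDictionary` are hypothetical names invented for this example and are not the Elasticsearch `OrdinalBytesRefVector` API. It applies a conversion once per dictionary entry and reuses the input's ordinals, which is why the work scales with the number of unique values rather than the number of positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class OrdinalSketch {
    /**
     * Hypothetical stand-in for an ordinal-encoded column:
     * a dictionary of unique values plus one ordinal per position.
     */
    record OrdinalColumn(List<String> dictionary, int[] ordinals) {
        /** Resolve a position by looking up its ordinal in the dictionary. */
        String get(int position) {
            return dictionary.get(ordinals[position]);
        }

        int positions() {
            return ordinals.length;
        }
    }

    /**
     * Apply fn once per dictionary entry (O(unique_values)) and share the
     * ordinals array with the input instead of copying it.
     */
    static OrdinalColumn mapDictionary(OrdinalColumn in, UnaryOperator<String> fn) {
        List<String> mapped = new ArrayList<>(in.dictionary().size());
        for (String value : in.dictionary()) {
            mapped.add(fn.apply(value));
        }
        return new OrdinalColumn(mapped, in.ordinals());
    }

    public static void main(String[] args) {
        // Same shape as the picture in #125357: FOO, BAR, BAZ with ordinals 0 1 2 1 1 0.
        OrdinalColumn in = new OrdinalColumn(List.of("FOO", "BAR", "BAZ"), new int[] { 0, 1, 2, 1, 1, 0 });
        OrdinalColumn out = mapDictionary(in, s -> s.toLowerCase(Locale.ROOT));
        for (int p = 0; p < out.positions(); p++) {
            System.out.println(out.get(p));
        }
    }
}
```

In this sketch the lowercase function runs three times for six positions. Sharing the `int[]` is safe here only because nothing mutates it; the commit message describes the real implementation as bumping a reference count on the shared ordinals instead.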

393 Commits