ESQL: Document warnings behavior in CsvTests (#125441)

`CsvTests` behaves slightly differently from real Elasticsearch indices when
it comes to warnings, and this is worth documenting. I've also added an
explanation to `SingleValueMatchQuery` that spells out *exactly* when it
emits a warning, because that's not *exactly* the same as when the compute
engine would emit one. The resulting documents are the same, but the
warnings are not.
Nik Everett 2025-03-24 11:02:23 -04:00 committed by GitHub
parent 1078bd0c41
commit f83ca0c6b7
GPG Key ID: B5690EEEBB952194
2 changed files with 23 additions and 1 deletion


@@ -37,7 +37,17 @@ import java.io.IOException;
 import java.util.Objects;
 /**
- * Finds all fields with a single-value. If a field has a multi-value, it emits a {@link Warnings}.
+ * Finds all fields with a single-value. If a field has a multi-value, it emits
+ * a {@link Warnings warning}.
+ * <p>
+ * Warnings are only emitted when {@link TwoPhaseIterator#matches} is called. That means
+ * that if another query skips the doc, either because the index doesn't match or because
+ * its own {@link TwoPhaseIterator#matches} doesn't match, then we won't log warnings. So
+ * it's safest to say that this will emit a warning if the document would have
+ * matched but for having a multivalued field. If the document doesn't match but
+ * "almost" matches in some fairly Lucene-specific ways then it *might* emit
+ * a warning.
+ * </p>
  */
 public final class SingleValueMatchQuery extends Query {

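To make the difference concrete, here is a minimal, Lucene-free model of the two behaviors. Everything in it (`WarningModel`, `Doc`, `singleValue`, and the `salary`/`active`/`languages` fields) is hypothetical. It does not use the real `SingleValueMatchQuery`, `TwoPhaseIterator`, or compute engine; it only mimics the order in which checks run. It also assumes that the single-value check happens to be ordered before the other two-phase check. Lucene's conjunctions generally order two-phase checks by match cost, so a real index does not guarantee that ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical model of when multivalue warnings are emitted.
 * Nothing here is Elasticsearch or Lucene API; it only mimics the order of checks.
 */
final class WarningModel {
    /** A document with two single-valued fields and one field that may be multivalued. */
    record Doc(int id, int salary, boolean active, List<String> languages) {}

    /** Stand-in for the single-value check: rejects, and warns on, multivalued fields. */
    static boolean singleValue(Doc doc, List<String> warnings) {
        if (doc.languages().size() > 1) {
            warnings.add("doc " + doc.id() + " has a multivalued field");
            return false;
        }
        return true;
    }

    /** Compute-engine style (as in CsvTests): every document is checked, so every multivalue warns. */
    static List<Integer> computeEngine(List<Doc> docs, Predicate<Doc> approximation,
                                       Predicate<Doc> laterCheck, List<String> warnings) {
        List<Integer> hits = new ArrayList<>();
        for (Doc doc : docs) {
            boolean single = singleValue(doc, warnings); // runs unconditionally
            if (single && approximation.test(doc) && laterCheck.test(doc)) {
                hits.add(doc.id());
            }
        }
        return hits;
    }

    /**
     * Lucene-like style: the other clause's approximation must match first, then the
     * two-phase checks run in order and stop at the first one that returns false.
     * Assumes the single-value check is ordered before laterCheck.
     */
    static List<Integer> luceneStyle(List<Doc> docs, Predicate<Doc> approximation,
                                     Predicate<Doc> laterCheck, List<String> warnings) {
        List<Integer> hits = new ArrayList<>();
        for (Doc doc : docs) {
            if (!approximation.test(doc)) {
                continue; // skipped by the other clause: singleValue is never called, so no warning
            }
            if (singleValue(doc, warnings) && laterCheck.test(doc)) {
                hits.add(doc.id());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc(1, 500, true, List.of("en")),
            new Doc(2, 500, true, List.of("en", "fr")),    // multivalued, fails the approximation
            new Doc(3, 2000, true, List.of("de", "it")),   // would match except for the multivalue
            new Doc(4, 2000, true, List.of("es")),
            new Doc(5, 2000, false, List.of("pt", "ru"))); // "almost" matches: also fails laterCheck
        Predicate<Doc> salaryOver1000 = d -> d.salary() > 1000;
        Predicate<Doc> isActive = Doc::active;

        List<String> computeWarnings = new ArrayList<>();
        List<String> luceneWarnings = new ArrayList<>();
        System.out.println(computeEngine(docs, salaryOver1000, isActive, computeWarnings)); // [4]
        System.out.println(luceneStyle(docs, salaryOver1000, isActive, luceneWarnings));    // [4]
        System.out.println(computeWarnings); // warnings for docs 2, 3 and 5
        System.out.println(luceneWarnings);  // warnings for docs 3 and 5
    }
}
```

Both models return the same hits, but only the compute-engine model warns for doc 2. Doc 5 is the "almost matches" case. It would be rejected by `active` anyway, but it still warns in the Lucene-like model because the single-value check ran first.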

@@ -30,6 +30,7 @@ import org.elasticsearch.compute.operator.Driver;
 import org.elasticsearch.compute.operator.DriverRunner;
 import org.elasticsearch.compute.operator.exchange.ExchangeSinkHandler;
 import org.elasticsearch.compute.operator.exchange.ExchangeSourceHandler;
+import org.elasticsearch.compute.querydsl.query.SingleValueMatchQuery;
 import org.elasticsearch.core.Releasables;
 import org.elasticsearch.core.Tuple;
 import org.elasticsearch.index.IndexMode;
@@ -153,6 +154,17 @@ import static org.hamcrest.Matchers.notNullValue;
  * it's creating its own Source physical operator, aggregation operator (just a tiny bit of it) and field extract operator.
  * <p>
  * To log the results logResults() should return "true".
+ * <p>
+ * This test never pushes to Lucene because there isn't a Lucene index to push to. It always runs everything in
+ * the compute engine. This yields the same results modulo a few differences:
+ * <ul>
+ * <li>Warnings for multivalued fields: See {@link SingleValueMatchQuery} for an in-depth discussion. The
+ * short version is that this class always emits warnings on multivalued fields, but tests that run against
+ * a real index are only guaranteed to emit a warning if the document would match all filters <strong>except</strong>
+ * that it has a multivalued field.</li>
+ * <li>Sorting: This class emits values in the order they appear in the {@code .csv} files that power it. A real
+ * index emits documents in a fairly random order, and multi-shard and multi-node tests are doubly so.</li>
+ * </ul>
  */
 // @TestLogging(value = "org.elasticsearch.xpack.esql:TRACE,org.elasticsearch.compute:TRACE", reason = "debug")
 public class CsvTests extends ESTestCase {
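Because of the sorting difference, a result produced in CSV order and one produced by a real index can only be compared as multisets of rows when the query doesn't impose an order. Below is a minimal sketch of such an order-insensitive comparison. `UnorderedRows` and `sameRowsIgnoringOrder` are hypothetical names; this is not how `CsvTests` or its spec files actually compare results.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of CsvTests: treats two result sets as multisets of rows,
 * so row order does not matter but duplicate rows must appear the same number of times.
 */
final class UnorderedRows {
    private UnorderedRows() {}

    static boolean sameRowsIgnoringOrder(List<List<Object>> expected, List<List<Object>> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        // Count how many times each expected row occurs.
        Map<List<Object>, Integer> remaining = new HashMap<>();
        for (List<Object> row : expected) {
            remaining.merge(row, 1, Integer::sum);
        }
        // Consume one occurrence per actual row; an unknown or over-used row is a mismatch.
        for (List<Object> row : actual) {
            Integer count = remaining.get(row);
            if (count == null) {
                return false;
            }
            if (count == 1) {
                remaining.remove(row);
            } else {
                remaining.put(row, count - 1);
            }
        }
        return remaining.isEmpty();
    }

    public static void main(String[] args) {
        List<List<Object>> csvOrder = List.of(List.of(1, "en"), List.of(4, "es"), List.of(4, "es"));
        List<List<Object>> indexOrder = List.of(List.of(4, "es"), List.of(1, "en"), List.of(4, "es"));
        System.out.println(sameRowsIgnoringOrder(csvOrder, indexOrder)); // true
        System.out.println(csvOrder.equals(indexOrder));                 // false
    }
}
```

Counting rows rather than comparing sets keeps duplicate rows significant. Two results that differ only in how often a row repeats are still reported as different.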