ESQL: Document warnings behavior in CsvTests (#125441)

`CsvTests` behaves slightly differently from real Elasticsearch indices with respect to warnings, and this is worth documenting. I've also added an explanation to `SingleValueMatchQuery` of *exactly* when it emits a warning, because that is not *exactly* the same as when the compute engine would emit one. The resulting documents are the same, but the warnings are not.
2025-03-24 11:02:23 -04:00 · 2025-03-24 11:02:23 -04:00 · f83ca0c6b7
parent 1078bd0c41
commit f83ca0c6b7
2 changed files with 23 additions and 1 deletions
--- a/x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/querydsl/query/SingleValueMatchQuery.java
+++ b/x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/querydsl/query/SingleValueMatchQuery.java
@@ -37,7 +37,17 @@ import java.io.IOException;
 import java.util.Objects;

 /**
- * Finds all fields with a single-value. If a field has a multi-value, it emits a {@link Warnings}.
+ * Finds all fields with a single value. If a field has multiple values, it emits
+ * a {@link Warnings warning}.
+ * <p>
+ *     Warnings are only emitted when {@link TwoPhaseIterator#matches} is called for the document.
+ *     That means that if the other query skips the doc, either because the index doesn't match or because
+ *     its {@link TwoPhaseIterator#matches} doesn't match, then we won't log warnings. So it's
+ *     safest to say that this will emit a warning if the document would have
+ *     matched but for having a multivalued field. If the document doesn't match but
+ *     "almost" matches in some fairly Lucene-specific ways, then it *might* emit
+ *     a warning.
+ * </p>
 */
 public final class SingleValueMatchQuery extends Query {

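The distinction in the Javadoc above can be illustrated with a small, Lucene-free simulation. This is a hypothetical sketch, not the real `SingleValueMatchQuery`: the `Doc` record, the `otherFilter` predicate, and all method names are invented for illustration, and the loop is a simplified stand-in for Lucene's conjunction logic, where another clause accepts a document before the single-value check (playing the role of `TwoPhaseIterator#matches`) runs. It models only the guaranteed case and ignores the documents that "almost" match and *might* also warn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical, Lucene-free model of where multivalue warnings come from.
// Not the real SingleValueMatchQuery; names and data are invented for illustration.
public class WarningScopeDemo {

    /** A document with the values of one field; more than one value means multivalued. */
    record Doc(int id, int[] values) {}

    static List<Doc> sampleDocs() {
        return List.of(
            new Doc(0, new int[] {1}),
            new Doc(1, new int[] {1, 2}), // multivalued, rejected by the other filter
            new Doc(2, new int[] {3}),
            new Doc(3, new int[] {4, 5})  // multivalued, accepted by the other filter
        );
    }

    /**
     * Pushed-down style: the other clause must accept a document before the single-value
     * check (standing in for TwoPhaseIterator#matches) ever looks at it.
     */
    static List<Integer> luceneStyleWarnings(List<Doc> docs, IntPredicate otherFilter) {
        List<Integer> warnings = new ArrayList<>();
        for (Doc doc : docs) {
            if (!otherFilter.test(doc.id())) {
                continue; // skipped before the check runs, so no warning
            }
            if (doc.values().length > 1) {
                warnings.add(doc.id()); // would have matched but for the multivalue
            }
        }
        return warnings;
    }

    /** CsvTests style: the compute engine checks every row, so every multivalue warns. */
    static List<Integer> computeEngineWarnings(List<Doc> docs) {
        List<Integer> warnings = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.values().length > 1) {
                warnings.add(doc.id());
            }
        }
        return warnings;
    }

    /** Both paths return the same documents: accepted by the other filter and single-valued. */
    static List<Integer> matchingDocs(List<Doc> docs, IntPredicate otherFilter) {
        List<Integer> matches = new ArrayList<>();
        for (Doc doc : docs) {
            if (otherFilter.test(doc.id()) && doc.values().length == 1) {
                matches.add(doc.id());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        IntPredicate otherFilter = id -> id >= 2; // hypothetical stand-in for another query clause
        System.out.println("lucene-style warnings:   " + luceneStyleWarnings(docs, otherFilter)); // [3]
        System.out.println("compute-engine warnings: " + computeEngineWarnings(docs)); // [1, 3]
        System.out.println("matching docs:           " + matchingDocs(docs, otherFilter)); // [2]
    }
}
```

Both paths return the same matching document (`[2]`), but only the compute-engine path warns about document 1, which the other filter rejects anyway.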
--- a/x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/CsvTests.java
+++ b/x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/CsvTests.java
@@ -30,6 +30,7 @@ import org.elasticsearch.compute.operator.Driver;
 import org.elasticsearch.compute.operator.DriverRunner;
 import org.elasticsearch.compute.operator.exchange.ExchangeSinkHandler;
 import org.elasticsearch.compute.operator.exchange.ExchangeSourceHandler;
+import org.elasticsearch.compute.querydsl.query.SingleValueMatchQuery;
 import org.elasticsearch.core.Releasables;
 import org.elasticsearch.core.Tuple;
 import org.elasticsearch.index.IndexMode;
@@ -153,6 +154,17 @@ import static org.hamcrest.Matchers.notNullValue;
 * it’s creating its own Source physical operator, aggregation operator (just a tiny bit of it) and field extract operator.
 * <p>
 * To log the results logResults() should return "true".
+ * <p>
+ * This test never pushes to Lucene because there isn't a Lucene index to push to. It always runs everything in
+ * the compute engine. This yields the same results modulo a few things:
+ * <ul>
+ *     <li>Warnings for multivalued fields: See {@link SingleValueMatchQuery} for an in-depth discussion, but the
+ *         short version is that this class will always emit warnings on multivalued fields, but tests that run against
+ *         a real index are only guaranteed to emit a warning if the document would match all filters <strong>except</strong>
+ *         that it has a multivalued field.</li>
+ *     <li>Sorting: This class emits values in the order they appear in the {@code .csv} files that power it. A real
+ *         index emits documents in a fairly random order. Multi-shard and multi-node tests doubly so.</li>
+ * </ul>
 */
 // @TestLogging(value = "org.elasticsearch.xpack.esql:TRACE,org.elasticsearch.compute:TRACE", reason = "debug")
 public class CsvTests extends ESTestCase {
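A harness that compares `CsvTests`-style output with output from a real index would need to tolerate both caveats listed in the Javadoc above. The following is a hypothetical sketch: `CsvComparisonSketch` and its methods are not part of the ESQL test framework, and the warning strings are placeholders rather than real ESQL messages. It assumes both paths produce identical warning text and that a real index only ever emits a subset of the warnings the compute engine would emit.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers illustrating the two CsvTests caveats; not part of the ESQL test framework.
public class CsvComparisonSketch {

    /** Rows are equal as multisets: same rows, same counts, any order. */
    static boolean sameRowsIgnoringOrder(List<List<Object>> expected, List<List<Object>> actual) {
        return rowCounts(expected).equals(rowCounts(actual));
    }

    // List#equals and List#hashCode are element-wise, so rows work as map keys.
    private static Map<List<Object>, Integer> rowCounts(List<List<Object>> rows) {
        Map<List<Object>, Integer> counts = new HashMap<>();
        for (List<Object> row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Accept a real-index run if it produced every guaranteed warning (documents that would
     * match but for a multivalued field) and nothing outside the set the compute engine
     * would produce. Assumes identical warning text on both paths.
     */
    static boolean warningsAcceptable(Set<String> actual, Set<String> required, Set<String> allowed) {
        return actual.containsAll(required) && allowed.containsAll(actual);
    }

    public static void main(String[] args) {
        List<List<Object>> csvOrder = List.of(List.<Object>of("a", 1), List.<Object>of("b", 2));
        List<List<Object>> indexOrder = List.of(List.<Object>of("b", 2), List.<Object>of("a", 1));
        System.out.println(sameRowsIgnoringOrder(csvOrder, indexOrder)); // true

        // Placeholder warning strings, not real ESQL warning messages.
        Set<String> allowed = Set.of("multivalue in doc 1", "multivalue in doc 3");
        Set<String> required = Set.of("multivalue in doc 3");
        System.out.println(warningsAcceptable(Set.of("multivalue in doc 3"), required, allowed)); // true
        System.out.println(warningsAcceptable(Set.of(), required, allowed)); // false
    }
}
```

Comparing rows as a multiset rather than a set keeps duplicate rows significant while ignoring the order in which a real index happens to return them.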