From 397c9c59c7e25b81f20a808535a8cbe73486b61e Mon Sep 17 00:00:00 2001
From: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Date: Fri, 21 Mar 2025 17:43:44 +0100
Subject: [PATCH] Clarify regex character range case insensitivity limitations (#125413)

* Update regexp-syntax.md

9.x equivalent of https://github.com/elastic/elasticsearch/pull/125412

* use md syntax
---
 docs/reference/query-languages/query-dsl/regexp-syntax.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/reference/query-languages/query-dsl/regexp-syntax.md b/docs/reference/query-languages/query-dsl/regexp-syntax.md
index 5335c8cc0066..d52869071324 100644
--- a/docs/reference/query-languages/query-dsl/regexp-syntax.md
+++ b/docs/reference/query-languages/query-dsl/regexp-syntax.md
@@ -138,6 +138,13 @@ A `^` before a character in the brackets negates the character or range. For exa
 [^abc\-] # matches any character except 'a', 'b', 'c', or '-'
 ```
 
+:::{note}
+Character range classes such as `[a-c]` do not behave as expected when using `case_insensitive: true` — they remain case sensitive. For example, `[a-c]+` with `case_insensitive: true` will match strings containing only the characters 'a', 'b', and 'c', but not 'A', 'B', or 'C'. Use `[a-zA-Z]` to match both uppercase and lowercase characters.
+
+This is due to a known limitation in Lucene's regular expression engine.
+See [Lucene issue #14378](https://github.com/apache/lucene/issues/14378) for details.
+:::
+
 ## Optional operators [regexp-optional-operators]