Commit Graph

35 Commits

Author SHA1 Message Date
Serhiy Storchaka ac56f8cc8d
gh-133306: Support \z as a synonym for \Z in regular expressions (GH-133314)
\Z was an error inherited from PCRE 0.95. It was fixed in PCRE 2.0.
In other engines, \Z means not “anchor at string end”, but
“anchor before optional newline at string end”.

\z means “anchor at string end” in most RE engines.
2025-05-03 07:54:33 +00:00
Serhiy Storchaka 819830f34a
gh-126505: Fix bugs in compiling case-insensitive character classes (GH-126557)
* upper-case non-BMP character was ignored
* the ASCII flag was ignored when matching a character range whose
  upper bound is beyond the BMP region
2024-11-11 18:27:26 +02:00
Serhiy Storchaka f9637b4ba3
Remove dead code in the RE parser (GH-122796) 2024-08-07 19:44:18 +00:00
Jelle Zijlstra b72c748d7f
Fix syntax in generate_re_casefix.py (#122699)
This was broken in gh-97963.
2024-08-05 23:16:29 -07:00
Serhiy Storchaka 8bc76ae45f
gh-111259: Optimize complementary character sets in RE (GH-120742)
Patterns like "[\s\S]" or "\s|\S" which match any character are now compiled
to the same effective code as a dot with the DOTALL modifier ("(?s:.)").
2024-06-20 07:19:32 +00:00
Victor Stinner 6ae254aaa0
gh-120417: Add #noqa to used imports in the stdlib (#120421)
Tools such as ruff can ignore "imported but unused" warnings if a
line ends with "# noqa: F401". It avoids the temptation to remove
an import which is used effectively.
2024-06-13 16:14:50 +02:00
achhina a01022af23
GH-83162: Rename re.error for better clarity. (#101677)
Renamed re.error for clarity, and kept re.error for backward compatibility.
Updated idlelib files at TJR's request.
---------

Co-authored-by: Matthias Bussonnier <mbussonnier@ucmerced.edu>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2023-12-11 15:45:08 -05:00
Serhiy Storchaka e2b3d831fd
gh-109747: Improve errors for unsupported look-behind patterns (GH-109859)
Now re.error is raised instead of OverflowError or RuntimeError for
too large width of look-behind pattern.

The limit is increased to 2**32-1 (was 2**31-1).
2023-10-14 09:13:02 +03:00
Serhiy Storchaka 882cb79afa
gh-56166: Deprecate passing confusing positional arguments in re functions (#107778)
Deprecate passing optional arguments maxsplit, count and flags in
module-level functions re.split(), re.sub() and re.subn() as positional.
They should only be passed by keyword.
2023-08-16 13:35:35 -07:00
SKO abd9cc52d9
gh-100061: Proper fix of the bug in the matching of possessive quantifiers (GH-102612)
Restore the global Input Stream pointer after trying to match a sub-pattern.

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
2023-08-16 10:43:45 +03:00
Serhiy Storchaka 7b6e34e5ba
gh-106052: Fix bug in the matching of possessive quantifiers (gh-106515)
It did not work in the case of a subpattern containing backtracking.

Temporary implement possessive quantifiers as equivalent greedy qualifiers
in atomic groups.
2023-08-09 08:47:57 +03:00
Serhiy Storchaka ed64204716
gh-106566: Optimize (?!) in regular expressions (GH-106567) 2023-08-07 18:09:56 +03:00
Serhiy Storchaka 74ec02e949
gh-106510: Fix DEBUG output for atomic group (GH-106511) 2023-07-08 14:31:25 +03:00
Nikita Sobolev 67f69dba0a
gh-105687: Remove deprecated objects from `re` module (#105688) 2023-06-14 12:26:20 +02:00
Serhiy Storchaka 75a6fadf36
gh-91524: Speed up the regular expression substitution (#91525)
Functions re.sub() and re.subn() and corresponding re.Pattern methods
are now 2-3 times faster for replacement strings containing group references.

Closes #91524

Primarily authored by serhiy-storchaka Serhiy Storchaka
Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>
2022-10-23 15:57:30 -07:00
Serhiy Storchaka c11b667a1d
gh-96346: Use double caching for re._compile() (#96347) 2022-10-07 12:21:42 -07:00
Gregory P. Smith 4beee0c7b0
gh-91404: Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or allocation failure (GH-32283) (#93882)
Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)"

This reverts commit 6e3eee5c11.

Manual fixups to increase the MAGIC number and to handle conflicts with
a couple of changes that landed after that.

Thanks for reviews by Ma Lin and Serhiy Storchaka.
2022-06-17 01:19:44 -07:00
Miro Hrončok 16a7e4a0b7
gh-92728: Restore re.template, but deprecate it (GH-93161)
Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)"

This reverts commit b09184bf05.
2022-05-25 09:05:35 +03:00
Serhiy Storchaka a84a56d80f
gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)
Only sequence of ASCII digits is now accepted as a numerical reference.
The group name in bytes patterns and replacement strings can now only
contain ASCII letters and digits and underscore.
2022-05-08 19:19:29 +03:00
Serhiy Storchaka 19dca04121
gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794)
Only sequence of ASCII digits will be accepted as a numerical reference.
The group name in bytes patterns and replacement strings could only
contain ASCII letters and digits and underscore.
2022-04-30 13:13:46 +03:00
Serhiy Storchaka 6d0d547033
gh-92049: Forbid pickling constants re._constants.SUCCESS etc (GH-92070)
Previously, pickling did not fail, but the result could not be unpickled.
2022-04-30 13:03:23 +03:00
Serhiy Storchaka f703c96cf0
gh-91870: Remove unsupported SRE opcode CALL (GH-91872)
It was initially added to support atomic groups, but that
support was never fully implemented, and CALL was only left
in the compiler, but not interpreter and parser.

ATOMIC_GROUP is now used to support atomic groups.
2022-04-26 21:07:25 +03:00
Serhiy Storchaka 28890427c5
RE: Pre-split the list of opcode names (GH-91859)
1. It makes them interned.
2. It allows to add comments to individual opcodes.
2022-04-23 18:49:23 +03:00
Serhiy Storchaka 130a8c386b
gh-91308: Simplify parsing inline flag "x" (verbose) (GH-91855) 2022-04-23 12:50:42 +03:00
Serhiy Storchaka f912cc0e41
gh-91575: Add a script for generating data for case-insensitive matching in re (GH-91660)
Also test that all extra cases are in BMP.
2022-04-22 21:37:46 +03:00
Serhiy Storchaka 48ec61a89a
gh-91700: Validate the group number in conditional expression in RE (GH-91702)
In expression (?(group)...) an appropriate re.error is now
raised if the group number refers to not defined group.

Previously it raised RuntimeError: invalid SRE code.
2022-04-22 19:53:10 +03:00
Serhiy Storchaka 6ccfa31421
gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665)
re.error is now raised instead of TypeError.
2022-04-22 18:35:28 +03:00
Serhiy Storchaka 1c2fcebf3c
gh-91575: Update case-insensitive matching in re to the latest Unicode version (GH-91580) 2022-04-18 12:26:30 +03:00
Serhiy Storchaka 474fdbe9e4
bpo-47152: Automatically regenerate sre_constants.h (GH-91439)
* Move the code for generating Modules/_sre/sre_constants.h from
  Lib/re/_constants.py into a separate script
  Tools/scripts/generate_sre_constants.py.
* Add target `regen-sre` in the makefile.
* Make target `regen-all` depending on `regen-sre`.
2022-04-12 18:34:06 +03:00
Serhiy Storchaka 50872dbadc
bpo-47227: Suppress expression chaining for more RE parsing errors (GH-32333) 2022-04-06 19:54:44 +03:00
Serhiy Storchaka b09184bf05
bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)
They were undocumented and never working.
2022-04-06 19:53:50 +03:00
Serhiy Storchaka ff2cf1d7d5
bpo-47152: Remove unused import in re (GH-32298) 2022-04-04 12:00:53 +03:00
Serhiy Storchaka 1578f06c1c
bpo-47152: Move sources of the _sre module into a subdirectory (GH-32290) 2022-04-04 10:53:26 +03:00
Ma Lin 6e3eee5c11
bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283) 2022-04-03 19:16:20 +03:00
Serhiy Storchaka 1be3260a90
bpo-47152: Convert the re module into a package (GH-32177)
The sre_* modules are now deprecated.
2022-04-02 11:35:13 +03:00