Commit Graph

1718 Commits

Author SHA1 Message Date
Umar Butler 8d8b854824
gh-128016: Improved invalid escape sequence warning message (#128020) 2025-01-15 18:00:54 +01:00
Donghee Na ae23a012e6
gh-128137: Update PyASCIIObject to handle interned field with the atomic operation (gh-128196) 2025-01-05 18:17:06 +09:00
Alexander Shadchin 46cb6340d7
gh-127903: Fix a crash on debug builds when calling `Objects/unicodeobject::_copy_characters`` (#127876) 2025-01-03 18:47:58 +00:00
Sam Gross 8eebe4e6d0
gh-128212: Fix race in `_PyUnicode_CheckConsistency` (GH-128367)
There was a data race on the utf8 field between `PyUnicode_SET_UTF8` and
`_PyUnicode_CheckConsistency`. Use the `_PyUnicode_UTF8()` accessor,
which uses an atomic load internally, to avoid the data race.
2025-01-02 14:02:54 -05:00
Kumar Aditya 3c168f7f79
gh-128013: fix data race in `PyUnicode_AsUTF8AndSize` on free-threading (#128021) 2024-12-19 17:08:32 +05:30
Victor Stinner f802c8bf87
gh-128013: Convert unicodeobject.c macros to functions (#128061)
Convert unicodeobject.c macros to static inline functions.

* Add _PyUnicode_SET_UTF8() and _PyUnicode_SET_UTF8_LENGTH() macros.
* Add PyUnicode_HASH() and PyUnicode_SET_HASH() macros.
* Remove unused _PyUnicode_KIND() and _PyUnicode_GET_LENGTH() macros.
2024-12-18 16:34:31 +01:00
Inada Naoki 5dd775bed0
gh-126024: unicodeobject: optimize find_first_nonascii (GH-127790)
Remove 1 branch.
2024-12-13 17:21:46 +01:00
Bénédikt Tran 36c6178d37
gh-126024: fix UBSan failure in `unicodeobject.c:find_first_nonascii` (GH-127566) 2024-12-06 09:31:30 -05:00
Victor Stinner bf21e2160d
Fix Unicode encode_wstr_utf8() (#127420)
Raise RuntimeError instead of RuntimeWarning.
2024-12-02 11:14:47 +01:00
Inada Naoki 7043bbd1ca
gh-127417: fix UTF-8 decoder optimization on AIX (#127433) 2024-11-30 21:52:37 +09:00
Inada Naoki 322b486010
gh-126024: optimize UTF-8 decoder for short non-ASCII string (#126025) 2024-11-29 19:48:02 +09:00
Pablo Galindo Salgado 30aeb00d36
gh-126076: Account for relocated objects in tracemalloc (#126077) 2024-11-19 10:35:17 +00:00
Eric Snow 6f26d496d3
gh-125286: Share the Main Refchain With Legacy Interpreters (gh-125709)
They used to be shared, before 3.12.  Returning to sharing them resolves a failure on Py_TRACE_REFS builds.

Co-authored-by: Petr Viktorin <encukou@gmail.com>
2024-10-23 10:10:06 -06:00
Victor Stinner 1639d934b9
gh-125196: Add a free list to PyUnicodeWriter (#125227) 2024-10-10 12:11:06 +02:00
Victor Stinner 1b2a5485f9
gh-125196: PyUnicodeWriter_Discard(NULL) does nothing (#125222) 2024-10-09 23:32:02 +00:00
Victor Stinner ee3167b978
gh-125196: Add fast-path for int in PyUnicodeWriter_WriteStr() (#125214)
PyUnicodeWriter_WriteStr() and PyUnicodeWriter_WriteRepr() now call
directly _PyLong_FormatWriter() if the argument is an int.
2024-10-10 00:01:02 +02:00
Eric Snow f2cb399470
gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-124865)
Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.

This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS
failures identified by gh-124785.
2024-10-09 11:32:16 -06:00
Victor Stinner e0c87c64b1
gh-124502: Remove _PyUnicode_EQ() function (#125114)
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
2024-10-09 10:15:17 +02:00
Victor Stinner a7f0727ca5
gh-124502: Add PyUnicode_Equal() function (#124504) 2024-10-07 21:24:53 +00:00
T. Wouters 7bdfabe2d1
gh-124785: Revert "gh-116510: Fix crash due to shared immortal interned strings (gh-124646)" (gh-124807)
Revert "gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)"

This reverts commit 98b2ed7e23.
2024-09-30 16:41:46 -07:00
Neil Schemenauer 98b2ed7e23
gh-116510: Fix crash due to shared immortal interned strings. (gh-124646) 2024-09-26 19:16:51 -07:00
Petr Viktorin da5855e99a
gh-112301: Use literal format strings in unicode_fromformat_arg (GH-124203) 2024-09-25 19:46:01 +02:00
Victor Stinner ef9d54703f
gh-107954, PEP 741: Add PyInitConfig C API (#123502)
Add Doc/c-api/config.rst documentation.
2024-09-03 12:33:49 +00:00
Victor Stinner d8e69b2c1b
gh-122854: Add Py_HashBuffer() function (#122855) 2024-08-30 15:42:27 +00:00
Serhiy Storchaka 1a0b828994
gh-122561: Clean up and microoptimize str.translate and charmap codec (GH-122932)
* Replace PyLong_AS_LONG() with PyLong_AsLong().
* Call PyLong_AsLong() only once per the replacement code.
* Use PyMapping_GetOptionalItem() instead of PyObject_GetItem().
2024-08-28 12:11:13 +03:00
Eddie Elizondo 3203a74129
gh-113190: Reenable non-debug interned string cleanup (GH-113601) 2024-08-15 11:55:09 +00:00
Jelle Zijlstra 53ebb6232a
gh-122888: Fix crash on certain calls to str() (#122889)
Fixes #122888
2024-08-12 09:20:09 -07:00
Victor Stinner fda6bd842a
Replace PyObject_Del with PyObject_Free (#122453)
PyObject_Del() is just a alias to PyObject_Free() kept for backward
compatibility. Use directly PyObject_Free() instead.
2024-08-01 14:12:33 +02:00
Petr Viktorin bb09ba6792
gh-122291: Intern latin-1 one-byte strings at startup (GH-122303) 2024-07-27 10:27:06 +02:00
Victor Stinner bfdbeac355
gh-121849: Fix PyUnicodeWriter_WriteSubstring() crash if len=0 (#121896)
Do nothing if start=end.
2024-07-17 10:26:05 +02:00
Petr Viktorin b4aedb23ae
gh-113993: Don't immortalize in PyUnicode_InternInPlace; keep immortalizing in other API (#121364)
* Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs

* Document immortality in some functions that take `const char *`

This is PyUnicode_InternFromString;
PyDict_SetItemString, PyObject_SetAttrString;
PyObject_DelAttrString; PyUnicode_InternFromString;
and the PyModule_Add convenience functions.

Always point out a non-immortalizing alternative.

* Don't immortalize user-provided attr names in _ctypes
2024-07-16 15:36:21 +02:00
Petr Viktorin 956270d08d
gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358)
Older stable ABI extensions are allowed to make immortal objects mortal.
Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`).
2024-07-16 15:17:29 +02:00
Bénédikt Tran 6343486eb6
gh-121165: protect macro expansion of `ADJUST_INDICES` with do-while(0) (#121166) 2024-07-02 16:27:51 +05:30
Victor Stinner 12af8ec864
gh-121040: Use __attribute__((fallthrough)) (#121044)
Fix warnings when using -Wimplicit-fallthrough compiler flag.

Annotate explicitly "fall through" switch cases with a new
_Py_FALLTHROUGH macro which uses __attribute__((fallthrough)) if
available. Replace "fall through" comments with _Py_FALLTHROUGH.

Add _Py__has_attribute() macro. No longer define __has_attribute()
macro if it's not defined. Move also _Py__has_builtin() at the top
of pyport.h.

Co-Authored-By: Nikita Sobolev <mail@sobolevn.me>
2024-06-27 09:58:44 +00:00
Xie Yanbo 0153fd0940
Fix typos in comments (#120821) 2024-06-24 19:47:00 +02:00
Steve Dower e731554337
Fixes loop variables to be the same types as their limit (GH-120958) 2024-06-24 17:11:47 +01:00
Victor Stinner 2e157851e3
gh-119182: Add PyUnicodeWriter_WriteUCS4() function (#120849) 2024-06-24 17:40:39 +02:00
Serhiy Storchaka 6eb23b1311
gh-70278: Fix PyUnicode_FromFormat() with precision for %s and %V (GH-120365)
PyUnicode_FromFormat() no longer produces the ending \ufffd
character for truncated C string when use precision with %s and %V.
It now truncates the string before the start of truncated multibyte sequences.
2024-06-24 18:07:07 +03:00
Victor Stinner e213475495
gh-119182: Add checks to PyUnicodeWriter APIs (#120870) 2024-06-22 17:25:55 +02:00
Victor Stinner 879d1f28bb
gh-119182: Use PyUnicodeWriter_WriteWideChar() (#120851)
Use PyUnicodeWriter_WriteWideChar() in PyUnicode_FromFormat()
2024-06-22 08:58:22 +02:00
Victor Stinner 4123226bbd
gh-119182: Add PyUnicodeWriter_DecodeUTF8Stateful() (#120639)
Add PyUnicodeWriter_WriteWideChar() and
PyUnicodeWriter_DecodeUTF8Stateful() functions.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-06-21 19:33:15 +02:00
Petr Viktorin 6f1d448bc1
gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520)
* Add an InternalDocs file describing how interning should work and how to use it.

* Add internal functions to *explicitly* request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`

* Switch uses of `PyUnicode_InternInPlace` to those.

* Disallow using `_Py_SetImmortal` on strings directly.
  You should use `_PyUnicode_InternImmortal` instead:
  - Strings should be interned before immortalization, otherwise you're possibly
    interning a immortalizing copy.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
    `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
    backports, as they are now part of public API and version-specific ABI.

* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)
  - one-character latin-1 singletons

  Now, when you intern a singleton, that exact singleton will be interned.

* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

* Intern `_Py_STR` singletons at startup.

* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.

* Beef up the tests. Cover internal details (marked with `@cpython_only`).

* Add lots of assertions

Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
2024-06-21 17:19:31 +02:00
Victor Stinner 5150795b1c
gh-119182: Optimize PyUnicode_FromFormat() (#120796)
Use strchr() and ucs1lib_find_max_char() to optimize the code path
formatting sub-strings between '%' formats.
2024-06-20 19:06:16 +00:00
Victor Stinner 5c4235cd8c
gh-119182: Add PyUnicodeWriter C API (#119184) 2024-06-17 17:10:52 +02:00
Eric Snow 105f22ea46
gh-117398: Use Per-Interpreter State for the _datetime Static Types (gh-119929)
We make use of the same mechanism that we use for the static builtin types.  This required a few tweaks.

The relevant code could use some cleanup but I opted to avoid the significant churn in this change.  I'll tackle that separately.

This change is the final piece needed to make _datetime support multiple interpreters.  I've updated the module slot accordingly.
2024-06-03 17:09:18 -06:00
Victor Stinner 3ea9b92086
gh-119396: Optimize unicode_decode_utf8_writer() (#119957)
Optimize unicode_decode_utf8_writer()

Take the ascii_decode() fast-path even if dest is not aligned on
size_t bytes.
2024-06-03 08:45:20 +02:00
Donghee Na 0594a27e5f
gh-117657: Fix data races report by TSAN unicode-hash (gh-119907) 2024-06-03 12:22:41 +09:00
Victor Stinner 0518edc170
gh-119396: Optimize unicode_repr() (#119617)
Use stringlib to specialize unicode_repr() for each string kind
(UCS1, UCS2, UCS4).

Benchmark:

+-------------------------------------+---------+----------------------+
| Benchmark                           | ref     | change2              |
+=====================================+=========+======================+
| repr('abc')                         | 100 ns  | 103 ns: 1.02x slower |
+-------------------------------------+---------+----------------------+
| repr('a' * 100)                     | 369 ns  | 369 ns: 1.00x slower |
+-------------------------------------+---------+----------------------+
| repr(('a' + squote) * 100)          | 1.21 us | 946 ns: 1.27x faster |
+-------------------------------------+---------+----------------------+
| repr(('a' + nl) * 100)              | 1.23 us | 907 ns: 1.36x faster |
+-------------------------------------+---------+----------------------+
| repr(dquote + ('a' + squote) * 100) | 1.08 us | 858 ns: 1.25x faster |
+-------------------------------------+---------+----------------------+
| Geometric mean                      | (ref)   | 1.16x faster         |
+-------------------------------------+---------+----------------------+
2024-05-28 18:05:20 +02:00
Serhiy Storchaka b313cc68d5
gh-117557: Improve error messages when a string, bytes or bytearray of length 1 are expected (GH-117631) 2024-05-28 12:01:37 +03:00
Serhiy Storchaka 08e65430aa
gh-111999: Fix the signature of str.format_map() (#119540) 2024-05-25 06:21:11 -07:00