cpython

Commit Graph

Author	SHA1	Message	Date
Serhiy Storchaka	6279eb8c07	[3.13] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944) If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer to the decoded data will become invalid after destroying that temporary bytes object. So we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal(). _PyBytes_DecodeEscape() does not have this issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal(). (cherry picked from commit `9f69a58623`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-05-20 14:46:57 +02:00
Donghee Na	7ffef8d07b	[3.13] gh-132070: Use _PyObject_IsUniquelyReferenced in unicodeobject (gh-133039) (gh-133126) * gh-132070: Use _PyObject_IsUniquelyReferenced in unicodeobject (gh-133039) --------- (cherry picked from commit `75cbb8d89e`) Co-authored-by: Donghee Na <donghee.na@python.org> Co-authored-by: Kumar Aditya <kumaraditya@python.org> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> * Add _PyObject_IsUniquelyReferenced --------- Co-authored-by: Kumar Aditya <kumaraditya@python.org> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-04-29 14:26:44 +09:00
Stan Ulbrych	320316ef7e	[3.13] gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130127) We had the definition of what makes a character "printable" documented in three places, giving two different definitions. The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that. With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics. Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place. Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs. Also add a thorough test that the implementation agrees with this definition. Author: Greg Price <gnprice@gmail.com> Co-authored-by: Greg Price <gnprice@gmail.com> (cherry picked from commit `3402e133ef`)	2025-02-17 15:02:39 +01:00
Miss Islington (bot)	b875917e21	[3.13] gh-127903: Fix a crash on debug builds when calling `Objects/unicodeobject::_copy_characters` (GH-127876) (#128458) gh-127903: Fix a crash on debug builds when calling `Objects/unicodeobject::_copy_characters` (GH-127876) (cherry picked from commit `46cb6340d7`) Co-authored-by: Alexander Shadchin <shadchin@yandex-team.com>	2025-01-03 21:20:30 +02:00
Kumar Aditya	fa6c48e4b3	[3.13] gh-128013: fix data race in PyUnicode_AsUTF8AndSize on free-threading (#128021) (#128417)	2025-01-02 22:10:17 +05:30
Pablo Galindo Salgado	eb692d945e	[3.13] gh-126076: Account for relocated objects in tracemalloc (GH-126077) (#127823) (cherry picked from commit `30aeb00d36`)	2024-12-11 14:15:37 +01:00
Bénédikt Tran	943e57e1ce	[3.13] Fix Unicode encode_wstr_utf8() (#127420) (#127505) Fix Unicode encode_wstr_utf8() (#127420) Raise RuntimeError instead of RuntimeWarning. Co-authored-by: Victor Stinner <vstinner@python.org>	2024-12-02 13:24:03 +01:00
Miss Islington (bot)	02cd3ce0f2	[3.13] gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-124865) (gh-125709) (GH-125204) * gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-124865) Fix a crash caused by immortal interned strings being shared between sub-interpreters that use basic single-phase init. In that case, the string can be used by an interpreter that outlives the interpreter that created and interned it. For interpreters that share obmalloc state, also share the interned dict with the main interpreter. This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS failures identified by gh-124785. (cherry picked from commit `f2cb399470`) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com> * [3.13] gh-125286: Share the Main Refchain With Legacy Interpreters (gh-125709) They used to be shared, before 3.12. Returning to sharing them resolves a failure on Py_TRACE_REFS builds. --------- Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>	2024-11-12 13:45:12 +01:00
Miss Islington (bot)	b843974ab4	[3.13] gh-124785: Revert "gh-116510: Fix crash due to shared immortal interned strings (gh-124646)" (gh-124807) (#124812) gh-124785: Revert "gh-116510: Fix crash due to shared immortal interned strings (gh-124646)" (gh-124807) Revert "gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)" This reverts commit `98b2ed7e23`. (cherry picked from commit `7bdfabe2d1`) Co-authored-by: T. Wouters <thomas@python.org>	2024-09-30 18:38:26 -07:00
Miss Islington (bot)	dc09a0c67f	[3.13] gh-116510: Fix crash due to shared immortal interned strings. (gh-124646) (#124648) gh-116510: Fix crash due to shared immortal interned strings. (gh-124646) (cherry picked from commit `98b2ed7e23`) Co-authored-by: Neil Schemenauer <nas-github@arctrix.com>	2024-09-27 11:15:25 -07:00
Miss Islington (bot)	55aede7342	[3.13] gh-122888: Fix crash on certain calls to str() (GH-122889) (#122947) Fixes GH-122888 (cherry picked from commit `53ebb6232a`) Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>	2024-08-12 16:53:05 +00:00
Miss Islington (bot)	40925103fc	[3.13] gh-122291: Intern latin-1 one-byte strings at startup (GH-122303) (GH-122347) (cherry picked from commit `bb09ba6792`) Co-authored-by: Petr Viktorin <encukou@gmail.com>	2024-07-27 08:51:02 +00:00
Petr Viktorin	4395d68c70	[3.13] gh-113993: Don't immortalize in PyUnicode_InternInPlace; keep immortalizing in other API (GH-121364) (GH-121854) * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs * Document immortality in some functions that take `const char *` This is PyUnicode_InternFromString; PyDict_SetItemString, PyObject_SetAttrString; PyObject_DelAttrString; PyUnicode_InternFromString; and the PyModule_Add convenience functions. Always point out a non-immortalizing alternative. Don't immortalize user-provided attr names in _ctypes (cherry picked from commit `b4aedb23ae`)	2024-07-17 14:51:42 +02:00
Miss Islington (bot)	281ffb60cc	[3.13] gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358) (GH-121851) gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358) Older stable ABI extensions are allowed to make immortal objects mortal. Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`). (cherry picked from commit `956270d08d`) Co-authored-by: Petr Viktorin <encukou@gmail.com>	2024-07-16 13:42:49 +00:00
Petr Viktorin	9769b7ae06	[3.13] gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520) (GH-120945) * Add an InternalDocs file describing how interning should work and how to use it. * Add internal functions to explicitly request what kind of interning is done: - `_PyUnicode_InternMortal` - `_PyUnicode_InternImmortal` - `_PyUnicode_InternStatic` * Switch uses of `PyUnicode_InternInPlace` to those. * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead: - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy. - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI. * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery. * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint: - `_Py_ID` - `_Py_STR` (including the empty string) - one-character latin-1 singletons Now, when you intern a singleton, that exact singleton will be interned. * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic). * Intern `_Py_STR` singletons at startup. * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup. * Beef up the tests. Cover internal details (marked with `@cpython_only`). * Add lots of assertions Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>	2024-06-24 20:24:19 +02:00
Miss Islington (bot)	e5fb3a2385	[3.13] gh-117398: Use Per-Interpreter State for the _datetime Static Types (gh-120009) We make use of the same mechanism that we use for the static builtin types. This required a few tweaks. This change is the final piece needed to make _datetime support multiple interpreters. I've updated the module slot accordingly. (cherry picked from commit `105f22ea46`, AKA gh-119929) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>	2024-06-03 23:37:28 +00:00
Miss Islington (bot)	e5dfcea3e3	[3.13] gh-117657: Fix data races report by TSAN unicode-hash (gh-119907) (gh-119963) gh-117657: Fix data races report by TSAN unicode-hash (gh-119907) (cherry picked from commit `0594a27e5f`) Co-authored-by: Donghee Na <donghee.na@python.org>	2024-06-03 03:45:44 +00:00
Miss Islington (bot)	f49749cf8f	[3.13] gh-111999: Fix the signature of str.format_map() (GH-119540) (#119543) (cherry picked from commit `08e65430aa`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2024-05-25 13:46:41 +00:00
Miss Islington (bot)	08416065a7	[3.13] gh-119247: Add macros to use PySequence_Fast safely in free-threaded build (GH-119315) (#119419) Add `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST` and `Py_END_CRITICAL_SECTION_SEQUENCE_FAST` macros and update `str.join` to use them. Also add a regression test that would crash reliably without this patch. (cherry picked from commit `baf347d916`) Co-authored-by: Josh {*()} Rosenberg <26495692+MojoVampire@users.noreply.github.com>	2024-05-22 19:24:02 +00:00
Brett Simmers	c2627d6eea	gh-116322: Add Py_mod_gil module slot (#116882) This PR adds the ability to enable the GIL if it was disabled at interpreter startup, and modifies the multi-phase module initialization path to enable the GIL when loading a module, unless that module's spec includes a slot indicating it can run safely without the GIL. PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148. A warning will be issued up to once per interpreter for the first GIL-using module that is loaded. If `-v` is given, a shorter message will be printed to stderr every time a GIL-using module is loaded (including the first one that issues a warning).	2024-05-03 11:30:55 -04:00
Brett Simmers	f8290df63f	gh-116738: Make `_codecs` module thread-safe (#117530) The module itself is a thin wrapper around calls to functions in `Python/codecs.c`, so that's where the meaningful changes happened: - Move codecs-related state that lives on `PyInterpreterState` to a struct declared in `pycore_codecs.h`. - In free-threaded builds, add a mutex to `codecs_state` to synchronize operations on `search_path`. Because `search_path_mutex` is used as a normal mutex and not a critical section, we must be extremely careful with operations called while holding it. - The codec registry is explicitly initialized as part of `_PyUnicode_InitEncodings` to simplify thread-safety.	2024-05-02 18:25:36 -04:00
Erlend E. Aasland	044dc496e0	gh-117709: Add vectorcall support for str() with positional-only arguments (#117746) Fall back to tp_call() for cases when arguments are passed by name. Co-authored-by: Donghee Na <donghee.na@python.org> Co-authored-by: Victor Stinner <vstinner@python.org>	2024-04-11 13:55:37 +00:00
Serhiy Storchaka	24a2bd0481	gh-117642: Fix PEP 737 implementation (GH-117643) * Fix implementation of %#T and %#N (they were implemented as %T# and %N#). * Restore tests removed in gh-116417.	2024-04-08 16:27:25 +00:00
Sam Gross	1a6594f661	gh-117439: Make refleak checking thread-safe without the GIL (#117469) This keeps track of the per-thread total reference count operations in PyThreadState in the free-threaded builds. The count is merged into the interpreter's total when the thread exits.	2024-04-08 12:11:36 -04:00
Erlend E. Aasland	7ecd55d604	gh-117431: Adapt str.find and friends to Argument Clinic (#117468) This change gives a significant speedup, as the METH_FASTCALL calling convention is now used. The following methods are adapted: - str.count - str.find - str.index - str.rfind - str.rindex	2024-04-03 17:59:18 +02:00
Erlend E. Aasland	1dc1521042	gh-117431: Fix str.endswith docstring (#117499) The first parameter is named 'suffix', not 'prefix'. Regression introduced by commit `444156ed`	2024-04-03 12:33:20 +02:00
Erlend E. Aasland	444156ede4	gh-117431: Adapt str.startswith and str.endswith to Argument Clinic (#117466) This change gives a significant speedup, as the METH_FASTCALL calling convention is now used.	2024-04-03 09:11:39 +02:00
Sam Gross	60e105c1c1	gh-113964: Don't prevent new threads until all non-daemon threads exit (#116677) Starting in Python 3.12, we prevented calling fork() and starting new threads during interpreter finalization (shutdown). This has led to a number of regressions and flaky tests. We should not prevent starting new threads (or `fork()`) until all non-daemon threads exit and finalization starts in earnest. This changes the checks to use `_PyInterpreterState_GetFinalizing(interp)`, which is set immediately before terminating non-daemon threads.	2024-03-19 14:40:20 -04:00
Victor Stinner	7bbb9b57e6	gh-111696, PEP 737: Add %T and %N to PyUnicode_FromFormat() (#116839)	2024-03-14 22:23:00 +00:00
Sam Gross	ef3ceab09d	gh-112066: Use `PyDict_SetDefaultRef` in place of `PyDict_SetDefault`. (#112211) This changes a number of internal usages of `PyDict_SetDefault` to use `PyDict_SetDefaultRef`. Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>	2024-02-07 13:43:18 -05:00
Erlend E. Aasland	53d921ed96	gh-114569: Use PyMem_* APIs for non-PyObjects in unicodeobject.c (#114690)	2024-01-29 21:48:49 +01:00
Donghee Na	8f5b998706	gh-111971: Make _PyUnicode_FromId thread-safe in --disable-gil (gh-113489)	2023-12-26 16:48:33 +00:00
Erlend E. Aasland	526d0a9b6e	gh-110383: Improve accuracy of str.split() and str.rsplit() docstrings (#113355) Clarify split direction in the docstring body, instead of in the 'maxsplit' param docstring.	2023-12-21 15:22:39 +01:00
Sam Gross	cf6110ba13	gh-111924: Use PyMutex for Runtime-global Locks. (gh-112207) This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize. This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.	2023-12-07 12:33:40 -07:00
Kirill Podoprigora	0785c68559	gh-111972: Make Unicode name C API capsule initialization thread-safe (#112249)	2023-11-30 11:12:49 +01:00
Serhiy Storchaka	1d75ef6b61	gh-111999: Add signatures and improve docstrings for builtins (GH-112000)	2023-11-13 09:13:49 +02:00
Serhiy Storchaka	771bd3c94a	Add private _PyUnicode_AsUTF8NoNUL() function (GH-111957) Like PyUnicode_AsUTF8(), but check for embedded null characters.	2023-11-10 21:31:36 +02:00
Victor Stinner	11e83488c5	gh-111089: Revert PyUnicode_AsUTF8() changes (#111833) * Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)" This reverts commit `d9b606b3d0`. * Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)" This reverts commit `cde1071b2a`. * Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)" This reverts commit `d731579bfb`. * Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)" This reverts commit `d8f32be5b6`. * Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)" This reverts commit `37e4e20eaa`.	2023-11-07 22:36:13 +00:00
Sam Gross	6dfb8fe023	gh-110481: Implement biased reference counting (gh-110764)	2023-10-30 16:06:09 +00:00
Victor Stinner	f1e751e933	gh-111089: PyUnicode_AsUTF8AndSize() sets size on error (#111106) On error, PyUnicode_AsUTF8AndSize() now sets the size argument to -1, to avoid undefined value.	2023-10-20 20:03:11 +02:00
Victor Stinner	d731579bfb	gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091) * PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters. * Update related C API tests (test_capi.test_unicode). * type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently truncate doc containing null bytes. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2023-10-20 17:59:29 +02:00
Serhiy Storchaka	eb50cd37ea	gh-110289: C API: Add PyUnicode_EqualToUTF8() and PyUnicode_EqualToUTF8AndSize() functions (GH-110297)	2023-10-11 16:41:58 +03:00
Victor Stinner	8b626a47ba	gh-110079: Remove extern "C" { ...} in C code (#110080)	2023-09-29 10:56:49 +02:00
Sam Gross	2aceb21ae6	gh-109693: Remove pycore_atomic_funcs.h (#109694) _PyUnicode_FromId() now uses pyatomic.h functions instead.	2023-09-21 22:57:20 +02:00
Daniel Weiss	e7d5433f94	gh-108915: Removes extra backslashes in str.split docstring (#109044)	2023-09-07 05:33:51 +00:00
Serhiy Storchaka	2b15536fa9	gh-107913: Fix possible losses of OSError error codes (GH-107930) Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be called immediately after using the C API which sets errno or the Windows error code.	2023-08-27 00:35:06 +03:00
Victor Stinner	b32d4cad15	gh-108444: Replace _PyLong_AsInt() with PyLong_AsInt() (#108459) Change generated by the command: sed -i -e 's!_PyLong_AsInt!PyLong_AsInt!g' \ $(find -name "*.c" -o -name "*.h")	2023-08-25 01:01:30 +02:00
Victor Stinner	c494fb333b	gh-106320: Remove private _PyEval function (#108433) Move private _PyEval functions to the internal C API (pycore_ceval.h): * _PyEval_GetBuiltin() * _PyEval_GetBuiltinId() * _PyEval_GetSwitchInterval() * _PyEval_MakePendingCalls() * _PyEval_SetProfile() * _PyEval_SetSwitchInterval() * _PyEval_SetTrace() No longer export most of these functions.	2023-08-24 20:25:22 +02:00
Brandt Bucher	05a824f294	GH-84436: Skip refcounting for known immortals (GH-107605)	2023-08-04 16:24:50 -07:00
Eric Snow	b72947a8d2	gh-106931: Intern Statically Allocated Strings Globally (gh-107272) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.	2023-07-27 13:56:59 -06:00
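
The message for commit 320316ef7e describes "printable" in terms of Unicode general categories: a character is nonprintable if it falls in one of the five "Other" or three "Separator" categories, with the ASCII space as the exception. A minimal sketch of that definition in Python follows. The helper names `is_printable_by_category` and `find_mismatches` are hypothetical and are not taken from the commit's actual test; the sketch assumes a CPython interpreter whose `unicodedata` module and `str.isprintable()` are built from the same Unicode database version.

```python
import sys
import unicodedata

# Category sets as listed in the commit message for 320316ef7e:
# five "Other" categories and three "Separator" categories.
OTHER = {"Cc", "Cf", "Cs", "Co", "Cn"}
SEPARATOR = {"Zl", "Zp", "Zs"}


def is_printable_by_category(ch: str) -> bool:
    """Hypothetical helper: apply the category-based definition to one character."""
    if ch == " ":
        # The ASCII space is the single exception to the Separator rule.
        return True
    return unicodedata.category(ch) not in (OTHER | SEPARATOR)


def find_mismatches(limit: int = sys.maxunicode + 1) -> list[int]:
    """Hypothetical helper: code points below `limit` where the two definitions differ."""
    return [
        cp
        for cp in range(limit)
        if chr(cp).isprintable() != is_printable_by_category(chr(cp))
    ]
```

Calling `find_mismatches()` with no argument scans every code point; if the two definitions agree, as the commit message states, it should return an empty list.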

1685 Commits