Commit Graph

1737 Commits

Author SHA1 Message Date
Petr Viktorin 956270d08d
gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358)
Older stable ABI extensions are allowed to make immortal objects mortal.
Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`).
2024-07-16 15:17:29 +02:00
Bénédikt Tran 6343486eb6
gh-121165: protect macro expansion of `ADJUST_INDICES` with do-while(0) (#121166) 2024-07-02 16:27:51 +05:30
Victor Stinner 12af8ec864
gh-121040: Use __attribute__((fallthrough)) (#121044)
Fix warnings when using -Wimplicit-fallthrough compiler flag.

Annotate explicitly "fall through" switch cases with a new
_Py_FALLTHROUGH macro which uses __attribute__((fallthrough)) if
available. Replace "fall through" comments with _Py_FALLTHROUGH.

Add _Py__has_attribute() macro. No longer define __has_attribute()
macro if it's not defined. Move also _Py__has_builtin() at the top
of pyport.h.

Co-Authored-By: Nikita Sobolev <mail@sobolevn.me>
2024-06-27 09:58:44 +00:00
Xie Yanbo 0153fd0940
Fix typos in comments (#120821) 2024-06-24 19:47:00 +02:00
Steve Dower e731554337
Fixes loop variables to be the same types as their limit (GH-120958) 2024-06-24 17:11:47 +01:00
Victor Stinner 2e157851e3
gh-119182: Add PyUnicodeWriter_WriteUCS4() function (#120849) 2024-06-24 17:40:39 +02:00
Serhiy Storchaka 6eb23b1311
gh-70278: Fix PyUnicode_FromFormat() with precision for %s and %V (GH-120365)
PyUnicode_FromFormat() no longer produces the ending \ufffd
character for truncated C string when use precision with %s and %V.
It now truncates the string before the start of truncated multibyte sequences.
2024-06-24 18:07:07 +03:00
Victor Stinner e213475495
gh-119182: Add checks to PyUnicodeWriter APIs (#120870) 2024-06-22 17:25:55 +02:00
Victor Stinner 879d1f28bb
gh-119182: Use PyUnicodeWriter_WriteWideChar() (#120851)
Use PyUnicodeWriter_WriteWideChar() in PyUnicode_FromFormat()
2024-06-22 08:58:22 +02:00
Victor Stinner 4123226bbd
gh-119182: Add PyUnicodeWriter_DecodeUTF8Stateful() (#120639)
Add PyUnicodeWriter_WriteWideChar() and
PyUnicodeWriter_DecodeUTF8Stateful() functions.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-06-21 19:33:15 +02:00
Petr Viktorin 6f1d448bc1
gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520)
* Add an InternalDocs file describing how interning should work and how to use it.

* Add internal functions to *explicitly* request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`

* Switch uses of `PyUnicode_InternInPlace` to those.

* Disallow using `_Py_SetImmortal` on strings directly.
  You should use `_PyUnicode_InternImmortal` instead:
  - Strings should be interned before immortalization, otherwise you're possibly
    interning a immortalizing copy.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
    `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
    backports, as they are now part of public API and version-specific ABI.

* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)
  - one-character latin-1 singletons

  Now, when you intern a singleton, that exact singleton will be interned.

* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

* Intern `_Py_STR` singletons at startup.

* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.

* Beef up the tests. Cover internal details (marked with `@cpython_only`).

* Add lots of assertions

Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
2024-06-21 17:19:31 +02:00
Victor Stinner 5150795b1c
gh-119182: Optimize PyUnicode_FromFormat() (#120796)
Use strchr() and ucs1lib_find_max_char() to optimize the code path
formatting sub-strings between '%' formats.
2024-06-20 19:06:16 +00:00
Victor Stinner 5c4235cd8c
gh-119182: Add PyUnicodeWriter C API (#119184) 2024-06-17 17:10:52 +02:00
Eric Snow 105f22ea46
gh-117398: Use Per-Interpreter State for the _datetime Static Types (gh-119929)
We make use of the same mechanism that we use for the static builtin types.  This required a few tweaks.

The relevant code could use some cleanup but I opted to avoid the significant churn in this change.  I'll tackle that separately.

This change is the final piece needed to make _datetime support multiple interpreters.  I've updated the module slot accordingly.
2024-06-03 17:09:18 -06:00
Victor Stinner 3ea9b92086
gh-119396: Optimize unicode_decode_utf8_writer() (#119957)
Optimize unicode_decode_utf8_writer()

Take the ascii_decode() fast-path even if dest is not aligned on
size_t bytes.
2024-06-03 08:45:20 +02:00
Donghee Na 0594a27e5f
gh-117657: Fix data races report by TSAN unicode-hash (gh-119907) 2024-06-03 12:22:41 +09:00
Victor Stinner 0518edc170
gh-119396: Optimize unicode_repr() (#119617)
Use stringlib to specialize unicode_repr() for each string kind
(UCS1, UCS2, UCS4).

Benchmark:

+-------------------------------------+---------+----------------------+
| Benchmark                           | ref     | change2              |
+=====================================+=========+======================+
| repr('abc')                         | 100 ns  | 103 ns: 1.02x slower |
+-------------------------------------+---------+----------------------+
| repr('a' * 100)                     | 369 ns  | 369 ns: 1.00x slower |
+-------------------------------------+---------+----------------------+
| repr(('a' + squote) * 100)          | 1.21 us | 946 ns: 1.27x faster |
+-------------------------------------+---------+----------------------+
| repr(('a' + nl) * 100)              | 1.23 us | 907 ns: 1.36x faster |
+-------------------------------------+---------+----------------------+
| repr(dquote + ('a' + squote) * 100) | 1.08 us | 858 ns: 1.25x faster |
+-------------------------------------+---------+----------------------+
| Geometric mean                      | (ref)   | 1.16x faster         |
+-------------------------------------+---------+----------------------+
2024-05-28 18:05:20 +02:00
Serhiy Storchaka b313cc68d5
gh-117557: Improve error messages when a string, bytes or bytearray of length 1 are expected (GH-117631) 2024-05-28 12:01:37 +03:00
Serhiy Storchaka 08e65430aa
gh-111999: Fix the signature of str.format_map() (#119540) 2024-05-25 06:21:11 -07:00
Victor Stinner 9b422fc6af
gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder (#119398)
Add unicode_decode_utf8_writer() to write directly characters into a
_PyUnicodeWriter writer: avoid the creation of a temporary string.
Optimize PyUnicode_FromFormat() by using the new
unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
2024-05-22 21:05:26 +00:00
Josh {*()} Rosenberg baf347d916
gh-119247: Add macros to use PySequence_Fast safely in free-threaded build (#119315)
Add `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST` and
`Py_END_CRITICAL_SECTION_SEQUENCE_FAST` macros and update `str.join` to use
them. Also add a regression test that would crash reliably without this
patch.
2024-05-22 17:45:34 +00:00
Brett Simmers c2627d6eea
gh-116322: Add Py_mod_gil module slot (#116882)
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.

PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.

A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
2024-05-03 11:30:55 -04:00
Brett Simmers f8290df63f
gh-116738: Make `_codecs` module thread-safe (#117530)
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:

- Move codecs-related state that lives on `PyInterpreterState` to a
  struct declared in `pycore_codecs.h`.

- In free-threaded builds, add a mutex to `codecs_state` to synchronize
  operations on `search_path`. Because `search_path_mutex` is used as a
  normal mutex and not a critical section, we must be extremely careful
  with operations called while holding it.

- The codec registry is explicitly initialized as part of
  `_PyUnicode_InitEncodings` to simplify thread-safety.
2024-05-02 18:25:36 -04:00
Erlend E. Aasland 044dc496e0
gh-117709: Add vectorcall support for str() with positional-only arguments (#117746)
Fall back to tp_call() for cases when arguments are passed by name.

Co-authored-by: Donghee Na <donghee.na@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
2024-04-11 13:55:37 +00:00
Serhiy Storchaka 24a2bd0481
gh-117642: Fix PEP 737 implementation (GH-117643)
* Fix implementation of %#T and %#N (they were implemented as %T# and
  %N#).
* Restore tests removed in gh-116417.
2024-04-08 16:27:25 +00:00
Sam Gross 1a6594f661
gh-117439: Make refleak checking thread-safe without the GIL (#117469)
This keeps track of the per-thread total reference count operations in
PyThreadState in the free-threaded builds. The count is merged into the
interpreter's total when the thread exits.
2024-04-08 12:11:36 -04:00
Erlend E. Aasland 7ecd55d604
gh-117431: Adapt str.find and friends to Argument Clinic (#117468)
This change gives a significant speedup, as the METH_FASTCALL calling
convention is now used. The following methods are adapted:

- str.count
- str.find
- str.index
- str.rfind
- str.rindex
2024-04-03 17:59:18 +02:00
Erlend E. Aasland 1dc1521042
gh-117431: Fix str.endswith docstring (#117499)
The first parameter is named 'suffix', not 'prefix'.

Regression introduced by commit 444156ed
2024-04-03 12:33:20 +02:00
Erlend E. Aasland 444156ede4
gh-117431: Adapt str.startswith and str.endswith to Argument Clinic (#117466)
This change gives a significant speedup, as the METH_FASTCALL calling
convention is now used.
2024-04-03 09:11:39 +02:00
Sam Gross 60e105c1c1
gh-113964: Don't prevent new threads until all non-daemon threads exit (#116677)
Starting in Python 3.12, we prevented calling fork() and starting new threads
during interpreter finalization (shutdown). This has led to a number of
regressions and flaky tests. We should not prevent starting new threads
(or `fork()`) until all non-daemon threads exit and finalization starts in
earnest.

This changes the checks to use `_PyInterpreterState_GetFinalizing(interp)`,
which is set immediately before terminating non-daemon threads.
2024-03-19 14:40:20 -04:00
Victor Stinner 7bbb9b57e6
gh-111696, PEP 737: Add %T and %N to PyUnicode_FromFormat() (#116839) 2024-03-14 22:23:00 +00:00
Sam Gross ef3ceab09d
gh-112066: Use `PyDict_SetDefaultRef` in place of `PyDict_SetDefault`. (#112211)
This changes a number of internal usages of `PyDict_SetDefault` to use `PyDict_SetDefaultRef`.

Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
2024-02-07 13:43:18 -05:00
Erlend E. Aasland 53d921ed96
gh-114569: Use PyMem_* APIs for non-PyObjects in unicodeobject.c (#114690) 2024-01-29 21:48:49 +01:00
Donghee Na 8f5b998706
gh-111971: Make _PyUnicode_FromId thread-safe in --disable-gil (gh-113489) 2023-12-26 16:48:33 +00:00
Erlend E. Aasland 526d0a9b6e
gh-110383: Improve accuracy of str.split() and str.rsplit() docstrings (#113355)
Clarify split direction in the docstring body,
instead of in the 'maxsplit' param docstring.
2023-12-21 15:22:39 +01:00
Sam Gross cf6110ba13
gh-111924: Use PyMutex for Runtime-global Locks. (gh-112207)
This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize.

This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
2023-12-07 12:33:40 -07:00
Kirill Podoprigora 0785c68559
gh-111972: Make Unicode name C APIcapsule initialization thread-safe (#112249) 2023-11-30 11:12:49 +01:00
Serhiy Storchaka 1d75ef6b61
gh-111999: Add signatures and improve docstrings for builtins (GH-112000) 2023-11-13 09:13:49 +02:00
Serhiy Storchaka 771bd3c94a
Add private _PyUnicode_AsUTF8NoNUL() function (GH-111957)
Like PyUnicode_AsUTF8(), but check for embedded null characters.
2023-11-10 21:31:36 +02:00
Victor Stinner 11e83488c5
gh-111089: Revert PyUnicode_AsUTF8() changes (#111833)
* Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)"

This reverts commit d9b606b3d0.

* Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)"

This reverts commit cde1071b2a.

* Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)"

This reverts commit d731579bfb.

* Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)"

This reverts commit d8f32be5b6.

* Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)"

This reverts commit 37e4e20eaa.
2023-11-07 22:36:13 +00:00
Sam Gross 6dfb8fe023
gh-110481: Implement biased reference counting (gh-110764) 2023-10-30 16:06:09 +00:00
Victor Stinner f1e751e933
gh-111089: PyUnicode_AsUTF8AndSize() sets size on error (#111106)
On error, PyUnicode_AsUTF8AndSize() now sets the size argument to -1,
to avoid undefined value.
2023-10-20 20:03:11 +02:00
Victor Stinner d731579bfb
gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)
* PyUnicode_AsUTF8() now raises an exception if the string contains
  embedded null characters.
* Update related C API tests (test_capi.test_unicode).
* type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently
  truncate doc containing null bytes.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-10-20 17:59:29 +02:00
Serhiy Storchaka eb50cd37ea
gh-110289: C API: Add PyUnicode_EqualToUTF8() and PyUnicode_EqualToUTF8AndSize() functions (GH-110297) 2023-10-11 16:41:58 +03:00
Victor Stinner 8b626a47ba
gh-110079: Remove extern "C" { ...} in C code (#110080) 2023-09-29 10:56:49 +02:00
Sam Gross 2aceb21ae6
gh-109693: Remove pycore_atomic_funcs.h (#109694)
_PyUnicode_FromId() now uses pyatomic.h functions instead.
2023-09-21 22:57:20 +02:00
Daniel Weiss e7d5433f94
gh-108915: Removes extra backslashes in str.split docstring (#109044) 2023-09-07 05:33:51 +00:00
Serhiy Storchaka 2b15536fa9
gh-107913: Fix possible losses of OSError error codes (GH-107930)
Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be
called immediately after using the C API which sets errno or the Windows
error code.
2023-08-27 00:35:06 +03:00
Victor Stinner b32d4cad15
gh-108444: Replace _PyLong_AsInt() with PyLong_AsInt() (#108459)
Change generated by the command:

sed -i -e 's!_PyLong_AsInt!PyLong_AsInt!g' \
    $(find -name "*.c" -o -name "*.h")
2023-08-25 01:01:30 +02:00
Victor Stinner c494fb333b
gh-106320: Remove private _PyEval function (#108433)
Move private _PyEval functions to the internal C API
(pycore_ceval.h):

* _PyEval_GetBuiltin()
* _PyEval_GetBuiltinId()
* _PyEval_GetSwitchInterval()
* _PyEval_MakePendingCalls()
* _PyEval_SetProfile()
* _PyEval_SetSwitchInterval()
* _PyEval_SetTrace()

No longer export most of these functions.
2023-08-24 20:25:22 +02:00
Brandt Bucher 05a824f294
GH-84436: Skip refcounting for known immortals (GH-107605) 2023-08-04 16:24:50 -07:00
Eric Snow b72947a8d2
gh-106931: Intern Statically Allocated Strings Globally (gh-107272)
We tried this before with a dict and for all interned strings.  That ran into problems due to interpreter isolation.  However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning.  Here we circle back to using a global cache, but only for statically allocated strings.  We also use a more-basic _Py_hashtable_t for that global cache instead of a dict.

Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter.  Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.
2023-07-27 13:56:59 -06:00
Eric Snow 0ba07b2108
gh-105699: Fix a Crasher Related to a Deprecated Global Variable (gh-106923)
There was a slight race in _Py_ClearFileSystemEncoding() (when called from _Py_SetFileSystemEncoding()), between freeing the value and setting the variable to NULL, which occasionally caused crashes when multiple isolated interpreters were used.  (Notably, I saw at least 10 different, seemingly unrelated spooky-action-at-a-distance, ways this crashed. Yay, free threading!)  We avoid the problem by only setting the global variables with the main interpreter (i.e. runtime init).
2023-07-21 08:34:09 -06:00
Eric Snow 87e7cb09e4
gh-105699: Fix an Interned Strings Crasher (gh-106930)
A static (process-global) str object must only have its "interned" state cleared when no longer interned in any interpreters.  They are the only ones that can be shared by interpreters so we don't have to worry about any other str objects.

We trigger clearing the state with the main interpreter, since no other interpreters may exist at that point and _PyUnicode_ClearInterned() is only called during interpreter finalization.

We do not address here the fact that a string will only be interned in the first interpreter that interns it.  In any subsequent interpreters str.state.interned is already set so _PyUnicode_InternInPlace() will skip it.  That needs to be addressed separately from fixing the crasher.
2023-07-21 08:32:42 -06:00
Hugo van Kemenade 34c14147a2
gh-106487: Allow the 'count' argument of `str.replace` to be a keyword (#106488) 2023-07-10 12:52:36 +03:00
Victor Stinner c5afc97fc2
gh-106320: Remove private _PyErr C API functions (#106356)
Remove private _PyErr C API functions: move them to the internal
C API (pycore_pyerrors.h).
2023-07-03 10:48:50 +00:00
Inada Naoki d5bd32fb48
gh-104922: remove PY_SSIZE_T_CLEAN (#106315) 2023-07-02 15:07:46 +09:00
Victor Stinner 18b1fdebe0
gh-106320: Remove _PyInterpreterState_Get() alias (#106321)
Replace calls to the (removed) slow _PyInterpreterState_Get() with
fast inlined _PyInterpreterState_GET() function.
2023-07-01 23:44:07 +00:00
Victor Stinner 0b51463862
Remove private _PyCodec_Lookup() function (#106269)
Remove the following private functions of the C API:

* _PyCodecInfo_GetIncrementalDecoder()
* _PyCodecInfo_GetIncrementalEncoder()
* _PyCodec_DecodeText()
* _PyCodec_EncodeText()
* _PyCodec_Forget()
* _PyCodec_Lookup()
* _PyCodec_LookupTextEncoding()

Move these functions to a new pycore_codecs.h internal header file.

These functions are no longer exported.
2023-06-30 09:34:01 +00:00
Erlend E. Aasland 555be81026
gh-105375: Improve error handling in PyUnicode_BuildEncodingMap() (#105491)
Bail on first error to prevent exceptions from possibly being overwritten.
2023-06-11 21:29:19 +02:00
Victor Stinner 8ed705c083
gh-105156: Deprecate the old Py_UNICODE type in C API (#105157)
Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API:
use wchar_t instead.

Replace Py_UNICODE with wchar_t in multiple C files.

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2023-06-01 08:56:35 +02:00
Inada Naoki e92ac0a741
Fix compiler warning in unicodeobject.c (#105050) 2023-05-29 17:31:03 +09:00
Serhiy Storchaka f3466bc040
gh-98836: Extend PyUnicode_FromFormat() (GH-98838)
* Support for conversion specifiers o (octal) and X (uppercase hexadecimal).
* Support for length modifiers j (intmax_t) and t (ptrdiff_t).
* Length modifiers are now applied to all integer conversions.
* Support for wchar_t C strings (%ls and %lV).
* Support for variable width and precision (*).
* Support for flag - (left alignment).
2023-05-22 00:32:39 +03:00
John Belmonte 69621d1b09
gh-104018: remove unused format "z" handling in string formatfloat() (#104107)
This is a cleanup overlooked in PR #104033.
2023-05-07 10:11:42 +05:30
Eric Snow a9c6e0618f
gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205)
Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules.  We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
2023-05-05 21:11:27 +00:00
Eric Snow fdd878650d
gh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (gh-104072)
Until now, we haven't been initializing nor finalizing the per-interpreter state properly.
2023-05-01 19:36:00 -06:00
Eric Snow d2e2e53f73
gh-94673: Ensure Builtin Static Types are Readied Properly (gh-103940)
There were cases where we do unnecessary work for builtin static types. This also simplifies some work necessary for a per-interpreter GIL.
2023-04-27 16:19:43 -06:00
Eddie Elizondo ea2c001650
gh-84436: Implement Immortal Objects (gh-19474)
This is the implementation of PEP683

Motivation:

The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.

Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
2023-04-22 13:39:37 -06:00
Eric Snow ba65a065cf
gh-100227: Move the Dict of Interned Strings to PyInterpreterState (gh-102339)
We can revisit the options for keeping it global later, if desired.  For now the approach seems quite complex, so we've gone with the simpler isolation solution in the meantime.

https://github.com/python/cpython/issues/100227
2023-03-28 12:52:28 -06:00
Eric Snow 89e67ada69
gh-100227: Revert gh-102925 "gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters" (gh-103063)
This reverts commit 87be8d9.

This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
2023-03-27 16:53:05 -06:00
Eric Snow 87be8d9522
gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters (gh-102925)
This is effectively two changes.  The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.).  The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly.

Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict).  _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters.  This is especially important as we move the obmalloc state to each interpreter (gh-101660), as well as, potentially, the GIL (PEP 684).

https://github.com/python/cpython/issues/100227
2023-03-22 18:30:04 -06:00
Kumar Aditya 3d872a74c8
GH-100227: cleanup initialization of global interned dict (#102682) 2023-03-14 14:22:21 +05:30
Eric Snow cbb0aa71d0
gh-102304: Consolidate Direct Usage of _Py_RefTotal (gh-102514)
This simplifies further changes to _Py_RefTotal (e.g. make it atomic or move it to PyInterpreterState).

https://github.com/python/cpython/issues/102304
2023-03-08 12:03:50 -07:00
Jelle Zijlstra 8d0f09b1be
gh-101765: unicodeobject: use Py_XDECREF correctly (#102283) 2023-02-26 14:45:37 -08:00
Jelle Zijlstra d71edbd1b7
gh-101765: Fix refcount issues in list and unicode pickling (#102265)
Followup from #101769.
2023-02-25 16:01:58 -08:00
Ionite 54dfa14c5a
gh-101765: Fix SystemError / segmentation fault in iter `__reduce__` when internal access of `builtins.__dict__` exhausts the iterator (#101769) 2023-02-24 15:02:04 -08:00
Eric Snow aa8591e9ca
gh-90111: Minor Cleanup for Runtime-Global Objects (gh-100254)
* move _PyRuntime.global_objects.interned to _PyRuntime.cached_objects.interned_strings (and use _Py_CACHED_OBJECT())
* rename _PyRuntime.global_objects to _PyRuntime.static_objects

(This also relates to gh-96075.)

https://github.com/python/cpython/issues/90111
2022-12-14 11:53:57 -07:00
Eric Snow 91a8e002c2
gh-81057: Move More Globals to _PyRuntimeState (gh-100092)
https://github.com/python/cpython/issues/81057
2022-12-07 15:56:31 -07:00
Serhiy Storchaka a87c46eab3
bpo-15999: Accept arbitrary values for boolean parameters. (#15609)
builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
2022-12-03 11:52:21 -08:00
Serhiy Storchaka f08e52ccb0
gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (GH-99613)
Previously *consumed was not set in this case.
2022-12-01 14:54:51 +02:00
Victor Stinner 135ec7cefb
gh-99537: Use Py_SETREF() function in C code (#99657)
Fix potential race condition in code patterns:

* Replace "Py_DECREF(var); var = new;" with "Py_SETREF(var, new);"
* Replace "Py_XDECREF(var); var = new;" with "Py_XSETREF(var, new);"
* Replace "Py_CLEAR(var); var = new;" with "Py_XSETREF(var, new);"

Other changes:

* Replace "old = var; var = new; Py_DECREF(var)"
  with "Py_SETREF(var, new);"
* Replace "old = var; var = new; Py_XDECREF(var)"
  with "Py_XSETREF(var, new);"
* And remove the "old" variable.
2022-11-22 13:39:11 +01:00
Eric Snow 5f55067e23
gh-81057: Move More Globals in Core Code to _PyRuntimeState (gh-99516)
https://github.com/python/cpython/issues/81057
2022-11-16 09:37:14 -07:00
Victor Stinner 1960eb005e
gh-99300: Use Py_NewRef() in Objects/ directory (#99351)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in C files of the Objects/ directory.
2022-11-10 23:40:31 +01:00
Eric Snow 52f91c642b
gh-90868: Adjust the Generated Objects (gh-99223)
We do the following:

* move the generated _PyUnicode_InitStaticStrings() to its own file
* move the generated _PyStaticObjects_CheckRefcnt() to its own file
* include pycore_global_objects.h in extension modules instead of pycore_runtime_init.h

These changes help us avoid including things that aren't needed.

https://github.com/python/cpython/issues/90868
2022-11-08 10:03:03 -07:00
Nikita Sobolev 76f989dc3e
gh-98783: Fix crashes when `str` subclasses are used in `_PyUnicode_Equal` (#98806) 2022-10-30 02:23:20 -04:00
Victor Stinner db03c8066a
gh-98393: os module reject bytes-like, only accept bytes (#98394)
The os module and the PyUnicode_FSDecoder() function no longer accept
bytes-like paths, like bytearray and memoryview types: only the exact
bytes type is accepted for bytes strings.
2022-10-18 17:52:31 +02:00
Nikita Sobolev ccab67ba79
gh-97982: Factorize PyUnicode_Count() and unicode_count() code (#98025)
Add unicode_count_impl() to factorize PyUnicode_Count()
and unicode_count() code.
2022-10-12 18:27:53 +02:00
Victor Stinner df3a6d9beb
gh-97982: Remove asciilib_count() (#98164)
asciilib_count() is the same than ucs1lib_count(): the code is not
specialized for ASCII strings, so it's not worth it to have a
separated function. Remove asciilib_count() function.
2022-10-11 17:59:58 +02:00
Kumar Aditya 6dab8c95bd
GH-96458: Statically initialize utf8 representation of static strings (#96481) 2022-09-02 23:43:08 -07:00
Kumar Aditya 129998bd7b
GH-96075: move interned dict under runtime state (GH-96077) 2022-08-22 12:05:21 -07:00
Petr Viktorin 71c3d649b5
gh-95504: Fix negative numbers in PyUnicode_FromFormat (GH-95848)
Co-authored-by: philg314 <110174000+philg314@users.noreply.github.com>
2022-08-10 13:12:40 +02:00
Serhiy Storchaka 62f06508e7
gh-95781: More strict format string checking in PyUnicode_FromFormatV() (GH-95784)
An unrecognized format character in PyUnicode_FromFormat() and
PyUnicode_FromFormatV() now sets a SystemError.
In previous versions it caused all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
2022-08-08 19:21:07 +03:00
Dong-hee Na fb75d015f4
gh-91146: More reduce allocation size of list from str.split/rsplit (gh-95493)
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2022-08-01 22:15:07 +09:00
Dong-hee Na 50b2261bda
gh-91146: Reduce allocation size of list from str.split()/rsplit() (gh-95473) 2022-07-31 12:14:53 +09:00
Pamela Fox 70068b9336
Fix Unicode doc and replace use of macro with PyMem_New function (GH-94088) 2022-07-28 23:32:16 +01:00
Eric Snow 4a1dd73431
gh-94673: Add _PyStaticType_InitBuiltin() (#95152)
This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types.

We do the following:

* add `_PyStaticType_InitBuiltin()`
* add `_Py_TPFLAGS_STATIC_BUILTIN`
* set it on all static builtin types in `_PyStaticType_InitBuiltin()`
* shuffle some code around to be able to use _PyStaticType_InitBuiltin()
    * rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()`
    * add `_PyStructSequence_InitBuiltin()`.
2022-07-25 12:47:31 -06:00
Kumar Aditya 9dff9f4814
GH-90699: Intern statically allocated strings (GH-93597)
This is similar to how strings are interned for deepfreeze.
2022-07-08 10:47:37 -07:00
Eric Snow caa279d6fd
bpo-40514: Drop EXPERIMENTAL_ISOLATED_SUBINTERPRETERS (gh-93185)
This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing:

* the configure option
* the macro
* the code enabled by the macro
2022-05-27 17:38:01 -06:00
Kumar Aditya cb04a09d2d
GH-93207: Remove HAVE_STDARG_PROTOTYPES configure check for stdarg.h (#93215) 2022-05-27 13:30:45 +02:00
Victor Stinner 5f8c3fb997
gh-91924: Optimize unicode_check_encoding_errors() (#93200)
Avoid _PyCodec_Lookup() and PyCodec_LookupError() for most common
built-in encodings and error handlers to avoid creating a temporary
Unicode string object, whereas these encodings and error handlers are
known to be valid.
2022-05-27 00:39:49 +02:00