The `PyThreadState` structure gains a reference count field to avoid
issues with `PyThreadState` becoming a dangling pointer to freed memory.
The refcount starts with a value of two: one reference is owned by the
interpreter's linked list of thread states and one reference is owned by
the OS thread. The reference count is decremented when the thread state
is removed from the interpreter's linked list and before the OS thread
calls `PyThread_hang_thread()`. The thread that decrements it to zero
frees the `PyThreadState` memory.
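A minimal sketch of this two-owner scheme, using C11 atomics and hypothetical names (the actual CPython fields and helpers differ):
```
#include <stdatomic.h>
#include <stdlib.h>

typedef struct hypothetical_tstate {
    atomic_int refcount;   /* starts at 2: interpreter list + OS thread */
    /* ... other thread state fields ... */
} hypothetical_tstate;

static void
tstate_decref(hypothetical_tstate *ts)
{
    /* Whichever owner drops the count to zero frees the memory, so
       neither the interpreter nor the OS thread is left holding a
       dangling pointer. */
    if (atomic_fetch_sub(&ts->refcount, 1) == 1) {
        free(ts);
    }
}
```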
The `holds_gil` field is moved out of the `_status` bit field to avoid
a data race where one thread calls `PyThreadState_Clear()`, modifying the
`_status` bit field, while the OS thread reads `holds_gil` when
attempting to acquire the GIL.
The `PyThreadState.state` field now has `_Py_THREAD_SHUTTING_DOWN` as a
possible value. This corresponds to the `_PyThreadState_MustExit()`
check and avoids race conditions in the free threading build when
performing that check.
* Fix use after free in list objects
Set the items pointer in the list object to NULL after the items array
is freed during list deallocation. Otherwise, we can end up with a list
object added to the free list that contains a pointer to an already-freed
items array.
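The shape of the fix, as a simplified sketch (the real `list_dealloc` does more, but `ob_item` is the actual field name):
```
#include "Python.h"

static void
list_dealloc_sketch(PyListObject *op)
{
    if (op->ob_item != NULL) {
        /* ... Py_DECREF each element ... */
        PyMem_Free(op->ob_item);
        /* The fix: clear the pointer so a list object recycled from
           the free list never carries a pointer to freed memory. */
        op->ob_item = NULL;
    }
    /* ... push op onto the free list, or free it ... */
}
```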
* Mark `_PyList_FromStackRefStealOnSuccess` as escaping
I think technically it's not escaping, because the only object that
can be decrefed if allocation fails is an exact list, which cannot
execute arbitrary code when it is destroyed. However, this seems less
intrusive than trying to special-case objects in the assert in `_Py_Dealloc`
that checks for non-null stack pointers, and it shouldn't matter for performance.
* Add location information when accessing already closed stackref
* Add a #define option to track closed stackrefs, providing precise information for use-after-free and double-free errors.
Writing the decimal representation of a Unicode codepoint only requires knowing the number of digits.
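For illustration, a hypothetical helper showing why the digit count alone suffices: it sizes the output exactly and lets the digits be written back to front:
```
/* Write the decimal representation of codepoint `ch` into `buf`,
   which must have room for the computed number of digits; returns
   how many digits were written. */
static int
write_decimal(char *buf, unsigned int ch)
{
    int digits = 1;
    for (unsigned int t = ch; t >= 10; t /= 10) {
        digits++;
    }
    for (int i = digits - 1; i >= 0; i--) {
        buf[i] = '0' + (ch % 10);
        ch /= 10;
    }
    return digits;
}
```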
---------
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Move some `#include <stdbool.h>` directives after `#include "Python.h"` when `pyconfig.h` is not
included first and we are in a platform-agnostic context. This avoids having
features defined by `<stdbool.h>` before those decided by `Python.h`.
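The resulting ordering, illustrated on a hypothetical source file:
```
/* Python.h (and thus pyconfig.h) comes first, so its feature-test
   macros (e.g. _GNU_SOURCE) take effect before any system header,
   including <stdbool.h>, is processed. */
#include "Python.h"
#include <stdbool.h>
```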
* Combine _GUARD_GLOBALS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_MODULE_FROM_KEYS into _LOAD_GLOBAL_MODULE
* Combine _GUARD_BUILTINS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_BUILTINS_FROM_KEYS into _LOAD_GLOBAL_BUILTINS
* Combine _CHECK_ATTR_MODULE_PUSH_KEYS and _LOAD_ATTR_MODULE_FROM_KEYS into _LOAD_ATTR_MODULE
* Remove stack transient in LOAD_ATTR_WITH_HINT
Remove inclusions prior to Python.h.
Including <stdbool.h> first causes <features.h> to be included before
Python.h can define the macros that enable some additional features,
leaving multiple types undefined down the line.
This moves `tstate_activate()` down to avoid a data race in the free
threading build on the `_PyRuntime`'s thread-local `autoTSSkey`. This
key is deleted during runtime finalization, which may happen
concurrently with a call to `_PyThreadState_Attach`.
The earlier `tstate_try/wait_attach` ensures that the thread is blocked
before it attempts to access the deleted `autoTSSkey`.
This fixes a TSAN reported data race in
`test_threading.test_import_from_another_thread`.
Windows and macOS require precomputing a "timebase" in order to convert
OS timestamps into nanoseconds. Retrieve and compute this value during
runtime initialization to avoid data races when accessing the time.
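For illustration, the macOS flavor of such a timebase, sketched with the Mach API (the CPython code, and the Windows path using QueryPerformanceFrequency, differ in detail):
```
#include <mach/mach_time.h>
#include <stdint.h>

/* Retrieve the timebase once during runtime initialization; after
   that, concurrent readers only touch the cached values, so there
   is no data race. */
static mach_timebase_info_data_t timebase;  /* numer/denom */

static void
init_timebase(void)
{
    (void)mach_timebase_info(&timebase);
}

static uint64_t
ticks_to_ns(uint64_t ticks)
{
    /* Real code would also guard against overflow here. */
    return ticks * timebase.numer / timebase.denom;
}
```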
The use of PySys_GetObject() and _PySys_GetAttr(), which return a borrowed
reference, has been replaced by using one of the following functions, which
return a strong reference and distinguish a missing attribute from an error:
_PySys_GetOptionalAttr(), _PySys_GetOptionalAttrString(),
_PySys_GetRequiredAttr(), and _PySys_GetRequiredAttrString().
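A sketch of the intended calling pattern; the exact return convention assumed here (-1 error, 0 missing, 1 found) is an inference about these private helpers, not a documented contract:
```
#include "Python.h"
/* _PySys_GetOptionalAttrString() is declared in CPython's internal
   headers; assumed convention: -1 error, 0 missing, 1 found. */

static int
write_to_stdout_sketch(PyObject *text)
{
    PyObject *out;
    int res = _PySys_GetOptionalAttrString("stdout", &out);
    if (res < 0) {
        return -1;          /* exception already set */
    }
    if (res == 0) {
        return 0;           /* sys.stdout missing: not an error */
    }
    int rc = PyFile_WriteObject(text, out, Py_PRINT_RAW);
    Py_DECREF(out);         /* we own a strong reference */
    return rc;
}
```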
* Implement C recursion protection with limit pointers for Linux, macOS and Windows (see the sketch after this list)
* Remove calls to PyOS_CheckStack
* Add stack protection to parser
* Make tests more robust to low stacks
* Improve error messages for stack overflow
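The limit-pointer idea, sketched with hypothetical field names: compare a local variable's address against a precomputed limit instead of counting recursion depth.
```
/* Sketch only; the real thread state fields differ. The C stack
   grows downward on these platforms, so headroom is exhausted once
   a local's address drops below the limit. */
typedef struct {
    char *c_stack_limit;   /* precomputed: stack base minus headroom */
} hypothetical_tstate;

static int
check_c_stack(hypothetical_tstate *ts)
{
    char probe;            /* address approximates the stack pointer */
    if (&probe < ts->c_stack_limit) {
        /* raise RecursionError: the C stack is nearly exhausted */
        return -1;
    }
    return 0;
}
```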
Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now
Unfortunately, the change broke some buildbots.
This reverts commit 2498c22fa0.
As of iOS 18.3.1, enabling wasm-gc breaks the interpreter. This disables the wasm-gc
trampoline on iOS.
Co-authored-by: Hood Chatham <roberthoodchatham@gmail.com>
CPython currently switches `PYMEM_DOMAIN_RAW` to the default allocator
temporarily during initialization and shutdown. The motivation is to
ensure that core runtime structures are allocated and freed using the
same allocator. However, modifying the current allocator changes global
state and is not thread-safe, even with the GIL. Other threads may be
allocating or freeing objects using PYMEM_DOMAIN_RAW; they are not
required to hold the GIL to call PyMem_RawMalloc/PyMem_RawFree.
This adds new internal-only functions like `_PyMem_DefaultRawMalloc`
that aren't affected by calls to `PyMem_SetAllocator()`, so they're
appropriate for Python runtime initialization and finalization. Use
these calls in places where we previously swapped to the default raw
allocator.
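A sketch of the intended usage; `_PyMem_DefaultRawFree` is an assumed companion name, since the text names only the Malloc variant:
```
#include "Python.h"

struct runtime_blob { int x; };  /* stand-in for a core runtime structure */

static struct runtime_blob *
alloc_runtime_blob(void)
{
    /* Allocated with the default raw allocator regardless of any
       PyMem_SetAllocator() calls, so init and fini always pair up. */
    return _PyMem_DefaultRawMalloc(sizeof(struct runtime_blob));
}

static void
free_runtime_blob(struct runtime_blob *blob)
{
    /* Assumed companion of _PyMem_DefaultRawMalloc. */
    _PyMem_DefaultRawFree(blob);
}
```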
* Implement C recursion protection with limit pointers
* Remove calls to PyOS_CheckStack
* Add stack protection to parser
* Make tests more robust to low stacks
* Improve error messages for stack overflow
Use an atomic operation when setting
`_PyRuntime.signals.unhandled_keyboard_interrupt`. We now only clear the
variable at the start of `_PyRun_Main`, which is the same function where
we check it.
This avoids race conditions where previously another thread might call
`run_eval_code_obj()` and erroneously clear the unhandled keyboard
interrupt.
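The pattern, sketched with C11 atomics rather than CPython's internal atomics helpers:
```
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for _PyRuntime.signals.unhandled_keyboard_interrupt. */
static atomic_bool unhandled_keyboard_interrupt;

/* Any thread sets the flag atomically when a KeyboardInterrupt
   escapes to the top level. */
static void
set_unhandled_keyboard_interrupt(void)
{
    atomic_store(&unhandled_keyboard_interrupt, true);
}

/* Sketch of the _PyRun_Main shape: clear the flag once at the start
   and check it at the end of the same function, so another thread's
   run_eval_code_obj() call cannot erase a pending interrupt. */
static int
run_main_sketch(void)
{
    atomic_store(&unhandled_keyboard_interrupt, false);
    /* ... run the program ... */
    return atomic_load(&unhandled_keyboard_interrupt) ? 130 : 0;
}
```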
The reference count fields, such as `ob_tid` and `ob_ref_shared`, may be
accessed concurrently in the free threading build by a `_Py_TryXGetRef`
or similar operation. The PyObject header fields will be initialized by
`_PyObject_Init`, so only call `memset()` to zero-initialize the remainder
of the allocation.
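A simplified sketch of the resulting initialization order:
```
#include "Python.h"
#include <string.h>
/* _PyObject_Init() is declared in CPython's internal headers. */

/* The header (ob_tid, ob_ref_shared, ...) is set by _PyObject_Init
   and may be read concurrently by _Py_TryXGetRef, so zero only the
   memory that follows it. */
static void
init_object_sketch(PyObject *op, PyTypeObject *type, size_t alloc_size)
{
    _PyObject_Init(op, type);
    if (alloc_size > sizeof(PyObject)) {
        memset((char *)op + sizeof(PyObject), 0,
               alloc_size - sizeof(PyObject));
    }
}
```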
The `gc_get_refs` assertion needs to be after we check the alive and
unreachable bits. Otherwise, `ob_tid` may store the actual thread id
instead of the computed `gc_refs`, which may trigger the assertion if
the `ob_tid` looks like a negative value.
Also fix a few type warnings on 32-bit systems.
For the free-threaded version of the cyclic GC, restructure the "mark alive" phase to use software prefetch instructions. This gives a speedup in most cases when the number of objects is large enough. The prefetching is enabled conditionally based on the number of long-lived objects the GC finds.
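A minimal sketch of the technique using GCC/Clang's `__builtin_prefetch` over a simplified pending-object buffer (the real GC's buffering and marking are more involved):
```
#include "Python.h"

#define WINDOW 8  /* how far ahead to prefetch */

/* Instead of touching each object only when it is reached, issue a
   prefetch for an object WINDOW slots ahead, hiding cache-miss
   latency when the number of objects is large. */
static void
mark_alive_prefetch(PyObject **pending, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + WINDOW < n) {
            /* read prefetch, keep in cache (hint 3) */
            __builtin_prefetch(pending[i + WINDOW], 0, 3);
        }
        PyObject *op = pending[i];
        /* ... mark op alive and enqueue the objects it references ... */
        (void)op;
    }
}
```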
The codegen phase has an optimization that transforms
```
LOAD_CONST x
LOAD_CONST y
LOAD_CONST z
BUILD_LIST/BUILD_SET (3)
```
->
```
BUILD_LIST/BUILD_SET (0)
LOAD_CONST (x, y, z)
LIST_EXTEND/SET_UPDATE 1
```
This optimization has now been moved to the CFG phase to make #128802 work.
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Co-authored-by: Yan Yanchii <yyanchiy@gmail.com>
The stack pointers in interpreter frames are nearly always valid now, so
use them when visiting each thread's frame. For now, don't collect
objects with deferred references in the rare case that we see a frame
with a NULL stack pointer.
* Remove all 'if (0)' and 'if (1)' conditional stack effects
* Use array instead of conditional for BUILD_SLICE args
* Refactor LOAD_GLOBAL to use a common conditional uop
* Remove conditional stack effects from LOAD_ATTR specializations
* Replace conditional stack effects in LOAD_ATTR with a 0 or 1 sized array.
* Remove conditional stack effects from CALL_FUNCTION_EX
Since tracemalloc uses PyMutex, it becomes safe to use TABLES_LOCK()
even after _PyTraceMalloc_Fini(): remove the "pre-check" in
PyTraceMalloc_Track() and PyTraceMalloc_Untrack().
PyTraceMalloc_Untrack() no longer needs to acquire the GIL.
_PyTraceMalloc_Fini() can be called earlier during Python
finalization.
This fixes how `PyCodec_BackslashReplaceErrors` handles the `start` and `end`
attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
Support calling PyTraceMalloc_Track() and PyTraceMalloc_Untrack()
during late Python finalization.
* Call _PyTraceMalloc_Fini() later in Python finalization.
* Test also PyTraceMalloc_Untrack() without the GIL
* PyTraceMalloc_Untrack() now gets the GIL.
* Test also PyTraceMalloc_Untrack() in test_tracemalloc_track_race().
This fixes how `PyCodec_XMLCharRefReplaceErrors` handles the `start` and `end`
attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
Replace the private _PyUnicodeWriter API with the public
PyUnicodeWriter API.
* Add append_char() function.
* Add APPEND_CHAR() and APPEND_CHAR_FINISH() macros.
* Replace APPEND_STR() and APPEND_STR_FINISH() of single character
with APPEND_CHAR() and APPEND_CHAR_FINISH().
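For reference, the basic shape of the public API (a minimal sketch; error handling trimmed to the essentials):
```
#include "Python.h"

/* Build the string "a=1" with the public PyUnicodeWriter API. */
static PyObject *
build_example(void)
{
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(3);
    if (writer == NULL) {
        return NULL;
    }
    if (PyUnicodeWriter_WriteChar(writer, 'a') < 0
        || PyUnicodeWriter_WriteChar(writer, '=') < 0
        || PyUnicodeWriter_WriteChar(writer, '1') < 0)
    {
        PyUnicodeWriter_Discard(writer);
        return NULL;
    }
    return PyUnicodeWriter_Finish(writer);  /* strong reference */
}
```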
Remove _PyInterpreterState_GetConfigCopy() and
_PyInterpreterState_SetConfig() private functions. PEP 741 "Python
Configuration C API" added a better public C API: PyConfig_Get() and
PyConfig_Set().
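A minimal usage sketch of the PEP 741 API:
```
#include "Python.h"

/* Read a configuration option by name; PyConfig_Get() returns a new
   reference, or NULL with an exception set on error. */
static int
print_verbose_option(void)
{
    PyObject *value = PyConfig_Get("verbose");
    if (value == NULL) {
        return -1;
    }
    int rc = PyObject_Print(value, stdout, 0);
    Py_DECREF(value);
    return rc;
}
```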
In the free threading build, the per thread reference counting uses a
unique id for some objects to index into the local reference count
table. Use 0 instead of -1 to indicate that the id is not assigned. This
avoids bugs where zero-initialized heap type objects look like they have
a unique id assigned.
tracemalloc_alloc(), tracemalloc_realloc(), tracemalloc_free(),
_PyTraceMalloc_TraceRef() and _PyTraceMalloc_GetMemory() now check
'tracemalloc_config.tracing' after calling TABLES_LOCK().
_PyTraceMalloc_TraceRef() now always returns 0.
* Use TABLES_LOCK() to protect 'tracemalloc_config.tracing'.
* Hold TABLES_LOCK() longer while accessing tables.
* tracemalloc_realloc() and tracemalloc_free() no longer
remove the trace on reentrant call.
* _PyTraceMalloc_Stop() unregisters _PyTraceMalloc_TraceRef().
* _PyTraceMalloc_GetTraces() sets the reentrant flag.
* tracemalloc_clear_traces_unlocked() sets the reentrant flag.
Replaces the trampoline mechanism in Emscripten with an implementation that uses a
recently added feature of wasm-gc instead of JS type reflection, when that feature is
available.
Add free-threaded specialization for COMPARE_OP, and tests for COMPARE_OP specialization in general.
Co-authored-by: Donghee Na <donghee.na92@gmail.com>