Optimize `LOAD_FAST` opcodes into faster versions that load borrowed references onto the operand stack when we can prove that the lifetime of the local outlives the lifetime of the temporary that is loaded onto the stack.
Use the standard `__ARM_ARCH` macro, which is supported by GCC and Clang.
The branching logic for of `__ARMEL__` has been removed so if the target
architecture supports v7+ instructions, a yield is emitted, otherwise a nop
is emitted. This covers both big and little endian scenarios.
Signed-off-by: Vincent Fazio <vfazio@gmail.com>
In the free threaded build, the `_PyObject_LookupSpecial()` call can lead to
reference count contention on the returned function object becuase it
doesn't use stackrefs. Refactor some of the callers to use
`_PyObject_MaybeCallSpecialNoArgs`, which uses stackrefs internally.
This fixes the scaling bottleneck in the "lookup_special" microbenchmark
in `ftscalingbench.py`. However, the are still some uses of
`_PyObject_LookupSpecial()` that need to be addressed in future PRs.
Concurrent accesses from multiple threads to the same `cell` object did not
scale well in the free-threaded build. Use `_PyStackRef` and optimistically
avoid locking to improve scaling.
With the locks around cell reads gone, some of the free threading tests were
prone to starvation: the readers were able to run in a tight loop and the
writer threads weren't scheduled frequently enough to make timely progress.
Adjust the tests to avoid this.
Co-authored-by: Donghee Na <donghee.na@python.org>
* Rename 'defined' attribute to 'in_local' to more accurately reflect how it is used
* Make death of variables explicit even for array variables.
* Convert in_memory from boolean to stack offset
* Don't apply liveness analysis to optimizer generated code
* Fix RETURN_VALUE in optimizer
This tells TSAN not to sanitize `PyUnstable_InterpreterFrame_GetLine()`.
There's a possible data race on the access to the frame's `instr_ptr`
if the frame is currently executing. We don't really care about the
race. In theory, we could use relaxed atomics for every access to
`instr_ptr`, but that would create more code churn and current compilers
are overly conservative with optimizations around relaxed atomic
accesses.
We also don't sanitize `_PyFrame_IsIncomplete()` because it accesses
`instr_ptr` and is called from assertions within PyFrame_GetCode().
- Restore max field size to sys.maxsize, as in Python 3.13 & below
- PyCField: Split out bit/byte sizes/offsets.
- Expose CField's size/offset data to Python code
- Add generic checks for all the test structs/unions, using the newly exposed attrs
* Move _Py_VISIT_STACKREF() from pycore_gc.h to pycore_stackref.h.
* Remove pycore_interpframe.h include from pycore_genobject.h.
* Remove now useless includes from C files.
* Add pycore_interpframe_structs.h to Makefile.pre.in and
pythoncore.vcxproj.
Move pycore_obmalloc.h include from pycore_interp_structs.h to
pycore_runtime_structs.h.
Add also comment explaining the purpose of each include in
pycore_interp_structs.h, pycore_runtime_structs.h and
pycore_structs.h.
Remove <stdbool.h> and <stddef.h> from pycore_structs.h.
Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
The PyThreadState field gains a reference count field to avoid
issues with PyThreadState being a dangling pointer to freed memory.
The refcount starts with a value of two: one reference is owned by the
interpreter's linked list of thread states and one reference is owned by
the OS thread. The reference count is decremented when the thread state
is removed from the interpreter's linked list and before the OS thread
calls `PyThread_hang_thread()`. The thread that decrements it to zero
frees the `PyThreadState` memory.
The `holds_gil` field is moved out of the `_status` bit field, to avoid
a data race where on thread calls `PyThreadState_Clear()`, modifying the
`_status` bit field while the OS thread reads `holds_gil` when
attempting to acquire the GIL.
The `PyThreadState.state` field now has `_Py_THREAD_SHUTTING_DOWN` as a
possible value. This corresponds to the `_PyThreadState_MustExit()`
check. This avoids race conditions in the free threading build when
checking `_PyThreadState_MustExit()`.
* Fix use after free in list objects
Set the items pointer in the list object to NULL after the items array
is freed during list deallocation. Otherwise, we can end up with a list
object added to the free list that contains a pointer to an already-freed
items array.
* Mark `_PyList_FromStackRefStealOnSuccess` as escaping
I think technically it's not escaping, because the only object that
can be decrefed if allocation fails is an exact list, which cannot
execute arbitrary code when it is destroyed. However, this seems less
intrusive than trying to special cases objects in the assert in `_Py_Dealloc`
that checks for non-null stackpointers and shouldn't matter for performance.
* Add location information when accessing already closed stackref
* Add #def option to track closed stackrefs to provide precise information for use after free and double frees.
Disable pedantic check for c++03 (unlimited API)
Also add a check for c++03 *limited* API, which passes in pedantic mode
after removing a comma in the `PySendResult` declaration, and allowing
`long long`.
* Combine _GUARD_GLOBALS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_MODULE_FROM_KEYS into _LOAD_GLOBAL_MODULE
* Combine _GUARD_BUILTINS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_BUILTINS_FROM_KEYS into _LOAD_GLOBAL_BUILTINS
* Combine _CHECK_ATTR_MODULE_PUSH_KEYS and _LOAD_ATTR_MODULE_FROM_KEYS into _LOAD_ATTR_MODULE
* Remove stack transient in LOAD_ATTR_WITH_HINT
Move deprecated PyUnicode API docs to new section
Move Py_UNICODE to a new "Deprecated API" section.
Formally soft-deprecate PyUnicode_READY, and move it
Document and soft-deprecate PyUnicode_IS_READY, and move it
Document PyUnicode_IS_ASCII, PyUnicode_CHECK_INTERNED
PyUnicode_New docs: Clarify requirements for "fresh" strings
PyUnicodeWriter_DecodeUTF8Stateful: Link "error-handlers"
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Windows and macOS require precomputing a "timebase" in order to convert
OS timestamps into nanoseconds. Retrieve and compute this value during
runtime initialization to avoid data races when accessing the time.
The `free_work_item()` function in QSBR may call arbitrary code via
Python object destructors, which may reenter the QSBR code. Reorder
the processing of work items to be robust to reentrancy.
Also fix the TODO for the out of memory situation.
The use of PySys_GetObject() and _PySys_GetAttr(), which return a borrowed
reference, has been replaced by using one of the following functions, which
return a strong reference and distinguish a missing attribute from an error:
_PySys_GetOptionalAttr(), _PySys_GetOptionalAttrString(),
_PySys_GetRequiredAttr(), and _PySys_GetRequiredAttrString().
This fixes a fairly subtle bug involving finalizers and resurrection in
debug free threaded builds: if `_PyObject_ResurrectEnd` returns `1`
(i.e., the object was resurrected by a finalizer), it's not safe to
access the object because it might still be deallocated. For example:
* The finalizer may have exposed the object to another thread. That
thread may hold the last reference and concurrently deallocate it any
time after `_PyObject_ResurrectEnd()` returns `1`.
* `_PyObject_ResurrectEnd()` may call `_Py_brc_queue_object()`, which
may internally deallocate the object immediately if the owning thread
is dead.
Therefore, it's important not to access the object after it's
resurrected. We only violate this in two cases, and only in debug
builds:
* We assert that the object is tracked appropriately. This is now moved
up betewen the finalizer and the `_PyObject_ResurrectEnd()` call.
* The `--with-trace-refs` builds may need to remember the object if
it's resurrected. This is now handled by `_PyObject_ResurrectStart()`
and `_PyObject_ResurrectEnd()`.
Note that `--with-trace-refs` is currently disabled in `--disable-gil`
builds because the refchain hash table isn't thread-safe, but this
refactoring avoids an additional thread-safety issue.
* Implement C recursion protection with limit pointers for Linux, MacOS and Windows
* Remove calls to PyOS_CheckStack
* Add stack protection to parser
* Make tests more robust to low stacks
* Improve error messages for stack overflow
Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now
Unfortunatlely, the change broke some buildbots.
This reverts commit 2498c22fa0.
CPython current temporarily changes `PYMEM_DOMAIN_RAW` to the default
allocator during initialization and shutdown. The motivation is to
ensure that core runtime structures are allocated and freed using the
same allocator. However, modifying the current allocator changes global
state and is not thread-safe even with the GIL. Other threads may be
allocating or freeing objects use PYMEM_DOMAIN_RAW; they are not
required to hold the GIL to call PyMem_RawMalloc/PyMem_RawFree.
This adds new internal-only functions like `_PyMem_DefaultRawMalloc`
that aren't affected by calls to `PyMem_SetAllocator()`, so they're
appropriate for Python runtime initialization and finalization. Use
these calls in places where we previously swapped to the default raw
allocator.
Deprecate private C API functions:
* _PyUnicodeWriter_Init()
* _PyUnicodeWriter_Finish()
* _PyUnicodeWriter_Dealloc()
* _PyUnicodeWriter_WriteChar()
* _PyUnicodeWriter_WriteStr()
* _PyUnicodeWriter_WriteSubstring()
* _PyUnicodeWriter_WriteASCIIString()
* _PyUnicodeWriter_WriteLatin1String()
These functions are not deprecated in the internal C API (if the
Py_BUILD_CORE macro is defined).
* Implement C recursion protection with limit pointers
* Remove calls to PyOS_CheckStack
* Add stack protection to parser
* Make tests more robust to low stacks
* Improve error messages for stack overflow
* gh-129701: Fix a data race in `intern_common` in the free threaded build
* Use a mutex to avoid potentially returning a non-immortalized string,
because immortalization happens after the insertion into the interned
dict.
* Use `Py_DECREF()` calls instead of `Py_SET_REFCNT(s, Py_REFCNT(s) - 2)`
for thread-safety. This code path isn't performance sensistive, so
just use `Py_DECREF()` unconditionally for simplicity.
Use an atomic operation when setting
`_PyRuntime.signals.unhandled_keyboard_interrupt`. We now only clear the
variable at the start of `_PyRun_Main`, which is the same function where
we check it.
This avoids race conditions where previously another thread might call
`run_eval_code_obj()` and erroneously clear the unhandled keyboard
interrupt.