In the free-threaded build, avoid data races caused by updating type slots
or type flags after the type was initially created. For those (typically
rare) cases, use the stop-the-world mechanism. Remove the use of atomics
when reading or writing type flags. Atomics are not sufficient to avoid
races (since flags are sometimes read without a lock and without atomics)
and are no longer required.
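A minimal sketch of the stop-the-world pattern, using the internal
`_PyEval_StopTheWorld()`/`_PyEval_StartTheWorld()` helpers; the call site
and flag update shown here are illustrative, not the exact change:

```c
#include "Python.h"
#include "pycore_ceval.h"  // _PyEval_StopTheWorld(), _PyEval_StartTheWorld()

// Update a type flag after the type has been created and exposed to
// other threads. With every other thread paused, the plain (non-atomic)
// write cannot race with the lock-free readers.
static void
set_type_flag_stopped(PyTypeObject *type, unsigned long flag)
{
    PyInterpreterState *interp = PyInterpreterState_Get();
    _PyEval_StopTheWorld(interp);   // all other threads are now paused
    type->tp_flags |= flag;         // safe without atomics or the type lock
    _PyEval_StartTheWorld(interp);
}
```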
Fix two races related to the type lookup cache when used in the
free-threaded build. This caused test_opcache to sometimes fail (as
well as other hard-to-reproduce failures).
In the free-threaded build, the `_PyObject_LookupSpecial()` call can lead to
reference count contention on the returned function object because it
doesn't use stackrefs. Refactor some of the callers to use
`_PyObject_MaybeCallSpecialNoArgs`, which uses stackrefs internally.
This fixes the scaling bottleneck in the "lookup_special" microbenchmark
in `ftscalingbench.py`. However, there are still some uses of
`_PyObject_LookupSpecial()` that need to be addressed in future PRs.
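A sketch of the refactoring, assuming the `_Py_ID()` interned-string
identifiers are available for the method name; the exact
`_PyObject_MaybeCallSpecialNoArgs` signature is inferred from the
description above:

```c
// Old pattern: the lookup returns a strong reference, so every call
// bumps the shared function object's refcount (contended across threads).
static PyObject *
round_via_lookup(PyObject *self)
{
    PyObject *meth = _PyObject_LookupSpecial(self, &_Py_ID(__round__));
    if (meth == NULL) {
        return NULL;  // missing method or error
    }
    PyObject *res = PyObject_CallNoArgs(meth);
    Py_DECREF(meth);  // refcount traffic on a shared object
    return res;
}

// New pattern: look up and call the special method in one step; stackrefs
// are used internally, so the shared refcount is left untouched.
static PyObject *
round_via_maybe_call(PyObject *self)
{
    return _PyObject_MaybeCallSpecialNoArgs(self, &_Py_ID(__round__));
}
```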
The `PyType_HasFeature()` function reads the flags with a relaxed atomic
load and without holding the type lock. To avoid data races, use atomic
stores if `PyType_Ready()` has already been called.
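A sketch of the store side, assuming the internal
`FT_ATOMIC_STORE_ULONG_RELAXED()` wrapper and that `Py_TPFLAGS_READY`
marks the point at which a type may be visible to other threads:

```c
// Once PyType_Ready() has run, other threads may call PyType_HasFeature()
// concurrently with no lock, so flag updates must be atomic.
static void
type_set_flags(PyTypeObject *type, unsigned long flags)
{
    if (type->tp_flags & Py_TPFLAGS_READY) {
        // Lock-free readers exist; a plain store would be a data race.
        FT_ATOMIC_STORE_ULONG_RELAXED(type->tp_flags, flags);
    }
    else {
        type->tp_flags = flags;  // type not yet visible to other threads
    }
}
```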
Fix UBSan failures for `PyTypeObject`.
Introduce a cast macro for `superobject` and remove redundant casts.
Rename the unused parameter in getter/setter methods to `closure` to
reflect its intended role.
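A sketch of both cleanups; the `superobject` field access is illustrative
of the pattern rather than copied from the implementation:

```c
// One well-defined cast point instead of ad-hoc casts at each call site.
#define superobject_CAST(op)  ((superobject *)(op))

static PyObject *
super_obj_getter(PyObject *self, void *closure)
{
    superobject *su = superobject_CAST(self);
    (void)closure;  // unused, but named for its role in the getset protocol
    return Py_NewRef(su->obj ? su->obj : Py_None);
}
```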
The reference count fields, such as `ob_tid` and `ob_ref_shared`, may be
accessed concurrently in the free threading build by a `_Py_TryXGetRef`
or similar operation. The PyObject header fields will be initialized by
`_PyObject_Init`, so only call `memset()` to zero-initialize the remainder
of the allocation.
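A simplified sketch of the allocation path (the real allocator also
accounts for GC pre-header details, which are elided here):

```c
#include <string.h>
#include "Python.h"
#include "pycore_object.h"  // _PyObject_Init()

static PyObject *
alloc_zeroed(PyTypeObject *type, size_t size)
{
    PyObject *op = (PyObject *)PyObject_Malloc(size);
    if (op == NULL) {
        return PyErr_NoMemory();
    }
    // Zero only the memory after the PyObject header. The header's
    // ob_tid/ob_ref_shared words may be probed concurrently by
    // _Py_TryXGetRef-style operations, so they must not be memset.
    memset((char *)op + sizeof(PyObject), 0, size - sizeof(PyObject));
    _PyObject_Init(op, type);  // header: ob_type and the refcount fields
    return op;
}
```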
We should use a relaxed atomic load in the free threading build in
`PyType_Modified()` because that's called without the type lock held.
It's not necessary to use atomics in `type_modified_unlocked()`.
We should also use `FT_ATOMIC_STORE_UINT_RELAXED()` instead of the
`UINT32` variant because `tp_version_tag` is declared as `unsigned int`.
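A sketch of the resulting pattern, assuming a matching
`FT_ATOMIC_LOAD_UINT_RELAXED()` wrapper; the surrounding type-lock
handling is elided:

```c
static void
invalidate_version_tag(PyTypeObject *type)
{
    // PyType_Modified() runs without the type lock held, so this
    // fast-path read must be a relaxed atomic load.
    if (FT_ATOMIC_LOAD_UINT_RELAXED(type->tp_version_tag) == 0) {
        return;  // already invalidated
    }
    // tp_version_tag is declared `unsigned int`, so the UINT wrapper
    // matches the field's declared type where the UINT32 variant may not.
    FT_ATOMIC_STORE_UINT_RELAXED(type->tp_version_tag, 0);
}
```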
Fix a few thread-safety bugs to enable test_opcache when run with TSAN:
* Use relaxed atomics when clearing `ht->_spec_cache.getitem`
(gh-115999)
* Add temporary suppression for type slot modifications (gh-127266)
* Use atomic load when reading `*dictptr`
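A sketch of the `*dictptr` read, assuming the internal
`FT_ATOMIC_LOAD_PTR_ACQUIRE()` wrapper; a plain dereference would race
with a thread materializing the dict:

```c
static PyObject *
borrow_instance_dict(PyObject *obj)
{
    PyObject **dictptr = _PyObject_GetDictPtr(obj);
    if (dictptr == NULL) {
        return NULL;  // the type has no __dict__ slot
    }
    // Atomic load: another thread may be storing a freshly created dict.
    return FT_ATOMIC_LOAD_PTR_ACQUIRE(*dictptr);  // may still be NULL
}
```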
In the free threading build, the per thread reference counting uses a
unique id for some objects to index into the local reference count
table. Use 0 instead of -1 to indicate that the id is not assigned. This
avoids bugs where zero-initialized heap type objects look like they have
a unique id assigned.
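An illustrative sketch of why 0 is the safer sentinel; the constant and
helper names here are hypothetical, not the exact internals:

```c
#define UNIQUE_ID_UNASSIGNED 0  // was -1 before this change

static int
has_unique_id(Py_ssize_t unique_id)
{
    // A zero-initialized heap type naturally reads as "unassigned".
    // With -1 as the sentinel, a zero-filled object would wrongly appear
    // to own slot 0 of the per-thread refcount table.
    return unique_id != UNIQUE_ID_UNASSIGNED;
}
```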
The CALL family of instructions was mostly thread-safe already and only
required a small number of changes, which are documented below.
A few changes were needed to make `CALL_ALLOC_AND_ENTER_INIT` thread-safe
(see the sketch after this list):
* Added `_PyType_LookupRefAndVersion`, which returns the type version
  corresponding to the returned ref.
* Added `_PyType_CacheInitForSpecialization`, which takes an init method
  and the corresponding type version and only populates the
  specialization cache if the current type version matches the supplied
  version. This prevents potentially caching a stale value in
  free-threaded builds if we race with an update to `__init__`.
* Only cache `__init__` functions that are deferred in free-threaded
  builds. This ensures that the reference to `__init__` stored in the
  specialization cache is valid if the type version guard in
  `_CHECK_AND_ALLOCATE_OBJECT` passes.
* Fix a bug in `_CREATE_INIT_FRAME` where the frame is pushed to the
  stack on failure.
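A minimal sketch of the version-checked cache fill, assuming the
signatures implied by the list above (a strong ref plus an out-parameter
version from `_PyType_LookupRefAndVersion`, and a boolean-style result
from `_PyType_CacheInitForSpecialization`):

```c
static int
try_cache_init(PyTypeObject *type)
{
    unsigned int version;
    // Strong reference plus the type version the lookup observed.
    PyObject *init = _PyType_LookupRefAndVersion(type, &_Py_ID(__init__),
                                                 &version);
    if (init == NULL) {
        return 0;  // no __init__ found (or an error occurred)
    }
    // Populates the cache only if the current version still equals
    // `version`, so a racing update to __init__ cannot be cached stale.
    int cached = _PyType_CacheInitForSpecialization(
        (PyHeapTypeObject *)type, init, version);
    Py_DECREF(init);
    return cached;
}
```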
A few other miscellaneous changes were also needed:
* Use `{LOCK,UNLOCK}_OBJECT` in `LIST_APPEND`. This ensures that the
  list's per-object lock is held while we are appending to it.
* Add the missing `co_tlbc` for `_Py_InitCleanup`.
* Stop/start the world around setting the eval frame hook. This allows
  us to read `interp->eval_frame` non-atomically and preserves the
  behavior of `_CHECK_PEP_523` documented below.
* Replace uses of `PyCell_GET` and `PyCell_SET`. These macros are not
safe to use in the free-threaded build. Use `PyCell_GetRef()` and
`PyCell_SetTakeRef()` instead.
* Since `PyCell_GetRef()` returns a strong rather than borrowed ref, some
code restructuring was required, e.g. `frame_get_var()` returns a strong
ref now.
* Add critical sections to `PyCell_GET` and `PyCell_SET`.
* Move critical_section.h earlier in the Python.h file.
* Add `PyCell_GET` to the free-threading howto table of APIs that return
borrowed refs.
* Add additional unit tests for free-threading.
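A sketch of the replacement pattern. `PyCell_GetRef()` and
`PyCell_SetTakeRef()` take a `PyCellObject *`, so the casts below assume
the caller has already verified that `cell` is a cell object:

```c
static PyObject *
swap_cell_value(PyObject *cell, PyObject *replacement)
{
    // Strong reference (NULL if the cell is empty), unlike the
    // borrowed-ref PyCell_GET macro, which is unsafe when another
    // thread can clear or replace the cell's value concurrently.
    PyObject *old = PyCell_GetRef((PyCellObject *)cell);
    // PyCell_SetTakeRef() steals a reference, so hand it a new one.
    PyCell_SetTakeRef((PyCellObject *)cell, Py_NewRef(replacement));
    return old;  // ownership transfers to the caller
}
```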
In free-threaded builds, each thread specializes a thread-local copy of the
bytecode, created on the first `RESUME`. All copies of the bytecode for a
code object are stored in the `co_tlbc` array on the code object. Each
thread reserves a globally unique index identifying its copy of the
bytecode in all `co_tlbc` arrays at thread creation and releases the index
at thread destruction. The first entry in every `co_tlbc` array always
points to the "main" copy of the bytecode that is stored at the end of the
code object. This ensures that no bytecode is copied for programs that do
not use threads.
Thread-local bytecode can be disabled at runtime by providing either
`-X tlbc=0` or `PYTHON_TLBC=0`. Disabling thread-local bytecode also
disables specialization.
Concurrent modifications to the bytecode made by the specializing
interpreter and instrumentation use atomics, with specialization taking
care not to overwrite an instruction that was instrumented concurrently.
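An illustrative sketch of the per-thread lookup; the array layout and
field names are assumed from the description above rather than copied
from the implementation:

```c
static _Py_CODEUNIT *
lookup_tlbc(PyCodeObject *co, Py_ssize_t tlbc_index)
{
    _PyCodeArray *tlbc = co->co_tlbc;  // one slot per reserved index
    if (tlbc_index < tlbc->size && tlbc->entries[tlbc_index] != NULL) {
        // Entry 0 always aliases the "main" bytecode stored at the end
        // of the code object itself, so single-threaded programs never
        // pay for a copy.
        return (_Py_CODEUNIT *)tlbc->entries[tlbc_index];
    }
    return NULL;  // this thread's copy is created on its first RESUME
}
```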
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
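A conceptual sketch only; the table layout and helper below are
hypothetical stand-ins for the internal per-thread refcounting machinery:

```c
typedef struct {
    Py_ssize_t *refcnts;  // per-thread counts, indexed by unique id
    Py_ssize_t size;
} local_refcounts;

static void
incref_code_local(local_refcounts *tab, Py_ssize_t unique_id)
{
    // Each thread bumps its own slot instead of the code object's shared
    // refcount word, which is contended when many threads create nested
    // functions; the local counts are merged back when the thread exits.
    tab->refcnts[unique_id]++;
}
```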
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.
Rename some of the files and functions in preparation for this change.