Make `warnings.catch_warnings()` use a context variable for holding
the warning filtering state if the `sys.flags.context_aware_warnings`
flag is set to true. This makes using the context manager thread-safe in
multi-threaded programs.
Add the `sys.flags.thread_inherit_context` flag. If true, starting a new
thread with `threading.Thread` will use a copy of the context
from the caller of `Thread.start()`.
Both these flags are set to true by default for the free-threaded build
and false for the default build.
Move the Python implementation of warnings.py into _py_warnings.py.
Make _contextvars a builtin module.
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Concurrent accesses from multiple threads to the same `cell` object did not
scale well in the free-threaded build. Use `_PyStackRef` and optimistically
avoid locking to improve scaling.
With the locks around cell reads gone, some of the free threading tests were
prone to starvation: the readers were able to run in a tight loop and the
writer threads weren't scheduled frequently enough to make timely progress.
Adjust the tests to avoid this.
Co-authored-by: Donghee Na <donghee.na@python.org>
Make tuple iteration more thread-safe, and actually test concurrent iteration of tuple, range and list. (This is prep work for enabling specialization of FOR_ITER in free-threaded builds.) The basic premise is:
Iterating over a shared iterable (list, tuple or range) should be safe, not involve data races, and behave like iteration normally does.
Using a shared iterator should not crash or involve data races, and should only produce items regular iteration would produce. It is not guaranteed to produce all items, or produce each item only once. (This is not the case for range iteration even after this PR.)
Providing stronger guarantees is possible for some of these iterators, but it's not always straight-forward and can significantly hamper the common case. Since iterators in general aren't shared between threads, and it's simply impossible to concurrently use many iterators (like generators), better to make sharing iterators without explicit synchronization clearly wrong.
Specific issues fixed in order to make the tests pass:
- List iteration could occasionally fail an assertion when a shared list was shrunk and an item past the new end was retrieved concurrently. There's still some unsafety when deleting/inserting multiple items through for example slice assignment, which uses memmove/memcpy.
- Tuple iteration could occasionally crash when the iterator's reference to the tuple was cleared on exhaustion. Like with list iteration, in free-threaded builds we can't safely and efficiently clear the iterator's reference to the iterable (doing it safely would mean extra, slow refcount operations), so just keep the iterable reference around.
The call to `PySequence_List()` could temporarily unlock and relock the
set, allowing the items to be cleared and return the incorrect
notation `{}` for a empty set (it should be `set()`).
Co-authored-by: T. Wouters <thomas@python.org>
The `dict.get` implementation uses `_Py_dict_lookup_threadsafe`, which is
thread-safe, so we remove the critical section from the argument clinic.
Add a test for concurrent dict get and set operations.
* Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and
use in place of `_PyDictKeys_StringLookup`.
* Change `_PyObject_TryGetInstanceAttribute` to use that function
in the case of split keys.
* Add `unicodekeys_lookup_split` helper which allows code sharing
between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.
* Fix locking for `STORE_ATTR_INSTANCE_VALUE`. Create
`_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and
`tp_version_tag` cannot change.
* Pass `tp_version_tag` to `specialize_dict_access()`, ensuring
the version we store on the cache is the correct one (in case of
it changing during the specalize analysis).
* Split `analyze_descriptor` into `analyze_descriptor_load` and
`analyze_descriptor_store` since those don't share much logic.
Add `descriptor_is_class` helper function.
* In `specialize_dict_access`, double check `_PyObject_GetManagedDict()`
in case we race and dict was materialized before the lock.
* Avoid borrowed references in `_Py_Specialize_StoreAttr()`.
* Use `specialize()` and `unspecialize()` helpers.
* Add unit tests to ensure specializing happens as expected in FT builds.
* Add unit tests to attempt to trigger data races (useful for running under TSAN).
* Add `has_split_table` function to `_testinternalcapi`.
The function `operator.methodcaller` was not thread-safe since the additional
of the vectorcall method in gh-89013. In the free threading build the issue
is easy to trigger, for the normal build harder.
This makes the `methodcaller` safe by:
* Replacing the lazy initialization with initialization in the constructor.
* Using a stack allocated space for the vectorcall arguments and falling back
to `tp_call` for calls with more than 8 arguments.
* Replace uses of `PyCell_GET` and `PyCell_SET`. These macros are not
safe to use in the free-threaded build. Use `PyCell_GetRef()` and
`PyCell_SetTakeRef()` instead.
* Since `PyCell_GetRef()` returns a strong rather than borrowed ref, some
code restructuring was required, e.g. `frame_get_var()` returns a strong
ref now.
* Add critical sections to `PyCell_GET` and `PyCell_SET`.
* Move critical_section.h earlier in the Python.h file.
* Add `PyCell_GET` to the free-threading howto table of APIs that return
borrowed refs.
* Add additional unit tests for free-threading.
This fixes a crash when `gc.get_objects()` or `gc.get_referrers()` is
called during a GC in the free threading build.
Switch to `_PyObjectStack` to avoid corrupting the `struct worklist`
linked list maintained by the GC. Also, don't return objects that are frozen
(`gc.freeze()`) or in the process of being collected to more closely match
the behavior of the default build.
* Reduce the number of iterations and the number of threads so a
whole test file takes less than a minute.
* Refactor test_racing_iter_extend() to remove two levels of
indentation.
* test_monitoring() uses a sleep of 100 ms instead of 1 second.
The `zip_next` function uses a common optimization technique for methods
that generate tuples. The iterator maintains an internal reference to
the returned tuple. When the method is called again, it checks if the
internal tuple's reference count is 1. If so, the tuple can be reused.
However, this approach is not safe under the free-threading build:
after checking the reference count, another thread may perform the same
check and also reuse the tuple. This can result in a double decref on
the items of the replaced tuple and a double incref (memory leak) on
the items of the tuple being set.
This adds a function, `_PyObject_IsUniquelyReferenced` that
encapsulates the stricter logic necessary for the free-threaded build:
the internal tuple must be owned by the current thread, have a local
refcount of one, and a shared refcount of zero.
Add `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST` and
`Py_END_CRITICAL_SECTION_SEQUENCE_FAST` macros and update `str.join` to use
them. Also add a regression test that would crash reliably without this
patch.
Add _PyType_LookupRef and use incref before setting attribute on type
Makes setting an attribute on a class and signaling type modified atomic
Avoid adding re-entrancy exposing the type cache in an inconsistent state by decrefing after type is updated
Makes sys.settrace, sys.setprofile, and monitoring generally thread-safe.
Mostly uses a stop-the-world approach and synchronization around the code object's _co_instrumentation_version. There may be a little bit of extra synchronization around the monitoring data that's required to be TSAN clean.