They used to be shared, before 3.12. Returning to sharing them resolves a failure on Py_TRACE_REFS builds.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
* Fix usage of PyStackRef_FromPyObjectSteal in CALL_TUPLE_1
This was missed in gh-124894
* Fix usage of PyStackRef_FromPyObjectSteal in _CALL_STR_1
This was missed in gh-124894
* Regenerate code
This is essentially a cleanup, moving a handful of API declarations to the header files where they fit best, creating new ones when needed.
We do the following:
* add pycore_debug_offsets.h and move _Py_DebugOffsets, etc. there
* inline struct _getargs_runtime_state and struct _gilstate_runtime_state in _PyRuntimeState
* move struct _reftracer_runtime_state to the existing pycore_object_state.h
* add pycore_audit.h and move to it _Py_AuditHookEntry , _PySys_Audit(), and _PySys_ClearAuditHooks
* add audit.h and cpython/audit.h and move the existing audit-related API there
*move the perfmap/trampoline API from cpython/sysmodule.h to cpython/ceval.h, and remove the now-empty cpython/sysmodule.h
Users want to know when the current context switches to a different
context object. Right now this happens when and only when a context
is entered or exited, so the enter and exit events are synonymous with
"switched". However, if the changes proposed for gh-99633 are
implemented, the current context will also switch for reasons other
than context enter or exit. Since users actually care about context
switches and not enter or exit, replace the enter and exit events with
a single switched event.
The former exit event was emitted just before exiting the context.
The new switched event is emitted after the context is exited to match
the semantics users expect of an event with a past-tense name. If
users need the ability to clean up before the switch takes effect,
another event type can be added in the future. It is not added here
because YAGNI.
I skipped 0 in the enum as a matter of practice. Skipping 0 makes it
easier to troubleshoot when code forgets to set zeroed memory, and it
aligns with best practices for other tools (e.g.,
https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum).
Co-authored-by: Richard Hansen <rhansen@rhansen.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
This fixes a crash when running the PyO3 test suite on the free-threaded
build. The `qsbr` field is initialized after the `PyThreadState` is
added to the interpreter's linked list -- it might still be NULL.
Instead, we "steal" the queue of to-be-freed memory blocks. This is
always initialized (possibly empty) and protected by the stop the world
pause.
Users want to know when the current context switches to a different
context object. Right now this happens when and only when a context
is entered or exited, so the enter and exit events are synonymous with
"switched". However, if the changes proposed for gh-99633 are
implemented, the current context will also switch for reasons other
than context enter or exit. Since users actually care about context
switches and not enter or exit, replace the enter and exit events with
a single switched event.
The former exit event was emitted just before exiting the context.
The new switched event is emitted after the context is exited to match
the semantics users expect of an event with a past-tense name. If
users need the ability to clean up before the switch takes effect,
another event type can be added in the future. It is not added here
because YAGNI.
I skipped 0 in the enum as a matter of practice. Skipping 0 makes it
easier to troubleshoot when code forgets to set zeroed memory, and it
aligns with best practices for other tools (e.g.,
https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum).
When formatting the AST as a string, infinite values are replaced by
1e309, which evaluates to infinity. The initialization of this string
replacement was not thread-safe in the free threading build.
Each of the `LOAD_GLOBAL` specializations is implemented roughly as:
1. Load keys version.
2. Load cached keys version.
3. Deopt if (1) and (2) don't match.
4. Load keys.
5. Load cached index into keys.
6. Load object from (4) at offset from (5).
This is not thread-safe in free-threaded builds; the keys object may be replaced
in between steps (3) and (4).
This change refactors the specializations to avoid reloading the keys object and
instead pass the keys object from guards to be consumed by downstream uops.
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
* Make slices marshallable
* Emit slices as constants
* Update Python/marshal.c
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
* Refactor codegen_slice into two functions so it
always has the same net effect
* Fix for free-threaded builds
* Simplify marshal loading of slices
* Only return SUCCESS/ERROR from codegen_slice
---------
Co-authored-by: Mark Shannon <mark@hotpy.org>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Stop the world when invalidating function versions
The tier1 interpreter specializes `CALL` instructions based on the values
of certain function attributes (e.g. `__code__`, `__defaults__`). The tier1
interpreter uses function versions to verify that the attributes of a function
during execution of a specialization match those seen during specialization.
A function's version is initialized in `MAKE_FUNCTION` and is invalidated when
any of the critical function attributes are changed. The tier1 interpreter stores
the function version in the inline cache during specialization. A guard is used by
the specialized instruction to verify that the version of the function on the operand
stack matches the cached version (and therefore has all of the expected attributes).
It is assumed that once the guard passes, all attributes will remain unchanged
while executing the rest of the specialized instruction.
Stopping the world when invalidating function versions ensures that all critical
function attributes will remain unchanged after the function version guard passes
in free-threaded builds. It's important to note that this is only true if the remainder
of the specialized instruction does not enter and exit a stop-the-world point.
We will stop the world the first time any of the following function attributes
are mutated:
- defaults
- vectorcall
- kwdefaults
- closure
- code
This should happen rarely and only happens once per function, so the performance
impact on majority of code should be minimal.
Additionally, refactor the API for manipulating function versions to more clearly
match the stated semantics.
* Spill the evaluation around escaping calls in the generated interpreter and JIT.
* The code generator tracks live, cached values so they can be saved to memory when needed.
* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
Instead of surprise crashes and memory corruption, we now hang threads that attempt to re-enter the Python interpreter after Python runtime finalization has started. These are typically daemon threads (our long standing mis-feature) but could also be threads spawned by extension modules that then try to call into Python. This marks the `PyThread_exit_thread` public C API as deprecated as there is no plausible safe way to accomplish that on any supported platform in the face of things like C++ code with finalizers anywhere on a thread's stack. Doing this was the least bad option.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.
Rename some of the files and functions in preparation for this change.
Instead of be limited just by the size of addressable memory (2**63
bytes), Python integers are now also limited by the number of bits, so
the number of bit now always fit in a 64-bit integer.
Both limits are much larger than what might be available in practice,
so it doesn't affect users.
_PyLong_NumBits() and _PyLong_Frexp() are now always successful.
Fix a bug that can cause a crash when sub-interpreters use "basic"
single-phase extension modules. Shared objects could refer to PyGC_Head
nodes that had been freed as part of interpreter shutdown.
Use a `_PyStackRef` and defer the reference to `f_funcobj` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
* Detect source file encoding.
* Use the "replace" error handler even for UTF-8 (default) encoding.
* Remove the BOM.
* Fix detection of too long lines if they contain NUL.
* Return the head rather than the tail for truncated long lines.
If the generator is already cleared, then most fields in the
generator's frame are not valid other than f_executable. The invalid
fields may contain dangling pointers and should not be used.
Use a `_PyStackRef` and defer the reference to `f_executable` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
Setting dev_mode to 1 in an isolated configuration now enables also
faulthandler.
Moreover, setting "module_search_paths" option with
PyInitConfig_SetStrList() now sets "module_search_paths_set" to 1.
Add PyConfig_Get(), PyConfig_GetInt(), PyConfig_Set() and
PyConfig_Names() functions to get and set the current runtime Python
configuration.
Add visibility and "sys spec" to config and preconfig specifications.
_PyConfig_AsDict() now converts PyConfig.xoptions as a dictionary.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Switch more _Py_IsImmortal(...) assertions to _Py_IsImmortalLoose(...)
The remaining calls to _Py_IsImmortal are in free-threaded-only code,
initialization of core objects, tests, and guards that fall back to
code that works with mortal objects.
The `zip_next` function uses a common optimization technique for methods
that generate tuples. The iterator maintains an internal reference to
the returned tuple. When the method is called again, it checks if the
internal tuple's reference count is 1. If so, the tuple can be reused.
However, this approach is not safe under the free-threading build:
after checking the reference count, another thread may perform the same
check and also reuse the tuple. This can result in a double decref on
the items of the replaced tuple and a double incref (memory leak) on
the items of the tuple being set.
This adds a function, `_PyObject_IsUniquelyReferenced` that
encapsulates the stricter logic necessary for the free-threaded build:
the internal tuple must be owned by the current thread, have a local
refcount of one, and a shared refcount of zero.
`Py_DECREF` and `PyStackRef_CLOSE` are now implemented as macros in the
free-threaded build in ceval.c. There are two motivations;
* MSVC has problems inlining functions in ceval.c in the PGO build.
* We will want to mark escaping calls in order to spill the stack
pointer in ceval.c and we will want to do this around `_Py_Dealloc`
(or `_Py_MergeZeroLocalRefcount` or `_Py_DecRefShared`), not around
the entire `Py_DECREF` or `PyStackRef_CLOSE` call.
The free-threaded GC now visits interpreter stacks to keep objects
that use deferred reference counting alive.
Interpreter frames are zero initialized in the free-threaded GC so
that the GC doesn't see garbage data. This is a temporary measure
until stack spilling around escaping calls is implemented.
Co-authored-by: Ken Jin <kenjin@python.org>
As of 529a160 (gh-118204), building with HAVE_DYNAMIC_LOADING stopped working. This is a minimal fix just to get builds working again. There are actually a number of long-standing deficiencies with HAVE_DYNAMIC_LOADING builds that need to be resolved separately.
This replaces `_PyList_FromArraySteal` with `_PyList_FromStackRefSteal`.
It's functionally equivalent, but takes a `_PyStackRef` array instead of
an array of `PyObject` pointers.
Co-authored-by: Ken Jin <kenjin@python.org>
`BUILD_SET` should use a borrow instead of a steal. The cleanup in `_DO_CALL`
`CONVERSION_FAILED` was incorrect.
Co-authored-by: Ken Jin <kenjin@python.org>
We were not properly accounting for interpreter memory leaks at
shutdown and had two sources of leaks:
* Objects that use deferred reference counting and were reachable via
static types outlive the final GC. We now disable deferred reference
counting on all objects if we are calling the GC due to interpreter
shutdown.
* `_PyMem_FreeDelayed` did not properly check for interpreter shutdown
so we had some memory blocks that were enqueued to be freed, but
never actually freed.
* `_PyType_FinalizeIdPool` wasn't called at interpreter shutdown.
Fix _PyArg_UnpackKeywordsWithVararg for the case when argument for
positional-or-keyword parameter is passed by keyword.
There was only one such case in the stdlib -- the TypeVar constructor.
This automatically spills the results from `_PyStackRef_FromPyObjectNew`
to the in-memory stack so that the deferred references are visible to
the GC before we make any possibly escaping call.
Co-authored-by: Ken Jin <kenjin@python.org>
Fix PyEval_GetLocals() to avoid SystemError ("bad argument to
internal function"). Don't redefine the 'ret' variable in the if
block.
Add an unit test on PyEval_GetLocals().
The free-threaded build partially stores heap type reference counts in
distributed manner in per-thread arrays. This avoids reference count
contention when creating or destroying instances.
Co-authored-by: Ken Jin <kenjin@python.org>
Add ENTER_RECURSIVE and LEAVE_RECURSIVE macros in ast.c, ast_opt.c and
symtable.c. Remove VISIT_QUIT macro in symtable.c.
The current recursion depth counter only needs to be updated during
normal execution -- all functions should just return an error code
if an error occurs.
* gh-122188: Move magic number to its own file
* Add versionadded directive
* Do work in C
* Integrate launcher.c
* Make _pyc_magic_number private
* Remove metadata
* Move sys.implementation -> _imp
* Modernize comment
* Move _RAW_MAGIC_NUMBER to the C side as well
* _pyc_magic_number -> pyc_magic_number
* Remove unused import
* Update docs
* Apply suggestions from code review
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
* Fix typo in tests
---------
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
The adaptive counter doesn't do anything currently in the free-threaded
build and TSan reports a data race due to concurrent modifications to
the counter.
* Use compensated summation for complex sums with floating-point items.
This amends #121176.
* sum() specializations for floats and complexes now use
PyLong_AsDouble() instead of PyLong_AsLongAndOverflow() and
compensated summation as well.
In the free-threaded build, we need to lock pending->mutex when clearing
the handling_thread in order not to race with a concurrent
make_pending_calls in the same interpreter.
This combines and updates our freelist handling to use a consistent
implementation. Objects in the freelist are linked together using the
first word of memory block.
If configured with freelists disabled, these operations are essentially
no-ops.
* Reject uop definitions that declare values as 'unused' that are already cached by prior uops
* Track which variables are defined and only load from memory when needed
* Support explicit `flush` in macro definitions.
* Make sure stack is flushed in where needed.
We should maintain the invariant that a zero `ob_tid` implies the
refcount fields are merged.
* Move the assignment in `_Py_MergeZeroLocalRefcount` to immediately
before the refcount merge.
* Update `_PyTrash_thread_destroy_chain` to set `ob_ref_shared` to
`_Py_REF_MERGED` when setting `ob_tid` to zero.
Also check this invariant with assertions in the GC in debug builds.
That uncovered a bug when running out of memory during GC.
* The result has type Py_ssize_t, not intptr_t.
* Type cast between unsigned and signdet integer types should be explicit.
* Downcasting should be explicit.
* Fix integer overflow check in sum().
The change in gh-118157 (b2cd54a) should have also updated clear_singlephase_extension() but didn't. We fix that here. Note that clear_singlephase_extension() (AKA _PyImport_ClearExtension()) is only used in tests.
The `_PySeqLock_EndRead` function needs an acquire fence to ensure that
the load of the sequence happens after any loads within the read side
critical section. The missing fence can trigger bugs on macOS arm64.
Additionally, we need a release fence in `_PySeqLock_LockWrite` to
ensure that the sequence update is visible before any modifications to
the cache entry.
The tracemalloc_tracebacks hash table has traceback keys and NULL
values, but its destructors do not reflect this -- key_destroy_func is
NULL while value_destroy_func is raw_free. Swap these to free the
traceback keys instead.
Fix warnings when using -Wimplicit-fallthrough compiler flag.
Annotate explicitly "fall through" switch cases with a new
_Py_FALLTHROUGH macro which uses __attribute__((fallthrough)) if
available. Replace "fall through" comments with _Py_FALLTHROUGH.
Add _Py__has_attribute() macro. No longer define __has_attribute()
macro if it's not defined. Move also _Py__has_builtin() at the top
of pyport.h.
Co-Authored-By: Nikita Sobolev <mail@sobolevn.me>
This change makes things a little less painful for some users. It also fixes a failing assert (gh-120765), by making sure all subinterpreters are destroyed before the main interpreter. As part of that, we make sure Py_Finalize() always runs with the main interpreter active.
This PR sets up tagged pointers for CPython.
The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.
This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.
The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.
Please read Include/internal/pycore_stackref.h for more information!
---------
Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
This makes the following macros public as part of the non-limited C-API for
locking a single object or two objects at once.
* `Py_BEGIN_CRITICAL_SECTION(op)` / `Py_END_CRITICAL_SECTION()`
* `Py_BEGIN_CRITICAL_SECTION2(a, b)` / `Py_END_CRITICAL_SECTION2()`
The supporting functions and structs used by the macros are also exposed for
cases where C macros are not available.
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization, otherwise you're possibly
interning a immortalizing copy.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
In gh-120009 I used an atexit hook to finalize the _datetime module's static types at interpreter shutdown. However, atexit hooks are executed very early in finalization, which is a problem in the few cases where a subclass of one of those static types is still alive until the final GC collection. The static builtin types don't have this probably because they are finalized toward the end, after the final GC collection. To avoid the problem for _datetime, I have applied a similar approach here.
Also, credit goes to @mgorny and @neonene for the new tests.
FYI, I would have liked to take a slightly cleaner approach with managed static types, but wanted to get a smaller fix in first for the sake of backporting. I'll circle back to the cleaner approach with a future change on the main branch.
This adds a `_PyRecursiveMutex` type based on `PyMutex` and uses that
for the import lock. This fixes some data races in the free-threaded
build and generally simplifies the import lock code.
The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.
Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).
The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
We make use of the same mechanism that we use for the static builtin types. This required a few tweaks.
The relevant code could use some cleanup but I opted to avoid the significant churn in this change. I'll tackle that separately.
This change is the final piece needed to make _datetime support multiple interpreters. I've updated the module slot accordingly.
The free-threaded build currently immortalizes objects that use deferred
reference counting (see gh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.
This fixes a race condition involving the tracking of whether the
behavior is suppressed.
Only call `gc_restore_tid()` from stop-the-world contexts.
`worklist_pop()` can be called while other threads are running, so use a
relaxed atomic to modify `ob_tid`.
* Add docs for new APIs
* Add soft-deprecation notices
* Add What's New porting entries
* Update comments referencing `PyFrame_LocalsToFast()` to mention the proxy instead
* Other related cleanups found when looking for refs to the deprecated APIs
Support non-dict globals in LOAD_FROM_DICT_OR_GLOBALS
The implementation basically copies LOAD_GLOBAL. Possibly it could be deduplicated,
but that seems like it may get hairy since the two operations have different operands.
This is important to fix in 3.14 for PEP 649, but it's a bug in earlier versions too,
and we should backport to 3.13 and 3.12 if possible.
Release the GIL before calling `_Py_qsbr_unregister`.
The deadlock could occur when the GIL was enabled at runtime. The
`_Py_qsbr_unregister` call might block while holding the GIL because the
thread state was not active, but the GIL was still held.
Make sure that `gilstate_counter` is not zero in when calling
`PyThreadState_Clear()`. A destructor called from `PyThreadState_Clear()` may
call back into `PyGILState_Ensure()` and `PyGILState_Release()`. If
`gilstate_counter` is zero, it will try to create a new thread state before
the current active thread state is destroyed, leading to an assertion failure
or crash.
* gh-119118: Fix performance regression in tokenize module
- Cache line object to avoid creating a Unicode object
for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the
smallest buffer possible to measure the difference.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
The fix in gh-119561 introduced an assertion that doesn't hold true if any of the three new test extension modules are loaded more than once. This is fine normally but breaks if the new test_check_state_first() is run more than once, which happens for refleak checking and with the regrtest --forever flag. We fix that here by clearing each of the three modules after loading them. We also tweak a check in _modules_by_index_check().
The assertion was added in gh-118532 but was based on the invalid assumption that PyState_FindModule() would only be called with an already-initialized module def. I've added a test to make sure we don't make that assumption again.
`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).
Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.
This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
_PyArg_Parser holds static global data generated for modules by Argument Clinic. The _PyArg_Parser.kwtuple field is a tuple object, even though it's stored within a static global. In some cases the tuple is statically allocated and thus it's okay that it gets shared by multiple interpreters. However, in other cases the tuple is set lazily, allocated from the heap using the active interprepreter at the point the tuple is needed.
This is a problem once that interpreter is destroyed since _PyArg_Parser.kwtuple becomes at dangling pointer, leading to crashes. It isn't a problem if the tuple is allocated under the main interpreter, since its lifetime is bound to the lifetime of the runtime. The solution here is to temporarily switch to the main interpreter. The alternative would be to always statically allocate the tuple.
This change also fixes a bug where only the most recent parser was added to the global linked list.
The PEP 649 implementation will require a way to load NotImplementedError
from the bytecode. @markshannon suggested implementing this by converting
LOAD_ASSERTION_ERROR into a more general mechanism for loading constants.
This PR adds this new opcode. I will work on the rest of the implementation
of the PEP separately.
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
* BaseException_vectorcall() now creates a tuple from 'args' array.
* Creation an exception using BaseException_vectorcall() is now a
single function call, rather than having to call
BaseException_new() and then BaseException_init().
Calling BaseException_init() is inefficient since it overrides
the 'args' attribute.
* _PyErr_SetKeyError() now uses PyObject_CallOneArg() to create the
KeyError instance to use BaseException_vectorcall().
`_Py_qsbr_unregister` is called when the PyThreadState is already
detached, so the access to `tstate->qsbr` isn't safe without locking the
shared mutex. Grab the `struct _qsbr_shared` from the interpreter
instead.
This change makes sure all extension/builtin modules have their init function run first by the main interpreter before proceeding with import in the original interpreter (main or otherwise). This means when the import of a single-phase init module fails in an isolated subinterpreter, it won't tie any global state/callbacks to the subinterpreter.
Add the ability to enable/disable the GIL at runtime, and use that in
the C module loading code.
We can't know before running a module init function if it supports
free-threading, so the GIL is temporarily enabled before doing so. If
the module declares support for running without the GIL, the GIL is
later disabled. Otherwise, the GIL is permanently enabled, and will
never be disabled again for the life of the current interpreter.
We already intern and immortalize most string constants. In the
free-threaded build, other constants can be a source of reference count
contention because they are shared by all threads running the same code
objects.
Now, such classes will no longer require changes in Python 3.13 in the normal case.
The test suite for robotframework passes with no DeprecationWarnings under this PR.
I also added a new DeprecationWarning for the case where `_field_types` exists
but is incomplete, since that seems likely to indicate a user mistake.
Use the new public Raw functions:
* _PyTime_PerfCounterUnchecked() with PyTime_PerfCounterRaw()
* _PyTime_TimeUnchecked() with PyTime_TimeRaw()
* _PyTime_MonotonicUnchecked() with PyTime_MonotonicRaw()
Remove internal functions:
* _PyTime_PerfCounterUnchecked()
* _PyTime_TimeUnchecked()
* _PyTime_MonotonicUnchecked()
We have only been tracking each module's PyModuleDef. However, there are some problems with that. For example, in some cases we load single-phase init extension modules from def->m_base.m_init or def->m_base.m_copy, but if multiple modules share a def then we can end up with unexpected behavior.
With this change, we track the following:
* PyModuleDef (same as before)
* for some modules, its init function or a copy of its __dict__, but specific to that module
* whether it is a builtin/core module or a "dynamic" extension
* the interpreter (ID) that owns the cached __dict__ (only if cached)
This also makes it easier to remember the module's kind (e.g. single-phase init) and if loading it previously failed, which I'm doing separately.
* Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and call CALL_NON_PY_GENERAL specializations.
* Remove CALL_PY_WITH_DEFAULTS specialization
* Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
Use _PyDeadline_Init() and _PyDeadline_Get() in
EnterNonRecursiveMutex() of thread_nt.h.
_PyDeadline_Get() uses the monotonic clock which is now the same as
the perf counter clock on all platforms. So this change does not
cause any behavior change. It just reuses existing helper functions.