This approach eliminates the originally reported race. It also gets rid of the deadlock reported in gh-96071, so we can remove the workaround added then.
* Mark almost all reachable objects before doing collection phase
* Add stats for objects marked
* Visit new frames before each increment
* Remove lazy dict tracking
* Update docs
* Clearer calculation of work to do.
* Document that slices can be marshalled
* Deduplicate and organize the list of supported types
in docs
* Organize the type code list in marshal.c, to make
it more obvious that this is a versioned format
* Back-fill some historical info
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
The PyMutex implementation supports unlocking after fork because we
clear the list of waiters in parking_lot.c. This doesn't work as well
for _PyRecursiveMutex because on some systems, such as SerenityOS, the
thread id is not preserved across fork().
These changes makes it easier to backport the _interpreters, _interpqueues, and _interpchannels modules to Python 3.12.
This involves the following:
* add the _PyXI_GET_STATE() and _PyXI_GET_GLOBAL_STATE() macros
* add _PyXIData_lookup_context_t and _PyXIData_GetLookupContext()
* add _Py_xi_state_init() and _Py_xi_state_fini()
These changes makes it easier to backport the _interpreters, _interpqueues, and _interpchannels modules to Python 3.12.
This involves the following:
* rename several structs and typedefs
* add several typedefs
* stop using the PyThreadState.state field directly in parking_lot.c
Move creation of a tuple for var-positional parameter out of
_PyArg_UnpackKeywordsWithVararg().
Merge _PyArg_UnpackKeywordsWithVararg() with _PyArg_UnpackKeywords().
Add a new parameter in _PyArg_UnpackKeywords().
The "parameters" and "converters" attributes of ParseArgsCodeGen no
longer contain the var-positional parameter. It is now available as the
"varpos" attribute. Optimize code generation for var-positional
parameter and reuse the same generating code for functions with and without
keyword parameters.
Add special converters for var-positional parameter. "tuple" represents it as
a Python tuple and "array" represents it as a continuous array of PyObject*.
"object" is a temporary alias of "tuple".
The primary objective here is to allow some later changes to be cleaner. Mostly this involves renaming things and moving a few things around.
* CrossInterpreterData -> XIData
* crossinterpdatafunc -> xidatafunc
* split out pycore_crossinterp_data_registry.h
* add _PyXIData_lookup_t
Introduce helpers for (un)specializing instructions
Consolidate the code to specialize/unspecialize instructions into
two helper functions and use them in `_Py_Specialize_BinaryOp`.
The resulting code is more concise and keeps all of the logic at
the point where we decide to specialize/unspecialize an instruction.
- The specialization logic determines the appropriate specialization using only the operand's type, which is safe to read non-atomically (changing it requires stopping the world). We are guaranteed that the type will not change in between when it is checked and when we specialize the bytecode because the types involved are immutable (you cannot assign to `__class__` for exact instances of `dict`, `set`, or `frozenset`). The bytecode is mutated atomically using helpers.
- The specialized instructions rely on the operand type not changing in between the `DEOPT_IF` checks and the calls to the appropriate type-specific helpers (e.g. `_PySet_Contains`). This is a correctness requirement in the default builds and there are no changes to the opcodes in the free-threaded builds that would invalidate this.
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.
Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.
Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
* Fix comprehensions comment to inlined by pep 709
* Update spacing
Co-authored-by: RUANG (James Roy) <longjinyii@outlook.com>
* Add reference to PEP 709
---------
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Co-authored-by: RUANG (James Roy) <longjinyii@outlook.com>
Temporarily ignore warnings about JIT deactivation when perf support is active.
This will be reverted as soon as a way is found to determine at run time whether the interpreter was built with JIT. Currently, this is not possible on Windows.
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Previously, if the `ast.AST._fields` attribute was deleted, attempts to create a new `as`t node would crash due to the assumption that `_fields` always had a non-NULL value. Now it has been fixed by adding an extra check to ensure that `_fields` does not have a NULL value (this can happen when you manually remove `_fields` attribute).
* Remove `@suppress_immortalization` decorator
* Make suppression flag per-thread instead of per-interpreter
* Suppress immortalization in `eval()` to avoid refleaks in three tests
(test_datetime.test_roundtrip, test_logging.test_config8_ok, and
test_random.test_after_fork).
* frozenset() is constant, but not a singleton. When run multiple times,
the test could fail due to constant interning.
This replaces `_PyEval_BuiltinsFromGlobals` with
`_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference
instead of a borrowed reference. Internally, the new function uses
per-thread reference counting when possible to avoid contention on the
refcount fields on the builtins module.
On Windows, `long` is a signed 32-bit integer so it can't represent
`0xffff_ffff` without overflow. Windows exit codes are unsigned 32-bit
integers, so if a child process exits with `-1`, it will be represented
as `0xffff_ffff`.
Also fix a number of other possible cases where `_Py_HandleSystemExit`
could return with an exception set, leading to a `SystemError` (or
fatal error in debug builds) later on during shutdown.
This fixes a crash when `gc.get_objects()` or `gc.get_referrers()` is
called during a GC in the free threading build.
Switch to `_PyObjectStack` to avoid corrupting the `struct worklist`
linked list maintained by the GC. Also, don't return objects that are frozen
(`gc.freeze()`) or in the process of being collected to more closely match
the behavior of the default build.
They used to be shared, before 3.12. Returning to sharing them resolves a failure on Py_TRACE_REFS builds.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
* Fix usage of PyStackRef_FromPyObjectSteal in CALL_TUPLE_1
This was missed in gh-124894
* Fix usage of PyStackRef_FromPyObjectSteal in _CALL_STR_1
This was missed in gh-124894
* Regenerate code
This is essentially a cleanup, moving a handful of API declarations to the header files where they fit best, creating new ones when needed.
We do the following:
* add pycore_debug_offsets.h and move _Py_DebugOffsets, etc. there
* inline struct _getargs_runtime_state and struct _gilstate_runtime_state in _PyRuntimeState
* move struct _reftracer_runtime_state to the existing pycore_object_state.h
* add pycore_audit.h and move to it _Py_AuditHookEntry , _PySys_Audit(), and _PySys_ClearAuditHooks
* add audit.h and cpython/audit.h and move the existing audit-related API there
*move the perfmap/trampoline API from cpython/sysmodule.h to cpython/ceval.h, and remove the now-empty cpython/sysmodule.h
Users want to know when the current context switches to a different
context object. Right now this happens when and only when a context
is entered or exited, so the enter and exit events are synonymous with
"switched". However, if the changes proposed for gh-99633 are
implemented, the current context will also switch for reasons other
than context enter or exit. Since users actually care about context
switches and not enter or exit, replace the enter and exit events with
a single switched event.
The former exit event was emitted just before exiting the context.
The new switched event is emitted after the context is exited to match
the semantics users expect of an event with a past-tense name. If
users need the ability to clean up before the switch takes effect,
another event type can be added in the future. It is not added here
because YAGNI.
I skipped 0 in the enum as a matter of practice. Skipping 0 makes it
easier to troubleshoot when code forgets to set zeroed memory, and it
aligns with best practices for other tools (e.g.,
https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum).
Co-authored-by: Richard Hansen <rhansen@rhansen.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
This fixes a crash when running the PyO3 test suite on the free-threaded
build. The `qsbr` field is initialized after the `PyThreadState` is
added to the interpreter's linked list -- it might still be NULL.
Instead, we "steal" the queue of to-be-freed memory blocks. This is
always initialized (possibly empty) and protected by the stop the world
pause.
Users want to know when the current context switches to a different
context object. Right now this happens when and only when a context
is entered or exited, so the enter and exit events are synonymous with
"switched". However, if the changes proposed for gh-99633 are
implemented, the current context will also switch for reasons other
than context enter or exit. Since users actually care about context
switches and not enter or exit, replace the enter and exit events with
a single switched event.
The former exit event was emitted just before exiting the context.
The new switched event is emitted after the context is exited to match
the semantics users expect of an event with a past-tense name. If
users need the ability to clean up before the switch takes effect,
another event type can be added in the future. It is not added here
because YAGNI.
I skipped 0 in the enum as a matter of practice. Skipping 0 makes it
easier to troubleshoot when code forgets to set zeroed memory, and it
aligns with best practices for other tools (e.g.,
https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum).
When formatting the AST as a string, infinite values are replaced by
1e309, which evaluates to infinity. The initialization of this string
replacement was not thread-safe in the free threading build.
Each of the `LOAD_GLOBAL` specializations is implemented roughly as:
1. Load keys version.
2. Load cached keys version.
3. Deopt if (1) and (2) don't match.
4. Load keys.
5. Load cached index into keys.
6. Load object from (4) at offset from (5).
This is not thread-safe in free-threaded builds; the keys object may be replaced
in between steps (3) and (4).
This change refactors the specializations to avoid reloading the keys object and
instead pass the keys object from guards to be consumed by downstream uops.
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
* Make slices marshallable
* Emit slices as constants
* Update Python/marshal.c
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
* Refactor codegen_slice into two functions so it
always has the same net effect
* Fix for free-threaded builds
* Simplify marshal loading of slices
* Only return SUCCESS/ERROR from codegen_slice
---------
Co-authored-by: Mark Shannon <mark@hotpy.org>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Stop the world when invalidating function versions
The tier1 interpreter specializes `CALL` instructions based on the values
of certain function attributes (e.g. `__code__`, `__defaults__`). The tier1
interpreter uses function versions to verify that the attributes of a function
during execution of a specialization match those seen during specialization.
A function's version is initialized in `MAKE_FUNCTION` and is invalidated when
any of the critical function attributes are changed. The tier1 interpreter stores
the function version in the inline cache during specialization. A guard is used by
the specialized instruction to verify that the version of the function on the operand
stack matches the cached version (and therefore has all of the expected attributes).
It is assumed that once the guard passes, all attributes will remain unchanged
while executing the rest of the specialized instruction.
Stopping the world when invalidating function versions ensures that all critical
function attributes will remain unchanged after the function version guard passes
in free-threaded builds. It's important to note that this is only true if the remainder
of the specialized instruction does not enter and exit a stop-the-world point.
We will stop the world the first time any of the following function attributes
are mutated:
- defaults
- vectorcall
- kwdefaults
- closure
- code
This should happen rarely and only happens once per function, so the performance
impact on majority of code should be minimal.
Additionally, refactor the API for manipulating function versions to more clearly
match the stated semantics.
* Spill the evaluation around escaping calls in the generated interpreter and JIT.
* The code generator tracks live, cached values so they can be saved to memory when needed.
* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
Instead of surprise crashes and memory corruption, we now hang threads that attempt to re-enter the Python interpreter after Python runtime finalization has started. These are typically daemon threads (our long standing mis-feature) but could also be threads spawned by extension modules that then try to call into Python. This marks the `PyThread_exit_thread` public C API as deprecated as there is no plausible safe way to accomplish that on any supported platform in the face of things like C++ code with finalizers anywhere on a thread's stack. Doing this was the least bad option.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.
Rename some of the files and functions in preparation for this change.
Instead of be limited just by the size of addressable memory (2**63
bytes), Python integers are now also limited by the number of bits, so
the number of bit now always fit in a 64-bit integer.
Both limits are much larger than what might be available in practice,
so it doesn't affect users.
_PyLong_NumBits() and _PyLong_Frexp() are now always successful.
Fix a bug that can cause a crash when sub-interpreters use "basic"
single-phase extension modules. Shared objects could refer to PyGC_Head
nodes that had been freed as part of interpreter shutdown.
Use a `_PyStackRef` and defer the reference to `f_funcobj` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
* Detect source file encoding.
* Use the "replace" error handler even for UTF-8 (default) encoding.
* Remove the BOM.
* Fix detection of too long lines if they contain NUL.
* Return the head rather than the tail for truncated long lines.
If the generator is already cleared, then most fields in the
generator's frame are not valid other than f_executable. The invalid
fields may contain dangling pointers and should not be used.
Use a `_PyStackRef` and defer the reference to `f_executable` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
Setting dev_mode to 1 in an isolated configuration now enables also
faulthandler.
Moreover, setting "module_search_paths" option with
PyInitConfig_SetStrList() now sets "module_search_paths_set" to 1.
Add PyConfig_Get(), PyConfig_GetInt(), PyConfig_Set() and
PyConfig_Names() functions to get and set the current runtime Python
configuration.
Add visibility and "sys spec" to config and preconfig specifications.
_PyConfig_AsDict() now converts PyConfig.xoptions as a dictionary.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Switch more _Py_IsImmortal(...) assertions to _Py_IsImmortalLoose(...)
The remaining calls to _Py_IsImmortal are in free-threaded-only code,
initialization of core objects, tests, and guards that fall back to
code that works with mortal objects.
The `zip_next` function uses a common optimization technique for methods
that generate tuples. The iterator maintains an internal reference to
the returned tuple. When the method is called again, it checks if the
internal tuple's reference count is 1. If so, the tuple can be reused.
However, this approach is not safe under the free-threading build:
after checking the reference count, another thread may perform the same
check and also reuse the tuple. This can result in a double decref on
the items of the replaced tuple and a double incref (memory leak) on
the items of the tuple being set.
This adds a function, `_PyObject_IsUniquelyReferenced` that
encapsulates the stricter logic necessary for the free-threaded build:
the internal tuple must be owned by the current thread, have a local
refcount of one, and a shared refcount of zero.
`Py_DECREF` and `PyStackRef_CLOSE` are now implemented as macros in the
free-threaded build in ceval.c. There are two motivations;
* MSVC has problems inlining functions in ceval.c in the PGO build.
* We will want to mark escaping calls in order to spill the stack
pointer in ceval.c and we will want to do this around `_Py_Dealloc`
(or `_Py_MergeZeroLocalRefcount` or `_Py_DecRefShared`), not around
the entire `Py_DECREF` or `PyStackRef_CLOSE` call.
The free-threaded GC now visits interpreter stacks to keep objects
that use deferred reference counting alive.
Interpreter frames are zero initialized in the free-threaded GC so
that the GC doesn't see garbage data. This is a temporary measure
until stack spilling around escaping calls is implemented.
Co-authored-by: Ken Jin <kenjin@python.org>
As of 529a160 (gh-118204), building with HAVE_DYNAMIC_LOADING stopped working. This is a minimal fix just to get builds working again. There are actually a number of long-standing deficiencies with HAVE_DYNAMIC_LOADING builds that need to be resolved separately.
This replaces `_PyList_FromArraySteal` with `_PyList_FromStackRefSteal`.
It's functionally equivalent, but takes a `_PyStackRef` array instead of
an array of `PyObject` pointers.
Co-authored-by: Ken Jin <kenjin@python.org>
`BUILD_SET` should use a borrow instead of a steal. The cleanup in `_DO_CALL`
`CONVERSION_FAILED` was incorrect.
Co-authored-by: Ken Jin <kenjin@python.org>
We were not properly accounting for interpreter memory leaks at
shutdown and had two sources of leaks:
* Objects that use deferred reference counting and were reachable via
static types outlive the final GC. We now disable deferred reference
counting on all objects if we are calling the GC due to interpreter
shutdown.
* `_PyMem_FreeDelayed` did not properly check for interpreter shutdown
so we had some memory blocks that were enqueued to be freed, but
never actually freed.
* `_PyType_FinalizeIdPool` wasn't called at interpreter shutdown.
Fix _PyArg_UnpackKeywordsWithVararg for the case when argument for
positional-or-keyword parameter is passed by keyword.
There was only one such case in the stdlib -- the TypeVar constructor.
This automatically spills the results from `_PyStackRef_FromPyObjectNew`
to the in-memory stack so that the deferred references are visible to
the GC before we make any possibly escaping call.
Co-authored-by: Ken Jin <kenjin@python.org>
Fix PyEval_GetLocals() to avoid SystemError ("bad argument to
internal function"). Don't redefine the 'ret' variable in the if
block.
Add an unit test on PyEval_GetLocals().
The free-threaded build partially stores heap type reference counts in
distributed manner in per-thread arrays. This avoids reference count
contention when creating or destroying instances.
Co-authored-by: Ken Jin <kenjin@python.org>
Add ENTER_RECURSIVE and LEAVE_RECURSIVE macros in ast.c, ast_opt.c and
symtable.c. Remove VISIT_QUIT macro in symtable.c.
The current recursion depth counter only needs to be updated during
normal execution -- all functions should just return an error code
if an error occurs.
* gh-122188: Move magic number to its own file
* Add versionadded directive
* Do work in C
* Integrate launcher.c
* Make _pyc_magic_number private
* Remove metadata
* Move sys.implementation -> _imp
* Modernize comment
* Move _RAW_MAGIC_NUMBER to the C side as well
* _pyc_magic_number -> pyc_magic_number
* Remove unused import
* Update docs
* Apply suggestions from code review
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
* Fix typo in tests
---------
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
The adaptive counter doesn't do anything currently in the free-threaded
build and TSan reports a data race due to concurrent modifications to
the counter.
* Use compensated summation for complex sums with floating-point items.
This amends #121176.
* sum() specializations for floats and complexes now use
PyLong_AsDouble() instead of PyLong_AsLongAndOverflow() and
compensated summation as well.
In the free-threaded build, we need to lock pending->mutex when clearing
the handling_thread in order not to race with a concurrent
make_pending_calls in the same interpreter.
This combines and updates our freelist handling to use a consistent
implementation. Objects in the freelist are linked together using the
first word of memory block.
If configured with freelists disabled, these operations are essentially
no-ops.
* Reject uop definitions that declare values as 'unused' that are already cached by prior uops
* Track which variables are defined and only load from memory when needed
* Support explicit `flush` in macro definitions.
* Make sure stack is flushed in where needed.
We should maintain the invariant that a zero `ob_tid` implies the
refcount fields are merged.
* Move the assignment in `_Py_MergeZeroLocalRefcount` to immediately
before the refcount merge.
* Update `_PyTrash_thread_destroy_chain` to set `ob_ref_shared` to
`_Py_REF_MERGED` when setting `ob_tid` to zero.
Also check this invariant with assertions in the GC in debug builds.
That uncovered a bug when running out of memory during GC.
* The result has type Py_ssize_t, not intptr_t.
* Type cast between unsigned and signdet integer types should be explicit.
* Downcasting should be explicit.
* Fix integer overflow check in sum().
The change in gh-118157 (b2cd54a) should have also updated clear_singlephase_extension() but didn't. We fix that here. Note that clear_singlephase_extension() (AKA _PyImport_ClearExtension()) is only used in tests.
The `_PySeqLock_EndRead` function needs an acquire fence to ensure that
the load of the sequence happens after any loads within the read side
critical section. The missing fence can trigger bugs on macOS arm64.
Additionally, we need a release fence in `_PySeqLock_LockWrite` to
ensure that the sequence update is visible before any modifications to
the cache entry.
The tracemalloc_tracebacks hash table has traceback keys and NULL
values, but its destructors do not reflect this -- key_destroy_func is
NULL while value_destroy_func is raw_free. Swap these to free the
traceback keys instead.
Fix warnings when using -Wimplicit-fallthrough compiler flag.
Annotate explicitly "fall through" switch cases with a new
_Py_FALLTHROUGH macro which uses __attribute__((fallthrough)) if
available. Replace "fall through" comments with _Py_FALLTHROUGH.
Add _Py__has_attribute() macro. No longer define __has_attribute()
macro if it's not defined. Move also _Py__has_builtin() at the top
of pyport.h.
Co-Authored-By: Nikita Sobolev <mail@sobolevn.me>
This change makes things a little less painful for some users. It also fixes a failing assert (gh-120765), by making sure all subinterpreters are destroyed before the main interpreter. As part of that, we make sure Py_Finalize() always runs with the main interpreter active.
This PR sets up tagged pointers for CPython.
The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.
This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.
The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.
Please read Include/internal/pycore_stackref.h for more information!
---------
Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
This makes the following macros public as part of the non-limited C-API for
locking a single object or two objects at once.
* `Py_BEGIN_CRITICAL_SECTION(op)` / `Py_END_CRITICAL_SECTION()`
* `Py_BEGIN_CRITICAL_SECTION2(a, b)` / `Py_END_CRITICAL_SECTION2()`
The supporting functions and structs used by the macros are also exposed for
cases where C macros are not available.
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization, otherwise you're possibly
interning a immortalizing copy.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
In gh-120009 I used an atexit hook to finalize the _datetime module's static types at interpreter shutdown. However, atexit hooks are executed very early in finalization, which is a problem in the few cases where a subclass of one of those static types is still alive until the final GC collection. The static builtin types don't have this probably because they are finalized toward the end, after the final GC collection. To avoid the problem for _datetime, I have applied a similar approach here.
Also, credit goes to @mgorny and @neonene for the new tests.
FYI, I would have liked to take a slightly cleaner approach with managed static types, but wanted to get a smaller fix in first for the sake of backporting. I'll circle back to the cleaner approach with a future change on the main branch.
This adds a `_PyRecursiveMutex` type based on `PyMutex` and uses that
for the import lock. This fixes some data races in the free-threaded
build and generally simplifies the import lock code.
The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.
Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).
The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
We make use of the same mechanism that we use for the static builtin types. This required a few tweaks.
The relevant code could use some cleanup but I opted to avoid the significant churn in this change. I'll tackle that separately.
This change is the final piece needed to make _datetime support multiple interpreters. I've updated the module slot accordingly.
The free-threaded build currently immortalizes objects that use deferred
reference counting (see gh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.
This fixes a race condition involving the tracking of whether the
behavior is suppressed.
Only call `gc_restore_tid()` from stop-the-world contexts.
`worklist_pop()` can be called while other threads are running, so use a
relaxed atomic to modify `ob_tid`.
* Add docs for new APIs
* Add soft-deprecation notices
* Add What's New porting entries
* Update comments referencing `PyFrame_LocalsToFast()` to mention the proxy instead
* Other related cleanups found when looking for refs to the deprecated APIs
Support non-dict globals in LOAD_FROM_DICT_OR_GLOBALS
The implementation basically copies LOAD_GLOBAL. Possibly it could be deduplicated,
but that seems like it may get hairy since the two operations have different operands.
This is important to fix in 3.14 for PEP 649, but it's a bug in earlier versions too,
and we should backport to 3.13 and 3.12 if possible.
Release the GIL before calling `_Py_qsbr_unregister`.
The deadlock could occur when the GIL was enabled at runtime. The
`_Py_qsbr_unregister` call might block while holding the GIL because the
thread state was not active, but the GIL was still held.
Make sure that `gilstate_counter` is not zero in when calling
`PyThreadState_Clear()`. A destructor called from `PyThreadState_Clear()` may
call back into `PyGILState_Ensure()` and `PyGILState_Release()`. If
`gilstate_counter` is zero, it will try to create a new thread state before
the current active thread state is destroyed, leading to an assertion failure
or crash.
* gh-119118: Fix performance regression in tokenize module
- Cache line object to avoid creating a Unicode object
for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the
smallest buffer possible to measure the difference.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
The fix in gh-119561 introduced an assertion that doesn't hold true if any of the three new test extension modules are loaded more than once. This is fine normally but breaks if the new test_check_state_first() is run more than once, which happens for refleak checking and with the regrtest --forever flag. We fix that here by clearing each of the three modules after loading them. We also tweak a check in _modules_by_index_check().
The assertion was added in gh-118532 but was based on the invalid assumption that PyState_FindModule() would only be called with an already-initialized module def. I've added a test to make sure we don't make that assumption again.
`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).
Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.
This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
_PyArg_Parser holds static global data generated for modules by Argument Clinic. The _PyArg_Parser.kwtuple field is a tuple object, even though it's stored within a static global. In some cases the tuple is statically allocated and thus it's okay that it gets shared by multiple interpreters. However, in other cases the tuple is set lazily, allocated from the heap using the active interprepreter at the point the tuple is needed.
This is a problem once that interpreter is destroyed since _PyArg_Parser.kwtuple becomes at dangling pointer, leading to crashes. It isn't a problem if the tuple is allocated under the main interpreter, since its lifetime is bound to the lifetime of the runtime. The solution here is to temporarily switch to the main interpreter. The alternative would be to always statically allocate the tuple.
This change also fixes a bug where only the most recent parser was added to the global linked list.
The PEP 649 implementation will require a way to load NotImplementedError
from the bytecode. @markshannon suggested implementing this by converting
LOAD_ASSERTION_ERROR into a more general mechanism for loading constants.
This PR adds this new opcode. I will work on the rest of the implementation
of the PEP separately.
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
* BaseException_vectorcall() now creates a tuple from 'args' array.
* Creation an exception using BaseException_vectorcall() is now a
single function call, rather than having to call
BaseException_new() and then BaseException_init().
Calling BaseException_init() is inefficient since it overrides
the 'args' attribute.
* _PyErr_SetKeyError() now uses PyObject_CallOneArg() to create the
KeyError instance to use BaseException_vectorcall().
`_Py_qsbr_unregister` is called when the PyThreadState is already
detached, so the access to `tstate->qsbr` isn't safe without locking the
shared mutex. Grab the `struct _qsbr_shared` from the interpreter
instead.
This change makes sure all extension/builtin modules have their init function run first by the main interpreter before proceeding with import in the original interpreter (main or otherwise). This means when the import of a single-phase init module fails in an isolated subinterpreter, it won't tie any global state/callbacks to the subinterpreter.
Add the ability to enable/disable the GIL at runtime, and use that in
the C module loading code.
We can't know before running a module init function if it supports
free-threading, so the GIL is temporarily enabled before doing so. If
the module declares support for running without the GIL, the GIL is
later disabled. Otherwise, the GIL is permanently enabled, and will
never be disabled again for the life of the current interpreter.
We already intern and immortalize most string constants. In the
free-threaded build, other constants can be a source of reference count
contention because they are shared by all threads running the same code
objects.
Now, such classes will no longer require changes in Python 3.13 in the normal case.
The test suite for robotframework passes with no DeprecationWarnings under this PR.
I also added a new DeprecationWarning for the case where `_field_types` exists
but is incomplete, since that seems likely to indicate a user mistake.
Use the new public Raw functions:
* _PyTime_PerfCounterUnchecked() with PyTime_PerfCounterRaw()
* _PyTime_TimeUnchecked() with PyTime_TimeRaw()
* _PyTime_MonotonicUnchecked() with PyTime_MonotonicRaw()
Remove internal functions:
* _PyTime_PerfCounterUnchecked()
* _PyTime_TimeUnchecked()
* _PyTime_MonotonicUnchecked()
We have only been tracking each module's PyModuleDef. However, there are some problems with that. For example, in some cases we load single-phase init extension modules from def->m_base.m_init or def->m_base.m_copy, but if multiple modules share a def then we can end up with unexpected behavior.
With this change, we track the following:
* PyModuleDef (same as before)
* for some modules, its init function or a copy of its __dict__, but specific to that module
* whether it is a builtin/core module or a "dynamic" extension
* the interpreter (ID) that owns the cached __dict__ (only if cached)
This also makes it easier to remember the module's kind (e.g. single-phase init) and if loading it previously failed, which I'm doing separately.
* Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and call CALL_NON_PY_GENERAL specializations.
* Remove CALL_PY_WITH_DEFAULTS specialization
* Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
Use _PyDeadline_Init() and _PyDeadline_Get() in
EnterNonRecursiveMutex() of thread_nt.h.
_PyDeadline_Get() uses the monotonic clock which is now the same as
the perf counter clock on all platforms. So this change does not
cause any behavior change. It just reuses existing helper functions.
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.
PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.
A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
The function returns `True` or `False` depending on whether the GIL is
currently enabled. In the default build, it always returns `True`
because the GIL is always enabled.
Most module names are interned and immortalized, but the main
module was not. This partially addresses a scaling bottleneck in the
free-threaded when creating closure concurrently in the main module.
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:
- Move codecs-related state that lives on `PyInterpreterState` to a
struct declared in `pycore_codecs.h`.
- In free-threaded builds, add a mutex to `codecs_state` to synchronize
operations on `search_path`. Because `search_path_mutex` is used as a
normal mutex and not a critical section, we must be extremely careful
with operations called while holding it.
- The codec registry is explicitly initialized as part of
`_PyUnicode_InitEncodings` to simplify thread-safety.
* Target _FOR_ITER_TIER_TWO at POP_TOP following the matching END_FOR
* Modify _GUARD_NOT_EXHAUSTED_RANGE, _GUARD_NOT_EXHAUSTED_LIST and _GUARD_NOT_EXHAUSTED_TUPLE so that they also target the POP_TOP following the matching END_FOR
Add "Raw" variant of PyTime functions:
* PyTime_MonotonicRaw()
* PyTime_PerfCounterRaw()
* PyTime_TimeRaw()
Changes:
* Add documentation and tests. Tests release the GIL while calling
raw clock functions.
* py_get_system_clock() and py_get_monotonic_clock() now check that
the GIL is hold by the caller if raise_exc is non-zero.
* Reimplement "Unchecked" functions with raw clock functions.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Account for `add_stopiteration_handler` pushing a block for `async with`.
To allow generator functions that previously almost hit the `CO_MAXBLOCKS`
limit by nesting non-async blocks, the limit is increased by 1.
This increase allows one more block in non-generator functions.
The code for Tier 2 is now only compiled when configured
with `--enable-experimental-jit[=yes|interpreter]`.
We drop support for `PYTHON_UOPS` and -`Xuops`,
but you can disable the interpreter or JIT
at runtime by setting `PYTHON_JIT=0`.
You can also build it without enabling it by default
using `--enable-experimental-jit=yes-off`;
enable with `PYTHON_JIT=1`.
On Windows, the `build.bat` script supports
`--experimental-jit`, `--experimental-jit-off`,
`--experimental-interpreter`.
In the C code, `_Py_JIT` is defined as before
when the JIT is enabled; the new variable
`_Py_TIER2` is defined when the JIT *or* the
interpreter is enabled. It is actually a bitmask:
1: JIT; 2: default-off; 4: interpreter.
Avoid detaching thread state when stopping the world. When re-attaching
the thread state, the thread would attempt to resume the top-most
critical section, which might now be held by a thread paused for our
stop-the-world request.
Deferred reference counting is not fully implemented yet. As a temporary
measure, we immortalize objects that would use deferred reference
counting to avoid multi-threaded scaling bottlenecks.
This is only performed in the free-threaded build once the first
non-main thread is started. Additionally, some tests, including refleak
tests, suppress this behavior.
Basically, I've turned most of _PyImport_LoadDynamicModuleWithSpec() into two new functions (_PyImport_GetModInitFunc() and _PyImport_RunModInitFunc()) and moved the rest of it out into _imp_create_dynamic_impl(). There shouldn't be any changes in behavior.
This change makes some future changes simpler. This is particularly relevant to potentially calling each module init function in the main interpreter first. Thus the critical part of the PR is the addition of _PyImport_RunModInitFunc(), which is strictly focused on running the init func and validating the result. A later PR will take it a step farther by capturing error information rather than raising exceptions.
FWIW, this change also helps readers by clarifying a bit more about what happens when an extension/builtin module is imported.