Commit Graph

283 Commits

Author SHA1 Message Date
Victor Stinner 5c44d7d99c
gh-130931: Add pycore_interpframe.h internal header (#131249)
Move _PyInterpreterFrame and associated functions
to a new pycore_interpframe.h header.
2025-03-19 18:17:44 +01:00
Ken Jin b2ed7a6d6a
gh-131281: Add include for pystats builds (#131369)
Add include to for pystats builds
2025-03-18 00:36:06 +08:00
Mark Shannon a45f25361d
GH-131238: More refactoring of core header files (GH-131351)
Adds new pycore_stats.h header file to help break dependencies involving the pycore_code.h header.
2025-03-17 14:41:05 +00:00
T. Wouters de2f7da77d
gh-115999: Add free-threaded specialization for FOR_ITER (#128798)
Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
2025-03-12 16:21:46 +01:00
Irit Katriel a1417b211f
gh-100239: replace BINARY_SUBSCR & family by BINARY_OP with oparg NB_SUBSCR (#129700) 2025-02-07 22:39:54 +00:00
Brandt Bucher 5fa7e1b7fd
GH-129715: Remove _DYNAMIC_EXIT (GH-129716) 2025-02-07 11:41:17 -08:00
Diego Russo 567394517a
GH-128842: Collect JIT memory stats (GH-128941) 2025-02-02 15:17:53 -08:00
Yan Yanchii e6c76b947b
GH-128872: Remove unused argument from _PyCode_Quicken (GH-128873)
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
2025-02-02 15:09:30 -08:00
Irit Katriel 4815131910
gh-100239: specialize bitwise logical binary ops on ints (#128927) 2025-01-29 09:28:21 +00:00
Sam Gross a10f99375e
Revert "GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918)" (GH-129202)
The commit introduced a ~2.5-3% regression in the free threading build.

This reverts commit ab61d3f430.
2025-01-23 09:26:25 +00:00
Mark Shannon ab61d3f430
GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918) 2025-01-20 17:09:23 +00:00
Kirill Podoprigora 6c52ada551
gh-100239: Handle NaN and zero division in guards for `BINARY_OP_EXTEND` (#128963)
Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
2025-01-19 11:02:49 +00:00
Irit Katriel 3893a92d95
gh-100239: specialize long tail of binary operations (#128722) 2025-01-16 15:22:13 +00:00
mpage b5ee0258bf
gh-115999: Specialize `LOAD_ATTR` for instance and class receivers in free-threaded builds (#128164)
Finish specialization for LOAD_ATTR in the free-threaded build by adding support for class and instance receivers.
2025-01-14 11:56:11 -08:00
Mark Shannon ddd959987c
GH-128685: Specialize (rather than quicken) LOAD_CONST into LOAD_CONST_[IM]MORTAL (GH-128708) 2025-01-13 10:30:28 +00:00
T. Wouters 8f93dd8a8f
gh-115999: Add free-threaded specialization for COMPARE_OP (#126410)
Add free-threaded specialization for COMPARE_OP, and tests for COMPARE_OP specialization in general.

Co-authored-by: Donghee Na <donghee.na92@gmail.com>
2025-01-07 06:41:01 -08:00
Ken Jin 7ef4907412
gh-128262: Allow specialization of calls to classes with __slots__ (GH-128263) 2024-12-31 12:24:17 +08:00
Yan Yanchii fe4dd07a84
gh-119786: Mention `InternalDocs/interpreter.md` instead of non-existing `adaptive.md` (#128329)
`Python/specialize.c`: Mention `InternalDocs/interpreter.md` instead of non-existing `adaptive.md`


Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
2024-12-30 18:38:09 +00:00
Neil Schemenauer 1b15c89a17
gh-115999: Specialize `STORE_ATTR` in free-threaded builds. (gh-127838)
* Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and
  use in place of `_PyDictKeys_StringLookup`.

* Change `_PyObject_TryGetInstanceAttribute` to use that function
  in the case of split keys.

* Add `unicodekeys_lookup_split` helper which allows code sharing
  between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.

* Fix locking for `STORE_ATTR_INSTANCE_VALUE`.  Create
  `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and
  `tp_version_tag` cannot change.

* Pass `tp_version_tag` to `specialize_dict_access()`, ensuring
  the version we store on the cache is the correct one (in case of
  it changing during the specalize analysis).

* Split `analyze_descriptor` into `analyze_descriptor_load` and
  `analyze_descriptor_store` since those don't share much logic.
  Add `descriptor_is_class` helper function.

* In `specialize_dict_access`, double check `_PyObject_GetManagedDict()`
  in case we race and dict was materialized before the lock.

* Avoid borrowed references in `_Py_Specialize_StoreAttr()`.

* Use `specialize()` and `unspecialize()` helpers.

* Add unit tests to ensure specializing happens as expected in FT builds.

* Add unit tests to attempt to trigger data races (useful for running under TSAN).

* Add `has_split_table` function to `_testinternalcapi`.
2024-12-19 10:21:17 -08:00
Donghee Na 48c70b8f7d
gh-115999: Enable BINARY_SUBSCR_GETITEM for free-threaded build (gh-127737) 2024-12-19 11:08:17 +09:00
mpage 2de048ce79
gh-115999: Specialize loading attributes from modules in free-threaded builds (#127711)
We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds:

_CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (nee _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version.

_LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.
2024-12-13 10:17:16 -08:00
mpage c84928ed6d
gh-115999: Specialize `CALL_KW` in free-threaded builds (#127713)
* Enable specialization of CALL_KW

* Fix bug pushing frame in _PY_FRAME_KW

`_PY_FRAME_KW` pushes a pointer to the new frame onto the stack for
consumption by the next uop. When pushing the frame fails, we do not
want to push the result, `NULL`, to the stack because it is not
a valid stackref. This works in the default build because `PyStackRef_NULL`
 and `NULL` are the same value, so the `PyStackRef_XCLOSE()` in the error
handler ignores it. In the free-threaded build the values are not the same;
`PyStackRef_XCLOSE()` will attempt to decref a null pointer.
2024-12-11 15:18:22 -08:00
Sam Gross a353455fca
gh-125610: Fix `STORE_ATTR_INSTANCE_VALUE` specialization check (GH-125612)
The `STORE_ATTR_INSTANCE_VALUE` opcode doesn't support objects with
non-NULL managed dictionaries, so don't specialize to that op in that case.
2024-12-06 10:48:24 -05:00
mpage dabcecfd6d
gh-115999: Enable specialization of `CALL` instructions in free-threaded builds (#127123)
The CALL family of instructions were mostly thread-safe already and only required a small number of changes, which are documented below.

A few changes were needed to make CALL_ALLOC_AND_ENTER_INIT thread-safe:

Added _PyType_LookupRefAndVersion, which returns the type version corresponding to the returned ref.

Added _PyType_CacheInitForSpecialization, which takes an init method and the corresponding type version and only populates the specialization cache if the current type version matches the supplied version. This prevents potentially caching a stale value in free-threaded builds if we race with an update to __init__.

Only cache __init__ functions that are deferred in free-threaded builds. This ensures that the reference to __init__ that is stored in the specialization cache is valid if the type version guard in _CHECK_AND_ALLOCATE_OBJECT passes.
Fix a bug in _CREATE_INIT_FRAME where the frame is pushed to the stack on failure.

A few other miscellaneous changes were also needed:

Use {LOCK,UNLOCK}_OBJECT in LIST_APPEND. This ensures that the list's per-object lock is held while we are appending to it.

Add missing co_tlbc for _Py_InitCleanup.

Stop/start the world around setting the eval frame hook. This allows us to read interp->eval_frame non-atomically and preserves the behavior of _CHECK_PEP_523 documented below.
2024-12-03 11:20:20 -08:00
Neil Schemenauer 276cd66ccb
gh-115999: Add free-threaded specialization for `SEND` (gh-127426)
No additional thread safety changes are required.  Note that sending to
a generator that is shared between threads is currently not safe in the
free-threaded build.
2024-12-03 10:25:12 -08:00
Neil Schemenauer 0cb5222079
gh-115999: Specialize `LOAD_SUPER_ATTR` in free-threaded builds (gh-127128)
Use existing helpers to atomically modify the bytecode.  Add unit tests
to ensure specializing is happening as expected.  Add test_specialize.py
that can be used with ThreadSanitizer to detect data races.  
Fix thread safety issue with cell_set_contents().
2024-12-03 09:32:26 -08:00
Michael Droettboom edefb8678a
gh-127518: Fix pystats build after #127169 (#127526)
gh-127518: Fix pystats build after #127619
2024-12-02 20:17:08 +00:00
Mark Shannon a8dd821d5b
GH-126491: GC: Mark objects reachable from roots before doing cycle collection (GH-127110)
* Mark almost all reachable objects before doing collection phase

* Add stats for objects marked

* Visit new frames before each increment

* Update docs

* Clearer calculation of work to do.
2024-12-02 10:12:17 +00:00
Donghee Na e2713409cf
gh-115999: Add partial free-thread specialization for BINARY_SUBSCR (gh-127227) 2024-12-02 10:38:17 +09:00
Sam Gross 71ede1142d
gh-115999: Add free-threaded specialization for `STORE_SUBSCR` (#127169)
The specialization only depends on the type, so no special thread-safety
considerations there.

STORE_SUBSCR_LIST_INT needs to lock the list before modifying it.

`_PyDict_SetItem_Take2` already internally locks the dictionary using a
critical section.
2024-11-26 16:46:06 -05:00
mpage d24a22e9b6
gh-115999: Record success in `specialize` (#127167)
Record success in `specialize`

This matches the existing behavior where we increment the success
stat for the generic opcode each time we successfully specialize
an instruction.
2024-11-22 12:07:05 -08:00
Kirill Podoprigora 27486c3365
gh-115999: Add free-threaded specialization for `UNPACK_SEQUENCE` (#126600)
Add free-threaded specialization for `UNPACK_SEQUENCE` opcode.
`UNPACK_SEQUENCE_TUPLE/UNPACK_SEQUENCE_TWO_TUPLE` are already thread safe since tuples are immutable.
`UNPACK_SEQUENCE_LIST` is not thread safe because of nature of lists (there is nothing preventing another thread from adding items to or removing them the list while the instruction is executing). To achieve thread safety we add a critical section to the implementation of `UNPACK_SEQUENCE_LIST`, especially around the parts where we check the size of the list and push items onto the stack.


---------

Co-authored-by: Matt Page <mpage@meta.com>
Co-authored-by: mpage <mpage@cs.stanford.edu>
2024-11-22 19:00:35 +02:00
Donghee Na 78a530a578
gh-115999: Add free-threaded specialization for ``TO_BOOL`` (gh-126616) 2024-11-22 07:52:16 +09:00
mpage 09c240f20c
gh-115999: Specialize `LOAD_GLOBAL` in free-threaded builds (#126607)
Enable specialization of LOAD_GLOBAL in free-threaded builds.

Thread-safety of specialization in free-threaded builds is provided by the following:

A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
Generation of new keys versions is made atomic in free-threaded builds.
Existing helpers are used to atomically modify the opcode.
Thread-safety of specialized instructions in free-threaded builds is provided by the following:

Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races as the dict keys versions are read without holding the dictionary's per-object lock in version guards.
Dicts keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.
2024-11-21 11:22:21 -08:00
mpage 32428cf9ea
gh-115999: Don't take a reason in unspecialize (#127030)
Don't take a reason in unspecialize

We only want to compute the reason if stats are enabled. Optimizing
compilers should optimize this away for us (gcc and clang do), but
it's better to be safe than sorry.
2024-11-20 14:54:48 -08:00
Hugo van Kemenade 899fdb213d
Revert "GH-126491: GC: Mark objects reachable from roots before doing cycle collection (GH-126502)" (#126983) 2024-11-19 11:25:09 +02:00
Mark Shannon b0fcc2c47a
GH-126491: GC: Mark objects reachable from roots before doing cycle collection (GH-126502)
* Mark almost all reachable objects before doing collection phase

* Add stats for objects marked

* Visit new frames before each increment

* Remove lazy dict tracking

* Update docs

* Clearer calculation of work to do.
2024-11-18 14:31:26 +00:00
Sergey B Kirpichev d9e251223e
gh-103951: enable optimization for fast attribute access on module subclasses (GH-126264)
Co-authored-by: Nicolas Tessore <n.tessore@ucl.ac.uk>
2024-11-15 16:03:38 +08:00
Kirill Podoprigora 6e03ff2419
gh-126513: Use helpers for `_Py_Specialize_ConstainsOp` (#126517)
* Use helpers for _Py_Specialize_ConstainsOp

* Remove unnecessary variable
2024-11-06 13:52:15 -08:00
mpage 9ce4fa0719
gh-115999: Introduce helpers for (un)specializing instructions (#126414)
Introduce helpers for (un)specializing instructions

Consolidate the code to specialize/unspecialize instructions into
two helper functions and use them in `_Py_Specialize_BinaryOp`.
The resulting code is more concise and keeps all of the logic at
the point where we decide to specialize/unspecialize an instruction.
2024-11-06 12:04:04 -08:00
Donghee Na 4ea214ea98
gh-115999: Add free-threaded specialization for CONTAINS_OP (gh-126450)
- The specialization logic determines the appropriate specialization using only the operand's type, which is safe to read non-atomically (changing it requires stopping the world). We are guaranteed that the type will not change in between when it is checked and when we specialize the bytecode because the types involved are immutable (you cannot assign to `__class__` for exact instances of `dict`, `set`, or `frozenset`). The bytecode is mutated atomically using helpers.
- The specialized instructions rely on the operand type not changing in between the `DEOPT_IF` checks and the calls to the appropriate type-specific helpers (e.g. `_PySet_Contains`). This is a correctness requirement in the default builds and there are no changes to the opcodes in the free-threaded builds that would invalidate this.
2024-11-06 03:35:10 +00:00
mpage 2e95c5ba3b
gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
2024-11-04 11:13:32 -08:00
Mark Shannon faa3272fb8
GH-125837: Split `LOAD_CONST` into three. (GH-125972)
* Add LOAD_CONST_IMMORTAL opcode

* Add LOAD_SMALL_INT opcode

* Remove RETURN_CONST opcode
2024-10-29 11:15:42 +00:00
mpage e99f159be4
gh-115999: Stop the world when invalidating function versions (#124997)
Stop the world when invalidating function versions

The tier1 interpreter specializes `CALL` instructions based on the values
of certain function attributes (e.g. `__code__`, `__defaults__`). The tier1
interpreter uses function versions to verify that the attributes of a function
during execution of a specialization match those seen during specialization.
A function's version is initialized in `MAKE_FUNCTION` and is invalidated when
any of the critical function attributes are changed. The tier1 interpreter stores
the function version in the inline cache during specialization. A guard is used by
the specialized instruction to verify that the version of the function on the operand
stack matches the cached version (and therefore has all of the expected attributes).
It is assumed that once the guard passes, all attributes will remain unchanged
while executing the rest of the specialized instruction.

Stopping the world when invalidating function versions ensures that all critical
function attributes will remain unchanged after the function version guard passes
in free-threaded builds. It's important to note that this is only true if the remainder
of the specialized instruction does not enter and exit a stop-the-world point.

We will stop the world the first time any of the following function attributes
are mutated:

- defaults
- vectorcall
- kwdefaults
- closure
- code

This should happen rarely and only happens once per function, so the performance
impact on majority of code should be minimal.

Additionally, refactor the API for manipulating function versions to more clearly
match the stated semantics.
2024-10-08 10:04:35 -04:00
Mark Shannon c87b0e4a46
GH-124284: Add stats for refcount operations on immortal objects (GH-124288) 2024-09-23 19:10:55 +01:00
Mark Shannon 0b0f7befad
GH-123232: Fix "not specialized" stats (GH-123236) 2024-08-23 10:46:03 +01:00
Mark Shannon 5d3201fe3f
GH-123040: Specialize shadowed `LOAD_ATTR`. (GH-123219) 2024-08-23 10:22:35 +01:00
Mark Shannon a3d8c0542e
GH-123197: Only count an instruction as deferred if it hasn't deopted first. (GH-123222)
Only count an instruction as deferred if hasn't deopted first.
2024-08-22 14:17:10 +01:00
Brandt Bucher 427b106162
GH-118093: Specialize calls to non-vectorcall classes as `CALL_NON_PY_GENERAL` (GH-123212)
Specialize classes without vectorcall as CALL_NON_PY_GENERAL
2024-08-22 11:50:55 +01:00
Mark Shannon a4fd7aa4a6
GH-115776: Allow any fixed sized object to have inline values (GH-123192) 2024-08-21 15:52:04 +01:00