This replaces `_PyList_FromArraySteal` with `_PyList_FromStackRefSteal`.
It's functionally equivalent, but takes a `_PyStackRef` array instead of
an array of `PyObject` pointers.
Co-authored-by: Ken Jin <kenjin@python.org>
`BUILD_SET` should use a borrow instead of a steal. The cleanup in `_DO_CALL`
`CONVERSION_FAILED` was incorrect.
Co-authored-by: Ken Jin <kenjin@python.org>
This automatically spills the results from `_PyStackRef_FromPyObjectNew`
to the in-memory stack so that the deferred references are visible to
the GC before we make any possibly escaping call.
Co-authored-by: Ken Jin <kenjin@python.org>
* Reject uop definitions that declare values as 'unused' that are already cached by prior uops
* Track which variables are defined and only load from memory when needed
* Support explicit `flush` in macro definitions.
* Make sure stack is flushed in where needed.
This PR sets up tagged pointers for CPython.
The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.
This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.
The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.
Please read Include/internal/pycore_stackref.h for more information!
---------
Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
* Add docs for new APIs
* Add soft-deprecation notices
* Add What's New porting entries
* Update comments referencing `PyFrame_LocalsToFast()` to mention the proxy instead
* Other related cleanups found when looking for refs to the deprecated APIs
Support non-dict globals in LOAD_FROM_DICT_OR_GLOBALS
The implementation basically copies LOAD_GLOBAL. Possibly it could be deduplicated,
but that seems like it may get hairy since the two operations have different operands.
This is important to fix in 3.14 for PEP 649, but it's a bug in earlier versions too,
and we should backport to 3.13 and 3.12 if possible.
The PEP 649 implementation will require a way to load NotImplementedError
from the bytecode. @markshannon suggested implementing this by converting
LOAD_ASSERTION_ERROR into a more general mechanism for loading constants.
This PR adds this new opcode. I will work on the rest of the implementation
of the PEP separately.
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
* Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and call CALL_NON_PY_GENERAL specializations.
* Remove CALL_PY_WITH_DEFAULTS specialization
* Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
* Target _FOR_ITER_TIER_TWO at POP_TOP following the matching END_FOR
* Modify _GUARD_NOT_EXHAUSTED_RANGE, _GUARD_NOT_EXHAUSTED_LIST and _GUARD_NOT_EXHAUSTED_TUPLE so that they also target the POP_TOP following the matching END_FOR
Makes sys.settrace, sys.setprofile, and monitoring generally thread-safe.
Mostly uses a stop-the-world approach and synchronization around the code object's _co_instrumentation_version. There may be a little bit of extra synchronization around the monitoring data that's required to be TSAN clean.
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``),
shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The
API used for adaptive specialization counters is changed but the behavior is
(supposed to be) identical.
The behavior of the Tier 2 counters is changed:
- There are no longer dynamic thresholds (we never varied these).
- All counters now use the same exponential backoff.
- The counter for ``JUMP_BACKWARD`` starts counting down from 16.
- The ``temperature`` in side exits starts counting down from 64.
This merges all `_CHECK_STACK_SPACE` uops in a trace into a single `_CHECK_STACK_SPACE_OPERAND` uop that checks whether there is enough stack space for all calls included in the entire trace.
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.
The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
This makes the Tier 2 interpreter a little faster.
I calculated by about 3%,
though I hesitate to claim an exact number.
This starts by doubling the trace size limit (to 512),
making it more likely that loops fit in a trace.
The rest of the approach is to only load
`oparg` and `operand` in cases that use them.
The code generator know when these are used.
For `oparg`, it will conditionally emit
```
oparg = CURRENT_OPARG();
```
at the top of the case block.
(The `oparg` variable may be referenced multiple times
by the instructions code block, so it must be in a variable.)
For `operand`, it will use `CURRENT_OPERAND()` directly
instead of referencing the `operand` variable,
which no longer exists.
(There is only one place where this will be used.)
This uses the new mechanism whereby certain uops
are replaced by others during translation,
using the `_PyUop_Replacements` table.
We further special-case the `_FOR_ITER_TIER_TWO` uop
to update the deoptimization target to point
just past the corresponding `END_FOR` opcode.
Two tiny code cleanups are also part of this PR.
- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
In Python/bytecodes.c, you now write
```
DEOPT_IF(condition);
```
The code generator expands this to
```
DEOPT_IF(condition, opcode);
```
where `opcode` is the name of the unspecialized instruction.
This works inside macro expansions too.
**CAVEAT:** The entire `DEOPT_IF(condition)` statement must be on a single line.
If it isn't, the substitution will fail; an error will be printed by the code generator
and the C compiler will report some errors.
These are the most popular specializations of `LOAD_ATTR` and `STORE_ATTR`
that weren't already viable uops:
* Split LOAD_ATTR_METHOD_WITH_VALUES
* Split LOAD_ATTR_METHOD_NO_DICT
* Split LOAD_ATTR_SLOT
* Split STORE_ATTR_SLOT
* Split STORE_ATTR_INSTANCE_VALUE
Also:
* Add `-v` flag to code generator which prints a list of non-viable uops
(easter-egg: it can print execution counts -- see source)
* Double _Py_UOP_MAX_TRACE_LENGTH to 128
I had dropped one of the DEOPT_IF() calls! :-(
* Rename SAVE_IP to _SET_IP
* Rename EXIT_TRACE to _EXIT_TRACE
* Rename SAVE_CURRENT_IP to _SAVE_CURRENT_IP
* Rename INSERT to _INSERT (This is for Ken Jin's abstract interpreter)
* Rename IS_NONE to _IS_NONE
* Rename JUMP_TO_TOP to _JUMP_TO_TOP
Instead of using `GO_TO_INSTRUCTION(CALL_PY_EXACT_ARGS)` we just add the macro elements of the latter to the macro for the former. This requires lengthening the uops array in struct opcode_macro_expansion. (It also required changes to stacking.py that were merged already.)
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
Introducing a new file, stacking.py, that takes over several responsibilities related to symbolic evaluation of push/pop operations, with more generality.