Moves the Emscripten web example into a standalone folder, and updates
Makefile targets to build the web example. Instructions for usage have
also been added.
* Mark almost all reachable objects before doing collection phase
* Add stats for objects marked
* Visit new frames before each increment
* Update docs
* Clearer calculation of work to do.
---------
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
- Add `git describe` output to headers generated by `make_ssl_data.py`
This info is more important than the date when the file was generated.
It does mean that the tool now requires a Git checkout of OpenSSL,
not for example a release tarball.
- Regenerate the older file to add the info.
To the other older file, add a note about manual edits.
- Add notes on how to add a new OpenSSL version
- Add 3.4 error messages and multissl tests
This gets rid of the immortal check in `PyStackRef_FromPyObjectSteal()`.
Overall, this improves performance about 2% in the free threading
build.
This also renames `PyStackRef_Is()` to `PyStackRef_IsExactly()` because
the macro requires that the tag bits of the arguments match, which is
only true in certain special cases.
Enable specialization of LOAD_GLOBAL in free-threaded builds.
Thread-safety of specialization in free-threaded builds is provided by the following:
A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
Generation of new keys versions is made atomic in free-threaded builds.
Existing helpers are used to atomically modify the opcode.
Thread-safety of specialized instructions in free-threaded builds is provided by the following:
Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races as the dict keys versions are read without holding the dictionary's per-object lock in version guards.
Dicts keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.
* Mark almost all reachable objects before doing collection phase
* Add stats for objects marked
* Visit new frames before each increment
* Remove lazy dict tracking
* Update docs
* Clearer calculation of work to do.
This unifies the code for nodejs and the code for the browser. After this
commit, the browser example doesn't work; this will be fixed in a
subsequent update.
Fixes a bug where pygettext would attempt
to extract a message from a code like this:
def _(x): pass
This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.
However, since 'x' is not a string literal,
it would erroneously issue a warning.
Add emscripten.py script to automate emscripten build.
This is modeled heavily on `Tools/wasm/wasi.py`. This will form the basis of an Emscripten build bot.
Eventually wasm32-wasi will represent WASI 1.0, and so it's currently deprecated so it can be used for that eventual purpose. wasm32-wasip1 is also more specific to what version of WASI is currently supported.
---------
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Move creation of a tuple for var-positional parameter out of
_PyArg_UnpackKeywordsWithVararg().
Merge _PyArg_UnpackKeywordsWithVararg() with _PyArg_UnpackKeywords().
Add a new parameter in _PyArg_UnpackKeywords().
The "parameters" and "converters" attributes of ParseArgsCodeGen no
longer contain the var-positional parameter. It is now available as the
"varpos" attribute. Optimize code generation for var-positional
parameter and reuse the same generating code for functions with and without
keyword parameters.
Add special converters for var-positional parameter. "tuple" represents it as
a Python tuple and "array" represents it as a continuous array of PyObject*.
"object" is a temporary alias of "tuple".
The primary objective here is to allow some later changes to be cleaner. Mostly this involves renaming things and moving a few things around.
* CrossInterpreterData -> XIData
* crossinterpdatafunc -> xidatafunc
* split out pycore_crossinterp_data_registry.h
* add _PyXIData_lookup_t
Fix the gdb pretty printer in the face of --enable-shared by delaying the attempt to load the _PyInterpreterFrame definition until after .so files are loaded.
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.
Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.
Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
* Remove references to `Modules/_blake2`.
* Remove `Modules/_blake2` entry from CODEOWNERS
The folder does not exist anymore.
* Remove `Modules/_blake2` entry from `Tools/c-analyzer/TODO`
The cases generator inserts code to save and restore the stack pointer around
statements that contain escaping calls. To find the beginning of such statements,
we would walk backwards from the escaping call until we encountered a token that
was treated as a statement terminator. This set of terminators should include
preprocessor directives.
Avoid temporary tuple creation when all arguments either positional-only
or vararg.
Objects/setobject.c and Modules/gcmodule.c adapted. This fixes slight
performance regression for set methods, introduced by gh-115112.
These consist of a number of short snippets that help identify scaling
bottlenecks in the free threaded interpreter.
The current bottlenecks are in calling functions in benchmarks that call
functions (due to `LOAD_ATTR` not yet using deferred reference counting)
and when accessing thread-local data.
Users want to know when the current context switches to a different
context object. Right now this happens when and only when a context
is entered or exited, so the enter and exit events are synonymous with
"switched". However, if the changes proposed for gh-99633 are
implemented, the current context will also switch for reasons other
than context enter or exit. Since users actually care about context
switches and not enter or exit, replace the enter and exit events with
a single switched event.
The former exit event was emitted just before exiting the context.
The new switched event is emitted after the context is exited to match
the semantics users expect of an event with a past-tense name. If
users need the ability to clean up before the switch takes effect,
another event type can be added in the future. It is not added here
because YAGNI.
I skipped 0 in the enum as a matter of practice. Skipping 0 makes it
easier to troubleshoot when code forgets to set zeroed memory, and it
aligns with best practices for other tools (e.g.,
https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum).
Co-authored-by: Richard Hansen <rhansen@rhansen.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
Users want to know when the current context switches to a different
context object. Right now this happens when and only when a context
is entered or exited, so the enter and exit events are synonymous with
"switched". However, if the changes proposed for gh-99633 are
implemented, the current context will also switch for reasons other
than context enter or exit. Since users actually care about context
switches and not enter or exit, replace the enter and exit events with
a single switched event.
The former exit event was emitted just before exiting the context.
The new switched event is emitted after the context is exited to match
the semantics users expect of an event with a past-tense name. If
users need the ability to clean up before the switch takes effect,
another event type can be added in the future. It is not added here
because YAGNI.
I skipped 0 in the enum as a matter of practice. Skipping 0 makes it
easier to troubleshoot when code forgets to set zeroed memory, and it
aligns with best practices for other tools (e.g.,
https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum).
* Spill the evaluation around escaping calls in the generated interpreter and JIT.
* The code generator tracks live, cached values so they can be saved to memory when needed.
* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
Cache in C PEG-generator reworked:
we save artificial rules in cache by Node string representation as a key instead of Node object itself.
As a result total count of artificial rules in parsers.c is lowered from 283 to 170.
More natural number ordering is used for the names of artificial rules.
Auxiliary method CCallMakerVisitor._generate_artificial_rule_call is added.
Its purpose is abstracting work with artificial rules cache.
Explicit using of "is_repeat1" kwarg is added to visit_Repeat0 and visit_Repeat1 methods.
Its slightly improve code readabitily.
Use a `_PyStackRef` and defer the reference to `f_funcobj` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
* Tools/build/stable_abi.py: Improve ergonomics
- Make the manifest file argument optional
- Output resolved paths with --list (getting rid of `../../`)
- Mention --all or --generate-all if no actions are specified
* Don't hardcode Misc/stable_abi.toml in Makefile, rely on the default
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Use a `_PyStackRef` and defer the reference to `f_executable` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
Add PyConfig_Get(), PyConfig_GetInt(), PyConfig_Set() and
PyConfig_Names() functions to get and set the current runtime Python
configuration.
Add visibility and "sys spec" to config and preconfig specifications.
_PyConfig_AsDict() now converts PyConfig.xoptions as a dictionary.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
This replaces the existing hashlib Blake2 module with a single implementation that uses HACL\*'s Blake2b/Blake2s implementations. We added support for all the modes exposed by the Python API, including tree hashing, leaf nodes, and so on. We ported and merged all of these changes upstream in HACL\*, added test vectors based on Python's existing implementation, and exposed everything needed for hashlib.
This was joint work done with @R1kM.
See the PR for much discussion and benchmarking details. TL;DR: On many systems, 8-50% faster (!) than `libb2`, on some systems it appeared 10-20% slower than `libb2`.
As of 529a160 (gh-118204), building with HAVE_DYNAMIC_LOADING stopped working. This is a minimal fix just to get builds working again. There are actually a number of long-standing deficiencies with HAVE_DYNAMIC_LOADING builds that need to be resolved separately.
This replaces `_PyList_FromArraySteal` with `_PyList_FromStackRefSteal`.
It's functionally equivalent, but takes a `_PyStackRef` array instead of
an array of `PyObject` pointers.
Co-authored-by: Ken Jin <kenjin@python.org>
* Parameters after the var-positional parameter are now keyword-only
instead of positional-or-keyword.
* Correctly calculate min_kw_only.
* Raise errors for invalid combinations of the var-positional parameter
with "*", "/" and deprecation markers.
This automatically spills the results from `_PyStackRef_FromPyObjectNew`
to the in-memory stack so that the deferred references are visible to
the GC before we make any possibly escaping call.
Co-authored-by: Ken Jin <kenjin@python.org>
The adaptive counter doesn't do anything currently in the free-threaded
build and TSan reports a data race due to concurrent modifications to
the counter.
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
In the free-threaded build, we need to lock pending->mutex when clearing
the handling_thread in order not to race with a concurrent
make_pending_calls in the same interpreter.
* Reject uop definitions that declare values as 'unused' that are already cached by prior uops
* Track which variables are defined and only load from memory when needed
* Support explicit `flush` in macro definitions.
* Make sure stack is flushed in where needed.
The `used` field must be written using atomic stores because `set_len`
and iterators may access the field concurrently without holding the
per-object lock.
Refactor the fast Unicode hash check into `_PyObject_HashFast` and use relaxed
atomic loads in the free-threaded build.
After this change, the TSAN doesn't report data races for this method.
This PR sets up tagged pointers for CPython.
The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.
This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.
The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.
Please read Include/internal/pycore_stackref.h for more information!
---------
Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization, otherwise you're possibly
interning a immortalizing copy.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
The _strptime module object was cached in a static local variable (in the datetime.strptime() implementation). That's a problem when it crosses isolation boundaries, such as reinitializing the runtme or between interpreters. This change fixes the problem by dropping the static variable, instead always relying on the normal sys.modules cache (via PyImport_Import()).
This adds a `_PyRecursiveMutex` type based on `PyMutex` and uses that
for the import lock. This fixes some data races in the free-threaded
build and generally simplifies the import lock code.
The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.
Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).
The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
We make use of the same mechanism that we use for the static builtin types. This required a few tweaks.
The relevant code could use some cleanup but I opted to avoid the significant churn in this change. I'll tackle that separately.
This change is the final piece needed to make _datetime support multiple interpreters. I've updated the module slot accordingly.
The free-threaded build currently immortalizes objects that use deferred
reference counting (see gh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.
This fixes a race condition involving the tracking of whether the
behavior is suppressed.