Commit Graph

545 Commits

Author SHA1 Message Date
Pablo Galindo Salgado 42b25ad4d3
gh-91048: Refactor and optimize remote debugging module (#134652)
Completely refactor Modules/_remote_debugging_module.c with improved
code organization, replacing scattered reference counting and error
handling with centralized goto error paths. This cleanup improves
maintainability and reduces code duplication throughout the module while
preserving the same external API.

Implement memory page caching optimization in Python/remote_debug.h to
avoid repeated reads of the same memory regions during debugging
operations. The cache stores previously read memory pages and reuses
them for subsequent reads, significantly reducing system calls and
improving performance.

Add code object caching mechanism with a new code_object_generation
field in the interpreter state that tracks when code object caches need
invalidation. This allows efficient reuse of parsed code object metadata
and eliminates redundant processing of the same code objects across
debugging sessions.

Optimize memory operations by replacing multiple individual structure
copies with single bulk reads for the same data structures. This reduces
the number of memory operations and system calls required to gather
debugging information from the target process.

Update Makefile.pre.in to include Python/remote_debug.h in the headers
list, ensuring that changes to the remote debugging header force proper
recompilation of dependent modules and maintain build consistency across
the codebase.

Also, make the module compatible with the free threading build as an extra :)

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2025-05-25 20:19:29 +00:00
Peter Bierma b8998fe2d8
gh-131185: Use a proper thread-local for cached thread states (gh-132510)
Switches over to a _Py_thread_local in place of autoTssKey, and also fixes a few other checks regarding PyGILState_Ensure after finalization.

Note that this doesn't fix concurrent use of PyGILState_Ensure with Py_Finalize; I'm pretty sure zapthreads doesn't work at all, and that needs to be fixed seperately.
2025-05-21 07:01:25 -06:00
Victor Stinner e79f640eb6
Simplify interp_look_up_id() (#134257)
Don't use PyInterpreterState_GetID() but get directly the interpreter
'id' member which cannot fail.
2025-05-19 18:09:10 +00:00
b-pass f2de1e6861
gh-134144: Fix use-after-free in zapthreads() (#134145) 2025-05-18 20:32:29 +05:30
Mark Shannon ac7d5ba96e
GH-133231: Changes to executor management to support proposed `sys._jit` module (GH-133287)
* Track the current executor, not the previous one, on the thread-state. 

* Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
2025-05-04 10:05:35 +01:00
Mark Shannon 44e4c479fb
GH-124715: Move trashcan mechanism into `Py_Dealloc` (GH-132280) 2025-04-30 11:37:53 +01:00
Mark Shannon ccf1b0b1c1
GH-132508: Use tagged integers on the evaluation stack for the last instruction offset (GH-132545) 2025-04-29 18:00:35 +01:00
Petr Viktorin 0c26dbd16e
gh-133079: Remove Py_C_RECURSION_LIMIT & PyThreadState.c_recursion_remaining (GH-133080)
Both were added in 3.13, are undocumented, and don't make sense in 3.14 due to
changes in the stack overflow detection machinery (gh-112282).

PEP 387 exception for skipping a deprecation period: https://github.com/python/steering-council/issues/288
2025-04-29 12:56:20 +02:00
Eric Snow fe462f5a91
gh-132775: Drop PyUnstable_InterpreterState_GetMainModule() (gh-132978)
We replace it with _Py_GetMainModule(), and add _Py_CheckMainModule(), but both in the internal-only C-API.  We also add _PyImport_GetModulesRef(), which is the equivalent of _PyImport_GetModules(), but which increfs before the lock is released.

This is used by a later change related to pickle and handling __main__.
2025-04-28 12:46:22 -06:00
Bénédikt Tran 427e7fc099
gh-132399: ensure correct alignment of `PyInterpreterState` (#132428) 2025-04-19 11:03:06 +02:00
Victor Stinner 61317074d4
gh-131238: Add pycore_interpframe_structs.h header (#131553)
Add an explicit include to pycore_interpframe_structs.h in
pycore_runtime_structs.h to fix a dependency cycle.
2025-03-21 17:19:47 +00:00
Victor Stinner b69da006a4
gh-131238: Remove includes from pycore_interp.h (#131495)
Remove also now unused includes in C files.
2025-03-20 11:35:23 +00:00
Victor Stinner 20c5f969dd
gh-131238: Remove more includes from pycore_interp.h (#131480) 2025-03-19 23:01:32 +01:00
Victor Stinner 22706843e0
gh-131238: Remove many includes from pycore_interp.h (#131472) 2025-03-19 17:46:24 +00:00
Victor Stinner 0453e494b6
gh-131238: Convert pycore_pystate.h static inline to functions (#131352)
Convert static inline functions to functions:

* _Py_IsMainThread()
* _PyInterpreterState_Main()
* _Py_IsMainInterpreterFinalizing()
* _Py_GetMainConfig()
2025-03-17 12:31:55 +01:00
Mark Shannon a1aeec61c4
GH-131238: Core header refactor (GH-131250)
* Moves most structs in pycore_ header files into pycore_structs.h and pycore_runtime_structs.h

* Removes many cross-header dependencies
2025-03-17 09:19:04 +00:00
Sam Gross 052cb717f5
gh-124878: Fix race conditions during interpreter finalization (#130649)
The PyThreadState field gains a reference count field to avoid
issues with PyThreadState being a dangling pointer to freed memory.
The refcount starts with a value of two: one reference is owned by the
interpreter's linked list of thread states and one reference is owned by
the OS thread. The reference count is decremented when the thread state
is removed from the interpreter's linked list and before the OS thread
calls `PyThread_hang_thread()`. The thread that decrements it to zero
frees the `PyThreadState` memory.

The `holds_gil` field is moved out of the `_status` bit field, to avoid
a data race where on thread calls `PyThreadState_Clear()`, modifying the
`_status` bit field while the OS thread reads `holds_gil` when
attempting to acquire the GIL.

The `PyThreadState.state` field now has `_Py_THREAD_SHUTTING_DOWN` as a
possible value. This corresponds to the `_PyThreadState_MustExit()`
check. This avoids race conditions in the free threading build when
checking `_PyThreadState_MustExit()`.
2025-03-06 10:38:34 -05:00
Mark Shannon 78d50e91ff
GH-127705: better double free message. (GH-130785)
* Add location information when accessing already closed stackref

* Add #def option to track closed stackrefs to provide precise information for use after free and double frees.
2025-03-05 14:00:42 +00:00
Sam Gross 7aeaa5af2c
gh-130091: Reorder `_PyThreadState_Attach` to avoid data race (gh-130092)
This moves `tstate_activate()` down to avoid a data race in the free
threading build on the `_PyRuntime`'s thread-local `autoTSSkey`. This
key is deleted during runtime finalization, which may happen
concurrently with a call to `_PyThreadState_Attach`.

The earlier `tstate_try/wait_attach` ensures that the thread is blocked
before it attempts to access the deleted `autoTSSkey`.

This fixes a TSAN reported data race in
`test_threading.test_import_from_another_thread`.
2025-02-27 13:57:19 -05:00
Sam Gross d027787c8d
gh-130421: Fix data race on timebase initialization (gh-130592)
Windows and macOS require precomputing a "timebase" in order to convert
OS timestamps into nanoseconds. Retrieve and compute this value during
runtime initialization to avoid data races when accessing the time.
2025-02-27 13:27:54 +00:00
Mark Shannon 014223649c
GH-130396: Use computed stack limits on linux (GH-130398)
* Implement C recursion protection with limit pointers for Linux, MacOS and Windows

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
2025-02-25 09:24:48 +00:00
Petr Viktorin ef29104f7d
GH-91079: Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now (GH130413)
Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now

Unfortunatlely, the change broke some buildbots.

This reverts commit 2498c22fa0.
2025-02-24 11:16:08 +01:00
Sam Gross ca22147547
gh-111924: Fix data races when swapping allocators (gh-130287)
CPython current temporarily changes `PYMEM_DOMAIN_RAW` to the default
allocator during initialization and shutdown. The motivation is to
ensure that core runtime structures are allocated and freed using the
same allocator. However, modifying the current allocator changes global
state and is not thread-safe even with the GIL. Other threads may be
allocating or freeing objects use PYMEM_DOMAIN_RAW; they are not
required to hold the GIL to call PyMem_RawMalloc/PyMem_RawFree.

This adds new internal-only functions like `_PyMem_DefaultRawMalloc`
that aren't affected by calls to `PyMem_SetAllocator()`, so they're
appropriate for Python runtime initialization and finalization. Use
these calls in places where we previously swapped to the default raw
allocator.
2025-02-20 11:31:15 -05:00
Mark Shannon 2498c22fa0
GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)
* Implement C recursion protection with limit pointers

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
2025-02-19 11:44:57 +00:00
Kumar Aditya 0d68b14a0d
gh-128002: use per threads tasks linked list in asyncio (#128869)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2025-02-06 19:51:07 +01:00
Brandt Bucher 828b27680f
GH-126599: Remove the PyOptimizer API (GH-129194) 2025-01-28 16:10:51 -08:00
Yury Selivanov 188598851d
GH-91048: Add utils for capturing async call stack for asyncio programs and enable profiling (#124640)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Co-authored-by: Savannah Ostrowski <savannahostrowski@gmail.com>
Co-authored-by: Jacob Coffee <jacob@z7x.org>
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
2025-01-22 17:25:29 +01:00
Victor Stinner 8ceb6cb117
gh-129033: Remove _PyInterpreterState_SetConfig() function (#129048)
Remove _PyInterpreterState_GetConfigCopy() and
_PyInterpreterState_SetConfig() private functions. PEP 741 "Python
Configuration C API" added a better public C API: PyConfig_Get() and
PyConfig_Set().
2025-01-20 16:31:33 +01:00
Peter Bierma 4d0a6595a0
gh-128360: Add `_Py_AssertHoldsTstate` as assertion for holding a thread state (#128361)
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2025-01-20 17:04:35 +05:30
Mark Shannon 128cc47fbd
GH-127705: Add debug mode for `_PyStackRef`s inspired by HPy debug mode (GH-128121) 2024-12-20 16:52:20 +00:00
mpage dabcecfd6d
gh-115999: Enable specialization of `CALL` instructions in free-threaded builds (#127123)
The CALL family of instructions were mostly thread-safe already and only required a small number of changes, which are documented below.

A few changes were needed to make CALL_ALLOC_AND_ENTER_INIT thread-safe:

Added _PyType_LookupRefAndVersion, which returns the type version corresponding to the returned ref.

Added _PyType_CacheInitForSpecialization, which takes an init method and the corresponding type version and only populates the specialization cache if the current type version matches the supplied version. This prevents potentially caching a stale value in free-threaded builds if we race with an update to __init__.

Only cache __init__ functions that are deferred in free-threaded builds. This ensures that the reference to __init__ that is stored in the specialization cache is valid if the type version guard in _CHECK_AND_ALLOCATE_OBJECT passes.
Fix a bug in _CREATE_INIT_FRAME where the frame is pushed to the stack on failure.

A few other miscellaneous changes were also needed:

Use {LOCK,UNLOCK}_OBJECT in LIST_APPEND. This ensures that the list's per-object lock is held while we are appending to it.

Add missing co_tlbc for _Py_InitCleanup.

Stop/start the world around setting the eval frame hook. This allows us to read interp->eval_frame non-atomically and preserves the behavior of _CHECK_PEP_523 documented below.
2024-12-03 11:20:20 -08:00
Radislav Chugunov ca3ea9ad05
gh-109746: Make _thread.start_new_thread delete state of new thread on its startup failure (GH-109761)
If Python fails to start newly created thread
due to failure of underlying PyThread_start_new_thread() call,
its state should be removed from interpreter' thread states list
to avoid its double cleanup.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-11-22 21:20:07 +02:00
Eric Snow 9dabace39d
gh-114940: Add _Py_FOR_EACH_TSTATE_UNLOCKED(), and Friends (gh-127077)
This is a precursor to the actual fix for gh-114940, where we will change these macros to use the new lock.  This change is almost entirely mechanical; the exceptions are the loops in codeobject.c and ceval.c, which now hold the "head" lock.  Note that almost all of the uses of _Py_FOR_EACH_TSTATE_UNLOCKED() here will change to _Py_FOR_EACH_TSTATE_BEGIN() once we add the new per-interpreter lock.
2024-11-21 11:08:38 -07:00
Peter Bierma 6c1a4fb6d4
gh-121058: Warn if `PyThreadState_Clear` is called with an exception set (gh-121343) 2024-11-20 10:27:19 -07:00
Eric Snow 1c0a104eca
gh-126914: Store the Preallocated Thread State's Pointer in a PyInterpreterState Field (gh-126989)
This approach eliminates the originally reported race. It also gets rid of the deadlock reported in gh-96071, so we can remove the workaround added then.
2024-11-19 12:59:19 -07:00
Eric Snow d6b3e78504
gh-126986: Drop _PyInterpreterState_FailIfNotRunning() (gh-126988)
We replace it with _PyErr_SetInterpreterAlreadyRunning().
2024-11-19 00:11:12 +00:00
Eric Snow 9357fdcaf0
gh-76785: Minor Cleanup of "Cross-interpreter" Code (gh-126457)
The primary objective here is to allow some later changes to be cleaner. Mostly this involves renaming things and moving a few things around.

* CrossInterpreterData -> XIData
* crossinterpdatafunc -> xidatafunc
* split out pycore_crossinterp_data_registry.h
* add _PyXIData_lookup_t
2024-11-07 09:32:42 -07:00
mpage 2e95c5ba3b
gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
2024-11-04 11:13:32 -08:00
Eric Snow 6f26d496d3
gh-125286: Share the Main Refchain With Legacy Interpreters (gh-125709)
They used to be shared, before 3.12.  Returning to sharing them resolves a failure on Py_TRACE_REFS builds.

Co-authored-by: Petr Viktorin <encukou@gmail.com>
2024-10-23 10:10:06 -06:00
Eric Snow 6d93690954
gh-125604: Move _Py_AuditHookEntry, etc. Out of pycore_runtime.h (gh-125605)
This is essentially a cleanup, moving a handful of API declarations to the header files where they fit best, creating new ones when needed.

We do the following:

* add pycore_debug_offsets.h and move _Py_DebugOffsets, etc. there
* inline struct _getargs_runtime_state and struct _gilstate_runtime_state in _PyRuntimeState
* move struct _reftracer_runtime_state to the existing pycore_object_state.h
* add pycore_audit.h and move to it _Py_AuditHookEntry , _PySys_Audit(), and _PySys_ClearAuditHooks
* add audit.h and cpython/audit.h and move the existing audit-related API there
*move the perfmap/trampoline API from cpython/sysmodule.h to cpython/ceval.h, and remove the now-empty cpython/sysmodule.h
2024-10-18 09:26:08 -06:00
Sam Gross 3ea488aac4
gh-124218: Use per-thread refcounts for code objects (#125216)
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
2024-10-15 15:06:41 -04:00
Kumar Aditya 5d8739e956
gh-111924: use atomics for interp id refcounting (#125321) 2024-10-12 12:40:34 +05:30
Tian Gao 5e0abb4788
gh-116750: Add clear_tool_id function to unregister events and callbacks (#124568) 2024-10-01 13:32:55 -04:00
Sam Gross b482538523
gh-124218: Refactor per-thread reference counting (#124844)
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.

Rename some of the files and functions in preparation for this change.
2024-10-01 17:05:42 +00:00
Savannah Ostrowski 65f1237098
GH-123516: Improve JIT memory consumption by invalidating cold executors (GH-124443)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-09-27 00:35:42 +00:00
Jason Fried d87482bc4e
gh-119333: Add C api to have contextvar enter/exit callbacks (#119335)
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
2024-09-23 20:40:17 -07:00
Anthony Shaw 86f06cb3fb
Remove comment from pystate created in 2003 (#123259) 2024-08-24 20:36:42 +00:00
Pablo Galindo Salgado d7a3df9150
Add debug offsets for free threaded builds (#123041) 2024-08-15 18:42:41 +00:00
Sam Gross 2d9d3a9f53
gh-122697: Fix free-threading memory leaks at shutdown (#122703)
We were not properly accounting for interpreter memory leaks at
shutdown and had two sources of leaks:

 * Objects that use deferred reference counting and were reachable via
   static types outlive the final GC. We now disable deferred reference
   counting on all objects if we are calling the GC due to interpreter
   shutdown.

 * `_PyMem_FreeDelayed` did not properly check for interpreter shutdown
   so we had some memory blocks that were enqueued to be freed, but
   never actually freed.

 * `_PyType_FinalizeIdPool` wasn't called at interpreter shutdown.
2024-08-08 12:48:17 -04:00
Sam Gross dc09301067
gh-122417: Implement per-thread heap type refcounts (#122418)
The free-threaded build partially stores heap type reference counts in
distributed manner in per-thread arrays. This avoids reference count
contention when creating or destroying instances.

Co-authored-by: Ken Jin <kenjin@python.org>
2024-08-06 14:36:57 -04:00