Concurrent accesses from multiple threads to the same `cell` object did not
scale well in the free-threaded build. Use `_PyStackRef` and optimistically
avoid locking to improve scaling.
With the locks around cell reads gone, some of the free threading tests were
prone to starvation: the readers were able to run in a tight loop and the
writer threads weren't scheduled frequently enough to make timely progress.
Adjust the tests to avoid this.
Co-authored-by: Donghee Na <donghee.na@python.org>
* Reduce the number of iterations and the number of threads so a
whole test file takes less than a minute.
* Refactor test_racing_iter_extend() to remove two levels of
indentation.
* test_monitoring() uses a sleep of 100 ms instead of 1 second.
Makes sys.settrace, sys.setprofile, and monitoring generally thread-safe.
Mostly uses a stop-the-world approach and synchronization around the code object's _co_instrumentation_version. There may be a little bit of extra synchronization around the monitoring data that's required to be TSAN clean.