* gh-129701: Fix a data race in `intern_common` in the free threaded build
* Use a mutex to avoid potentially returning a non-immortalized string,
because immortalization happens after the insertion into the interned
dict.
* Use `Py_DECREF()` calls instead of `Py_SET_REFCNT(s, Py_REFCNT(s) - 2)`
for thread-safety. This code path isn't performance sensistive, so
just use `Py_DECREF()` unconditionally for simplicity.
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.
With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.
Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.
Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.
Also add a thorough test that the implementation agrees with this definition.
Author: Greg Price <gnprice@gmail.com>
Co-authored-by: Greg Price <gnprice@gmail.com>
Implement PyUnicode_KIND() and PyUnicode_DATA() as function, in
addition to the macros with the same names. The macros rely on C bit
fields which have compiler-specific layout.
There was a data race on the utf8 field between `PyUnicode_SET_UTF8` and
`_PyUnicode_CheckConsistency`. Use the `_PyUnicode_UTF8()` accessor,
which uses an atomic load internally, to avoid the data race.
They used to be shared, before 3.12. Returning to sharing them resolves a failure on Py_TRACE_REFS builds.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.
This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS
failures identified by gh-124785.
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
* Replace PyLong_AS_LONG() with PyLong_AsLong().
* Call PyLong_AsLong() only once per the replacement code.
* Use PyMapping_GetOptionalItem() instead of PyObject_GetItem().
* Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs
* Document immortality in some functions that take `const char *`
This is PyUnicode_InternFromString;
PyDict_SetItemString, PyObject_SetAttrString;
PyObject_DelAttrString; PyUnicode_InternFromString;
and the PyModule_Add convenience functions.
Always point out a non-immortalizing alternative.
* Don't immortalize user-provided attr names in _ctypes
Fix warnings when using -Wimplicit-fallthrough compiler flag.
Annotate explicitly "fall through" switch cases with a new
_Py_FALLTHROUGH macro which uses __attribute__((fallthrough)) if
available. Replace "fall through" comments with _Py_FALLTHROUGH.
Add _Py__has_attribute() macro. No longer define __has_attribute()
macro if it's not defined. Move also _Py__has_builtin() at the top
of pyport.h.
Co-Authored-By: Nikita Sobolev <mail@sobolevn.me>
PyUnicode_FromFormat() no longer produces the ending \ufffd
character for truncated C string when use precision with %s and %V.
It now truncates the string before the start of truncated multibyte sequences.