Commit Graph

165 Commits

Author SHA1 Message Date
Inada Naoki 4e294f6feb
gh-133036: Deprecate codecs.open (#133038)
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2025-04-30 10:11:09 +09:00
Victor Stinner 20c5f969dd
gh-131238: Remove more includes from pycore_interp.h (#131480) 2025-03-19 23:01:32 +01:00
Victor Stinner 978e37bb5f
gh-131238: Add explicit includes to pycore headers (#131257) 2025-03-17 12:32:43 +01:00
Sergey Miryanov 3a7f17c7e2
gh-130790: Remove references about unicode's readiness from comments (#130801) 2025-03-03 19:18:09 +00:00
Bénédikt Tran 3146a25e97
gh-129173: refactor `PyCodec_BackslashReplaceErrors` into separate functions (#129895)
The logic of `PyCodec_BackslashReplaceErrors` is now split into separate functions,
each of which handling a specific exception type.
2025-03-03 13:58:15 +01:00
Bénédikt Tran f693f84227
gh-129173: simplify `PyCodec_XMLCharRefReplaceErrors` logic (#129894)
Writing the decimal representation of a Unicode codepoint only requires to know the number of digits.

---------

Co-authored-by: Petr Viktorin <encukou@gmail.com>
2025-03-03 11:43:22 +00:00
Bénédikt Tran fa6a8140dd
gh-129173: refactor `PyCodec_ReplaceErrors` into separate functions (#129893)
The logic of `PyCodec_ReplaceErrors` is now split into separate functions,
each of which handling a specific exception type.
2025-02-25 14:24:46 +01:00
Bénédikt Tran e24a1ac17c
gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_SurrogateEscapeErrors` (GH-129175) 2025-02-20 13:18:47 +00:00
Bénédikt Tran 1775091dc1
gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_SurrogatePassErrors` (GH-129134) 2025-02-14 18:34:32 +01:00
Bénédikt Tran a56ead089c
gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_NameReplaceErrors` (GH-129135) 2025-02-08 16:01:57 +01:00
Bénédikt Tran 36bb229933
gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_IgnoreErrors` (#129174)
We also cleanup `PyCodec_StrictErrors` and the error message rendered
when an object of incorrect type is passed to codec error handlers.
2025-01-24 11:25:03 +01:00
Bénédikt Tran 25a614a502
gh-126004: Fix positions handling in `codecs.backslashreplace_errors` (#127676)
This fixes how `PyCodec_BackslashReplaceErrors` handles the `start` and `end`
attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
2025-01-23 14:28:33 +01:00
Bénédikt Tran 225296cd5b
gh-126004: Fix positions handling in `codecs.replace_errors` (#127674)
This fixes how `PyCodec_ReplaceErrors` handles the `start` and `end` attributes
of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
2025-01-23 11:44:18 +01:00
Bénédikt Tran 70dcc847df
gh-126004: Fix positions handling in `codecs.xmlcharrefreplace_errors` (#127675)
This fixes how `PyCodec_XMLCharRefReplaceErrors` handles the `start` and `end`
attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
2025-01-23 11:42:38 +01:00
Victor Stinner b9a8ca0a6a
gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_STR) (#125194)
Replace PyUnicode_New(0, 0), PyUnicode_FromString("")
and PyUnicode_FromStringAndSize("", 0)
with Py_GetConstant(Py_CONSTANT_EMPTY_STR).
2024-10-09 17:15:23 +02:00
Bénédikt Tran c00964ecd5
gh-124665: Add `_PyCodec_UnregisterError` and `_codecs._unregister_error` (#124677) 2024-09-29 02:25:23 +02:00
Petr Viktorin 6f1d448bc1
gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520)
* Add an InternalDocs file describing how interning should work and how to use it.

* Add internal functions to *explicitly* request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`

* Switch uses of `PyUnicode_InternInPlace` to those.

* Disallow using `_Py_SetImmortal` on strings directly.
  You should use `_PyUnicode_InternImmortal` instead:
  - Strings should be interned before immortalization, otherwise you're possibly
    interning a immortalizing copy.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
    `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
    backports, as they are now part of public API and version-specific ABI.

* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)
  - one-character latin-1 singletons

  Now, when you intern a singleton, that exact singleton will be interned.

* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

* Intern `_Py_STR` singletons at startup.

* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.

* Beef up the tests. Cover internal details (marked with `@cpython_only`).

* Add lots of assertions

Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
2024-06-21 17:19:31 +02:00
Brett Simmers f8290df63f
gh-116738: Make `_codecs` module thread-safe (#117530)
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:

- Move codecs-related state that lives on `PyInterpreterState` to a
  struct declared in `pycore_codecs.h`.

- In free-threaded builds, add a mutex to `codecs_state` to synchronize
  operations on `search_path`. Because `search_path_mutex` is used as a
  normal mutex and not a critical section, we must be extremely careful
  with operations called while holding it.

- The codec registry is explicitly initialized as part of
  `_PyUnicode_InitEncodings` to simplify thread-safety.
2024-05-02 18:25:36 -04:00
Kirill Podoprigora 0785c68559
gh-111972: Make Unicode name C APIcapsule initialization thread-safe (#112249) 2023-11-30 11:12:49 +01:00
Serhiy Storchaka aa438bdd6d
gh-111789: Use PyDict_GetItemRef() in Python/codecs.c (gh-112082) 2023-11-27 18:53:43 +01:00
Victor Stinner 03c4080c71
gh-108765: Python.h no longer includes <ctype.h> (#108831)
Remove <ctype.h> in C files which don't use it; only sre.c and
_decimal.c still use it.

Remove _PY_PORT_CTYPE_UTF8_ISSUE code from pyport.h:

* Code added by commit b5047fd019
  in 2004 for MacOSX and FreeBSD.
* Test removed by commit 52ddaefb6b
  in 2007, since Python str type now uses locale independent
  functions like Py_ISALPHA() and Py_TOLOWER() and the Unicode
  database.

Modules/_sre/sre.c replaces _PY_PORT_CTYPE_UTF8_ISSUE with new
functions: sre_isalnum(), sre_tolower(), sre_toupper().

Remove unused includes:

* _localemodule.c: remove <stdio.h>.
* getargs.c: remove <float.h>.
* dynload_win.c: remove <direct.h>, it no longer calls _getcwd()
  since commit fb1f68ed7c (in 2001).
2023-09-03 18:54:27 +02:00
Victor Stinner 4dc9f48930
gh-108308: Replace _PyDict_GetItemStringWithError() (#108372)
Replace _PyDict_GetItemStringWithError() calls with
PyDict_GetItemStringRef() which returns a strong reference to the
item.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-08-23 22:59:00 +02:00
Victor Stinner 615f6e946d
gh-106320: Remove _PyDict_GetItemStringWithError() function (#108313)
Remove private _PyDict_GetItemStringWithError() function of the
public C API: the new PyDict_GetItemStringRef() can be used instead.

* Move private _PyDict_GetItemStringWithError() to the internal C API.
* _testcapi get_code_extra_index() uses PyDict_GetItemStringRef().
  Avoid using private functions in _testcapi which tests the public C
  API.
2023-08-22 18:17:25 +00:00
Serhiy Storchaka be1b968dc1
gh-106521: Remove _PyObject_LookupAttr() function (GH-106642) 2023-07-12 08:57:10 +03:00
Victor Stinner bc7eb17084
gh-106320: Use _PyInterpreterState_GET() (#106336)
Replace PyInterpreterState_Get() with inlined
_PyInterpreterState_GET().
2023-07-02 16:37:37 +00:00
Irit Katriel 55c99d97e1
gh-77757: replace exception wrapping by PEP-678 notes in typeobject's __set_name__ (#103402) 2023-04-11 11:53:06 +01:00
Irit Katriel 76350e85eb
gh-102406: replace exception chaining by PEP-678 notes in codecs (#102407) 2023-03-21 21:36:31 +00:00
Victor Stinner 8211cf5d28
gh-99300: Replace Py_INCREF() with Py_NewRef() (#99530)
Replace Py_INCREF() and Py_XINCREF() using a cast with Py_NewRef()
and Py_XNewRef().
2022-11-16 18:34:24 +01:00
Victor Stinner d8f239d86e
gh-99300: Use Py_NewRef() in Python/ directory (#99302)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in C files of the Python/ directory.
2022-11-10 09:03:39 +01:00
Eric Snow 81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Kumar Aditya 41026c3155
bpo-45855: Replaced deprecated `PyImport_ImportModuleNoBlock` with PyImport_ImportModule (GH-30046) 2021-12-12 10:45:20 +02:00
Victor Stinner d943d19172
bpo-45439: Move _PyObject_CallNoArgs() to pycore_call.h (GH-28895)
* Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
* _ssl, _sqlite and _testcapi extensions now call the public
  PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
* _lsprof extension is now built with Py_BUILD_CORE_MODULE macro
  defined to get access to internal _PyObject_CallNoArgs().
2021-10-12 08:38:19 +02:00
Victor Stinner ce3489cfdb
bpo-45439: Rename _PyObject_CallNoArg() to _PyObject_CallNoArgs() (GH-28891)
Fix typo in the private _PyObject_CallNoArg() function name: rename
it to _PyObject_CallNoArgs() to be consistent with the public
function PyObject_CallNoArgs().
2021-10-12 00:42:23 +02:00
Victor Stinner 920cb647ba
bpo-42157: unicodedata avoids references to UCD_Type (GH-22990)
* UCD_Check() uses PyModule_Check()
* Simplify the internal _PyUnicode_Name_CAPI structure:

  * Remove size and state members
  * Remove state and self parameters of getcode() and getname()
    functions

* Remove global_module_state
2020-10-26 19:19:36 +01:00
Victor Stinner 47e1afd2a1
bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713)
The private _PyUnicode_Name_CAPI structure of the PyCapsule API
unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
structure gets a new state member which must be passed to the
getcode() and getname() functions.

* Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
* unicodedata module is now built with Py_BUILD_CORE_MODULE.
* unicodedata: move hashAPI variable into unicodedata_module_state.
2020-10-26 16:43:47 +01:00
Hai Shi c9f696cb96
bpo-41919, test_codecs: Move codecs.register calls to setUp() (GH-22513)
* Move the codecs' (un)register operation to testcases.
* Remove _codecs._forget_codec() and _PyCodec_Forget()
2020-10-16 10:34:15 +02:00
Hai Shi d332e7b816
bpo-41842: Add codecs.unregister() function (GH-22360)
Add codecs.unregister() and PyCodec_Unregister() functions
to unregister a codec search function.
2020-09-28 23:41:11 +02:00
Victor Stinner e5014be049
bpo-40268: Remove a few pycore_pystate.h includes (GH-19510) 2020-04-14 17:52:15 +02:00
Victor Stinner 81a7be3fa2
bpo-40268: Rename _PyInterpreterState_GET_UNSAFE() (GH-19509)
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).

Add also "assert(tstate != NULL);" to the function.
2020-04-14 15:14:01 +02:00
Victor Stinner 4a3fe08353
bpo-40268: Include explicitly pycore_interp.h (GH-19505)
pycore_pystate.h no longer includes pycore_interp.h:
it's now included explicitly in files accessing PyInterpreterState.
2020-04-14 14:26:24 +02:00
Serhiy Storchaka cd8295ff75
bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345) 2020-04-11 10:48:40 +03:00
Victor Stinner ff4584caca
bpo-39947: Use _PyInterpreterState_GET_UNSAFE() (GH-18978)
Replace _PyInterpreterState_Get() function call with
_PyInterpreterState_GET_UNSAFE() macro which is more efficient but
don't check if tstate or interp is NULL.

_Py_GetConfigsAsDict() now uses _PyThreadState_GET().
2020-03-13 18:03:56 +01:00
Andy Lester 7386a70746
closes bpo-39630: Update pointers to string literals to be const char *. (GH-18510) 2020-02-13 20:42:56 -08:00
Petr Viktorin ffd9753a94
bpo-39245: Switch to public API for Vectorcall (GH-18460)
The bulk of this patch was generated automatically with:

    for name in \
        PyObject_Vectorcall \
        Py_TPFLAGS_HAVE_VECTORCALL \
        PyObject_VectorcallMethod \
        PyVectorcall_Function \
        PyObject_CallOneArg \
        PyObject_CallMethodNoArgs \
        PyObject_CallMethodOneArg \
    ;
    do
        echo $name
        git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
    done

    old=_PyObject_FastCallDict
    new=PyObject_VectorcallDict
    git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"

and then cleaned up:

- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
2020-02-11 17:46:57 +01:00
Victor Stinner a102ed7d2f
bpo-39573: Use Py_TYPE() macro in Python and Include directories (GH-18391)
Replace direct access to PyObject.ob_type with Py_TYPE().
2020-02-07 02:24:48 +01:00
Victor Stinner d3a1de2270
bpo-38631: Avoid Py_FatalError() in _PyCodecRegistry_Init() (GH-18217)
_PyCodecRegistry_Init() now reports exceptions to the caller,
rather than calling Py_FatalError().
2020-01-27 23:23:12 +01:00
Jordon Xu 20f59fe1f7 bpo-37751: Fix codecs.lookup() normalization (GH-15092)
Fix codecs.lookup() to normalize the encoding name the same way
than encodings.normalize_encoding(), except that codecs.lookup()
also converts the name to lower case.
2019-08-21 14:26:20 +01:00
Jeroen Demeyer 1dbd084f1f bpo-29548: no longer use PyEval_Call* functions (GH-14683) 2019-07-12 00:57:32 +09:00
Jeroen Demeyer 6e43d07324 bpo-37483: fix reference leak in _PyCodec_Lookup (GH-14600) 2019-07-05 19:57:32 +09:00
Jeroen Demeyer 196a530e00 bpo-37483: add _PyObject_CallOneArg() function (#14558) 2019-07-04 19:31:34 +09:00