Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
* gh-96132: Add some comments and minor fixes missed in the original PR
* Update Doc/using/cmdline.rst
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
⚠️⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️⚠️
If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**.
If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive.
* Add support for the BOLT post-link binary optimizer
Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt)
provides a fairly large speedup without any code or functionality
changes. It provides roughly a 1% speedup on pyperformance, and a
4% improvement on the Pyston web macrobenchmarks.
It is gated behind an `--enable-bolt` configure arg because not all
toolchains and environments are supported. It has been tested on a
Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6
sources (their binary distribution of this version did not include bolt).
Compared to [a previous attempt](https://github.com/faster-cpython/ideas/issues/224),
this commit uses bolt's preferred "instrumentation" approach, as well as adds some non-PIE
flags which enable much better optimizations from bolt.
The effects of this change are a bit more dependent on CPU microarchitecture
than other changes, since it optimizes i-cache behavior which seems
to be a bit more variable between architectures. The 1%/4% numbers
were collected on an Intel Skylake CPU, and on an AMD Zen 3 CPU I
got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance
I got a slightly lower speedup (1%/3%).
The low speedup on pyperformance is not entirely unexpected, because
BOLT improves i-cache behavior, and the benchmarks in the pyperformance
suite are small and tend to fit in i-cache.
This change uses the existing pgo profiling task (`python -m test --pgo`),
though I was able to measure about a 1% macrobenchmark improvement by
using the macrobenchmarks as the training task. I personally think that
both the PGO and BOLT tasks should be updated to use macrobenchmarks,
but for the sake of splitting up the work this PR uses the existing pgo task.
* Simplify the build flags
* Add a NEWS entry
* Update Makefile.pre.in
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Update configure.ac
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Add myself to ACKS
* Add docs
* Other review comments
* fix tab/space issue
* Make it more clear that --enable-bolt is experimental
* Add link to bolt's github page
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
Remove the "configure --with-cxx-main" build option: it didn't work
for many years. Remove the MAINCC variable from configure and
Makefile.
The MAINCC variable was added by the issue gh-42471: commit
0f48d98b74. Previously, --with-cxx-main
was named --with-cxx.
Keep CXX and LDCXXSHARED variables, even if they are no longer used
by Python build system.
If an HTTP link is redirected to a same looking HTTPS link, the latter can
be used directly without changes in readability and behavior.
It protects from a men-in-the-middle attack.
This change does not affect Python examples.
Add the -P command line option and the PYTHONSAFEPATH environment
variable to not prepend a potentially unsafe path to sys.path.
* Add sys.flags.safe_path flag.
* Add PyConfig.safe_path member.
* Programs/_bootstrap_python.c uses config.safe_path=0.
* Update subprocess._optim_args_from_interpreter_flags() to handle
the -P command line option.
* Modules/getpath.py sets safe_path to 1 if a "._pth" file is
present.
When Python is built with "./configure --enable-pystats" (if the
Py_STATS macro is defined), the _Py_GetSpecializationStats() function
must be exported, since it's used by the _opcode extension which is
built as a shared library.
- Remove ``--with-tclk-*`` options from `configure`
- Use pkg-config to detect `_tkinter` dependencies (Tcl/Tk, X11)
- Manual override via environment variables `TCLTK_CFLAGS` and `TCLTK_LIBS`
The default was "off". Switching it to "on" means users get the benefit of frozen stdlib modules without having to do anything. There's a special-case for running-in-source-tree, so contributors don't get surprised when their stdlib changes don't get used.
https://bugs.python.org/issue45020
Replace old names when they refer to actual versions of macOS.
Keep historical names in references to older versions.
Co-authored-by: Patrick Reader <_@pxeger.com>
Currently we freeze several modules into the runtime. For each of these modules it is essential to bootstrapping the runtime that they be frozen. Any other stdlib module that we later freeze into the runtime is not essential. We can just as well import from the .py file. This PR lets users explicitly choose which should be used, with the new "-X frozen_modules=[on|off]" CLI flag. The default is "off" for now.
https://bugs.python.org/issue45020
The threading debug (PYTHONTHREADDEBUG environment variable) is
deprecated in Python 3.10 and will be removed in Python 3.12. This
feature requires a debug build of Python.
* Add also references to --with-trace-refs option.
* Move _ob_next and _ob_prev at the end, since they don't exist by
default and are related to debug.
- Remove HAVE_X509_VERIFY_PARAM_SET1_HOST check
- Update hashopenssl to require OpenSSL 1.1.1
- multissltests only OpenSSL > 1.1.0
- ALPN is always supported
- SNI is always supported
- Remove deprecated NPN code. Python wrappers are no-op.
- ECDH is always supported
- Remove OPENSSL_VERSION_1_1 macro
- Remove locking callbacks
- Drop PY_OPENSSL_1_1_API macro
- Drop HAVE_SSL_CTX_CLEAR_OPTIONS macro
- SSL_CTRL_GET_MAX_PROTO_VERSION is always defined now
- security level is always available now
- get_num_tickets is available with TLS 1.3
- X509_V_ERR MISMATCH is always available now
- Always set SSL_MODE_RELEASE_BUFFERS
- X509_V_FLAG_TRUSTED_FIRST is always available
- get_ciphers is always supported
- SSL_CTX_set_keylog_callback is always available
- Update Modules/Setup with static link example
- Mention PEP in whatsnew
- Drop 1.0.2 and 1.1.0 from GHA tests
Add Doc/using/configure.rst documentation to document configure,
preprocessor, compiler and linker options.
Add a new section about the "Python debug build".
See [PEP 597](https://www.python.org/dev/peps/pep-0597/).
* Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`.
* Add EncodingWarning
* Add io.text_encoding()
* open(), TextIOWrapper() emits EncodingWarning when encoding is omitted and warn_default_encoding is enabled.
* _pyio.TextIOWrapper() uses UTF-8 as fallback default encoding used when failed to import locale module. (used during building Python)
* bz2, configparser, gzip, lzma, pathlib, tempfile modules use io.text_encoding().
* What's new entry
This lease on this domain has lapsed. This not only makes these dead links, but a potential attack vector for readers of python.org as the domain can be obtained by an untrustworthy party.
I considered redirecting these links to http://mingw-w64.org/ which is a maintained fork of mingw, but beyond my unfamiliarity with the exact level of compatibility, at the time of this PR that site had an expired cert and so is not much of a vulnerability fix.
Automerge-Triggered-By: GH:Mariatta
Enhance the documentation of the Python startup, filesystem encoding
and error handling, locale encoding. Add a new "Python UTF-8 Mode"
section.
* Add "locale encoding" and "filesystem encoding and error handler"
to the glossary
* Remove documentation from Include/cpython/initconfig.h: move it to
Doc/c-api/init_config.rst.
* Doc/c-api/init_config.rst:
* Document command line options and environment variables
* Document default values.
* Add a new "Python UTF-8 Mode" section in Doc/library/os.rst.
* Add warnings to Py_DecodeLocale() and Py_EncodeLocale() docs.
* Document how Python selects the filesystem encoding and error
handler at a single place: PyConfig.filesystem_encoding and
PyConfig.filesystem_errors.
* PyConfig: move orig_argv member at the right place.
Fixes incorrect Python version added for `venv` `--upgrade-deps` in #13100. This feature was added in Python 3.9 not 3.8.
Relates to:
-
- 1cba1c9aba
Automerge-Triggered-By: @vsajip
This commit removes the old parser, the deprecated parser module, the old parser compatibility flags and environment variables and all associated support code and documentation.
* Rename PyConfig.use_peg to _use_peg_parser
* Document PyConfig._use_peg_parser and mark it a deprecated
* Mark -X oldparser option and PYTHONOLDPARSER env var as deprecated
in the documentation.
* Add use_old_parser() and skip_if_new_parser() to test.support
* Remove sys.flags.use_peg: use_old_parser() uses
_testinternalcapi.get_configs() instead.
* Enhance test_embed tests
* subprocess._args_from_interpreter_flags() copies -X oldparser
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().
By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
Add --upgrade-deps to venv module
- This allows for pip + setuptools to be automatically upgraded to the latest version on PyPI
- Update documentation to represent this change
bpo-34556: Add --upgrade to venv module
Release build and debug build are now ABI compatible: the Py_DEBUG
define no longer implies Py_TRACE_REFS define which introduces the
only ABI incompatibility.
A new "./configure --with-trace-refs" build option is now required to
get Py_TRACE_REFS define which adds sys.getobjects() function and
PYTHONDUMPREFS environment variable.
Changes:
* Add ./configure --with-trace-refs
* Py_DEBUG no longer implies Py_TRACE_REFS
In development mode (-X dev) and in debug build, the io.IOBase
destructor now logs close() exceptions. These exceptions are silent
by default in release mode.
* Revert "bpo-34589: Add -X coerce_c_locale command line option (GH-9378)"
This reverts commit dbdee0073c.
* Revert "bpo-34589: C locale coercion off by default (GH-9073)"
This reverts commit 7a0791b699.
* Revert "bpo-34589: Make _PyCoreConfig.coerce_c_locale private (GH-9371)"
This reverts commit 188ebfa475.
In some development setups it is inconvenient or impossible to write bytecode
caches to the code tree, but the bytecode caches are still useful. The
PYTHONPYCACHEPREFIX environment variable allows specifying an alternate
location for cached bytecode files, within which a directory tree mirroring the code
tree will be created. This cache tree is then used (for both reading and writing)
instead of the local `__pycache__` subdirectory within each source directory.
Exposed at runtime as sys.pycache_prefix (defaulting to None), and can
be set from the CLI as "-X pycache_prefix=path".
Patch by Carl Meyer.
While locale coercion and UTF-8 mode turned out to
be complementary ideas rather than competing ones,
it isn't immediately obvious why it's useful to
have both, or how they interact at runtime.
This updates both the Python 3.7 What's New doc
and the PYTHONCOERCECLOCALE and PYTHONUTF8
documentation in an attempt to clarify that
relationship:
- in the respective What's New sections, add a closing paragraph
explaining which problem each one solves, and pointing to the
other PEP's section for the specific aspects it relies on the other
PEP to solve
- use "locale-aware mode" as a more descriptive term for the
default non-UTF-8 mode
- improve wording conistenccy between the PYTHONCOERCECLOCALE
and PYTHONUTF8 docs when they cover the same thing (mostly
related to legacy locale detection and setting the standard
stream error handler)
- improve the description of the locale coercion trigger conditions
(including pointing out that setting LC_ALL turns off locale coercion)
- port the full description of the UTF-8 mode behaviour changes
from PEP 540 into the PYTHONUTF8 documentation
- be explicit that PYTHONIOENCODING still overrides the settings
for the standard streams
- mention concrete examples of things that do and don't get their
text encoding assumptions adjusted by the two text encoding
assumption override techniques
- primary change is to add a new default filter entry for
'default::DeprecationWarning:__main__'
- secondary change is an internal one to cope with plain
strings in the warning module's internal filter list
(this avoids the need to create a compiled regex object
early on during interpreter startup)
- assorted documentation updates, including many more
examples of configuring the warnings settings
- additional tests to ensure that both the pure Python and
the C accelerated warnings modules have the expected
default configuration
bpo-29240, bpo-32030: If the encoding change (C locale coerced or
UTF-8 Mode changed), Py_Main() now reads again the configuration with
the new encoding.
Changes:
* Add _Py_UnixMain() called by main().
* Rename pymain_free_pymain() to pymain_clear_pymain(), it can now be
called multipled times.
* Rename pymain_parse_cmdline_envvars() to pymain_read_conf().
* Py_Main() now clears orig_argc and orig_argv at exit.
* Remove argv_copy2, Py_Main() doesn't modify argv anymore. There is
no need anymore to get two copies of the wchar_t** argv.
* _PyCoreConfig: add coerce_c_locale and coerce_c_locale_warn.
* Py_UTF8Mode is now initialized to -1.
* Locale coercion (PEP 538) now respects -I and -E options.
bpo-32329, bpo-32030:
* The -R option now turns on hash randomization when the
PYTHONHASHSEED environment variable is set to 0 Previously, the
option was ignored.
* sys.flags.hash_randomization is now properly set to 0 when hash
randomization is turned off by PYTHONHASHSEED=0.
* _PyCoreConfig_ReadEnv() now reads the PYTHONHASHSEED environment
variable. _Py_HashRandomization_Init() now only apply the
configuration, it doesn't read PYTHONHASHSEED anymore.
* Add -X utf8 command line option, PYTHONUTF8 environment variable
and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
_winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
mode. As a side effect, open() now uses the UTF-8 encoding by
default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
always copy flag values, rather than only copying if the new value
is greater than the old value.
Rather than supporting dev mode directly in the warnings module, this
instead adjusts the initialisation code to add an extra 'default'
entry to sys.warnoptions when dev mode is enabled.
This ensures that dev mode behaves *exactly* as if `-Wdefault` had
been passed on the command line, including in the way it interacts
with `sys.warnoptions`, and with other command line flags like `-bb`.
Fix also bpo-20361: have -b & -bb options take precedence over any
other warnings options.
Patch written by Nick Coghlan, with minor modifications of Victor Stinner.
Python now supports checking bytecode cache up-to-dateness with a hash of the
source contents rather than volatile source metadata. See the PEP for details.
While a fairly straightforward idea, quite a lot of code had to be modified due
to the pervasiveness of pyc implementation details in the codebase. Changes in
this commit include:
- The core changes to importlib to understand how to read, validate, and
regenerate hash-based pycs.
- Support for generating hash-based pycs in py_compile and compileall.
- Modifications to our siphash implementation to support passing a custom
key. We then expose it to importlib through _imp.
- Updates to all places in the interpreter, standard library, and tests that
manually generate or parse pyc files to grok the new format.
- Support in the interpreter command line code for long options like
--check-hash-based-pycs.
- Tests and documentation for all of the above.
* bpo-32101: Add sys.flags.dev_mode flag
Rename also the "Developer mode" to the "Development mode".
* bpo-32101: Add PYTHONDEVMODE environment variable
Mention it in the development chapiter.
* Fix _PyMem_SetupAllocators("debug"): always restore allocators to
the defaults, rather than only caling _PyMem_SetupDebugHooks().
* Add _PyMem_SetDefaultAllocator() helper to set the "default"
allocator.
* Add _PyMem_GetAllocatorsName(): get the name of the allocators
* main() now uses debug hooks on memory allocators if Py_DEBUG is
defined, rather than calling directly malloc()
* Document default memory allocators in C API documentation
* _Py_InitializeCore() now fails with a fatal user error if
PYTHONMALLOC value is an unknown memory allocator, instead of
failing with a fatal internal error.
* Add new tests on the PYTHONMALLOC environment variable
* Add support.with_pymalloc()
* Add the _testcapi.WITH_PYMALLOC constant and expose it as
support.with_pymalloc().
* sysconfig.get_config_var('WITH_PYMALLOC') doesn't work on Windows, so
replace it with support.with_pymalloc().
* pythoninfo: add _testcapi collector for pymem
The developer mode (-X dev) now creates all default warnings filters
to order filters in the correct order to always show ResourceWarning
and make BytesWarning depend on the -b option.
Write a functional test to make sure that ResourceWarning is logged
twice at the same location in the developer mode.
Add a new 'dev_mode' field to _PyCoreConfig.
Add a new "developer mode": new "-X dev" command line option to
enable debug checks at runtime.
Changes:
* Add unit tests for -X dev
* test_cmd_line: replace test.support with support.
* Fix _PyRuntimeState_Fini(): Use the same memory allocator
than _PyRuntimeState_Init().
* Fix _PyMem_GetDefaultRawAllocator()
- removes PY_WARN_ON_C_LOCALE build time flag
- locale coercion and compatibility warnings are now always compiled
in, but are off by default
- adds PYTHONCOERCECLOCALE=warn runtime option to aid in
debugging potentially locale related compatibility problems
Due to not-yet-resolved test failures on *BSD systems (including
Mac OS X), this also temporarily disables UTF-8 as a locale coercion
target, and skips testing the interpreter's behavior in the POSIX locale.
- new PYTHONCOERCECLOCALE config setting
- coerces legacy C locale to C.UTF-8, C.utf8 or UTF-8 by default
- always uses C.UTF-8 on Android
- uses `surrogateescape` on stdin and stdout in the coercion
target locales
- configure option to disable locale coercion at build time
- configure option to disable C locale warning at build time
* Cannot seem to link directly to main options from the “unittest” module,
because that module has its own set of options
* Mask out linking for options that no longer exist in Python 3