cpython

Commit Graph

Author	SHA1	Message	Date
Inada Naoki	4e294f6feb	gh-133036: Deprecate codecs.open (#133038) Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Victor Stinner <vstinner@python.org>	2025-04-30 10:11:09 +09:00
morotti	b1b4f9625c	gh-117151: IO performance improvement, increase io.DEFAULT_BUFFER_SIZE to 128k (GH-118144) Co-authored-by: rmorotti <romain.morotti@man.com>	2025-03-07 11:36:12 -08:00
Cody Maloney	886a4d74ee	gh-129011: Update comments in FileIO to match current code (#129012)	2025-03-06 17:18:22 -08:00
Sebastian Rittau	c6dd2348ca	gh-127647: Add typing.Reader and Writer protocols (#127648)	2025-03-06 07:36:19 -08:00
Cody Maloney	a3d5aab9a8	gh-129005: Align FileIO.readall between _pyio and _io (#129705) Utilize `bytearray.resize()` and `os.readinto()` to reduce copies and match behavior of `_io.FileIO.readall()`. There is still an extra copy which means twice the memory required compared to FileIO because there isn't a zero-copy path from `bytearray` -> `bytes` currently. On my system reading a 2 GB file: `./python -m test -M8g -uall test_largefile -m test.test_largefile.PyLargeFileTest.test_large_read -v` Goes from ~2.7 seconds -> ~2.2 seconds Co-authored-by: Victor Stinner <vstinner@python.org>	2025-02-07 12:06:11 +01:00
Cody Maloney	052ca8ffe8	gh-129005: Update _pyio.BytesIO to use bytearray.resize on write (#129702) Co-authored-by: Victor Stinner <vstinner@python.org>	2025-02-06 10:18:08 +00:00
Cody Maloney	853a6b7de2	Revert "gh-129005: Align FileIO.readall() allocation (#129458)" (#129572) This reverts commit `f927204f64`.	2025-02-02 15:59:15 +01:00
Cody Maloney	3ebe3d7688	Revert "gh-129005: _pyio.BufferedIO remove copy on readall (#129454)" (#129500) This reverts commit `e1c4ba9288`.	2025-01-31 09:40:44 +01:00
Cody Maloney	e1c4ba9288	gh-129005: _pyio.BufferedIO remove copy on readall (#129454) Slicing buf and appending chunk would always result in a copy. Commonly in a readall() there is no already read data in buf, and the amount of data read may be large, so the copy is expensive.	2025-01-30 11:23:25 +00:00
Cody Maloney	f927204f64	gh-129005: Align FileIO.readall() allocation (#129458) Both now use a pre-allocated buffer of length `bufsize`, fill it using a readinto(), and have matching "expand buffer" logic. On my machine this takes: `./python -m test -M8g -uall test_largefile -m test_large_read -v` from ~3.7 seconds to ~3.4 seconds.	2025-01-30 11:14:23 +00:00
Cody Maloney	180ee43bde	gh-129005: Avoid copy in _pyio.FileIO.readinto() (#129324) `os.read()` allocated and filled a buffer by calling `read(2)`, then that data was copied into the user-provided buffer. Read directly into the caller's buffer instead by using `os.readinto()`. `os.readinto()` uses `PyObject_GetBuffer()` to make sure the passed-in buffer is writeable and bytes-like, so drop the manual check.	2025-01-28 12:40:44 +01:00
Giovanni Siragusa	31f16e427b	gh-109523: Raise a BlockingIOError if reading text from a non-blocking stream cannot immediately return bytes. (GH-122933)	2024-12-02 14:18:30 +01:00
Cody Maloney	72dd4714f9	gh-120754: _io Ensure stat cache is cleared on fd change (#125166) Performed an audit of `fileio.c` and `_pyio` and made sure that anytime the fd changes, the stat result, if set, is also cleared/changed. There's one case where it's not cleared: where code would clear it in __init__, keep the memory allocated and just do another fstat with the existing memory.	2024-11-01 22:50:49 +01:00
Cody Maloney	43ad3b5170	gh-90102: Fix pyio _isatty_open_only() (#125089) Spotted by @ngnpope. `isatty` returns False to indicate the file is not a TTY. The C implementation of _io does that (`Py_RETURN_FALSE`) but I got the bool backwards in the _pyio implementation.	2024-10-08 11:49:50 +00:00
Cody Maloney	cc9b9bebb2	gh-90102: Remove isatty call during regular open (#124922) Co-authored-by: Victor Stinner <vstinner@python.org>	2024-10-08 08:50:42 +02:00
Cody Maloney	8b6c7c7877	gh-120754: Refactor I/O modules to stash whole stat result rather than individual members (#123412) Multiple places in the I/O stack optimize common cases by using the information from stat. Currently individual members are extracted from the stat and stored into the fileio struct. Refactor the code to store the whole stat struct instead. Parallels the changes to _io. The `stat` Python object doesn't allow changing members, so rather than modifying estimated_size, just clear the value.	2024-09-18 17:47:57 +02:00
Cody Maloney	2f5f19e783	gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755) This reduces the system call count of a simple program[0] that reads all the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my Linux system, 5813 -> 4875 on my macOS). This always reduces the number of `fstat()` calls, and reduces seek calls most of the time. Stat was always called twice, once at open (to error early on directories), and a second time to get the size of the file to be able to read the whole file in one read. Now the size is cached with the first call. The code keeps an optimization that if the user had previously read a lot of data, the current position is subtracted from the number of bytes to read. That is somewhat expensive so only do it on larger files, otherwise just try and read the extra bytes and resize the PyBytes as needed. I built a little test program to validate the behavior + assumptions around relative costs and then ran it under `strace` to get a log of the system calls. Full samples below[1]. After the changes, this is everything in one `filename.read_text()`: ```python3 openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` This does make some tradeoffs: 1. If the file size changes between open() and readall(), this will still get all the data but might have more read calls. 2. I experimented with avoiding the stat + cached result for small files in general, but on my dev workstation at least that tended to reduce performance compared to using the fstat(). [0] ```python3 from pathlib import Path nlines = [] for filename in Path("cpython/Doc").glob("**/*.rst"): nlines.append(len(filename.read_text())) ``` [1] Before small file: ``` openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` After small file: ``` openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` Before large file: ``` openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104 read(3, "", 1) = 0 close(3) = 0 ``` After large file: ``` openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104 read(3, "", 1) = 0 close(3) = 0 ``` Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com> Co-authored-by: Victor Stinner <vstinner@python.org>	2024-07-04 09:17:00 +02:00
Victor Stinner	6ae254aaa0	gh-120417: Add #noqa to used imports in the stdlib (#120421) Tools such as ruff can ignore "imported but unused" warnings if a line ends with "# noqa: F401". It avoids the temptation to remove an import which is used effectively.	2024-06-13 16:14:50 +02:00
6t8k	26800cf25a	gh-95782: Fix io.BufferedReader.tell() etc. being able to return offsets < 0 (GH-99709) lseek() always returns 0 for character pseudo-devices like `/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it always returns -1, to which CPython reacts by raising appropriate exceptions). They are thus technically seekable despite not having seek semantics. When calling read() on e.g. an instance of `io.BufferedReader` that wraps such a file, `BufferedReader` reads ahead, filling its buffer, creating a discrepancy between the number of bytes read and the internal `tell()` always returning 0, which previously resulted in e.g. `BufferedReader.tell()` or `BufferedReader.seek()` being able to return positions < 0 even though these are supposed to be always >= 0. Invariably keep the return value non-negative by returning max(former_return_value, 0) instead, and add some corresponding tests.	2024-02-17 11:16:06 +00:00
Serhiy Storchaka	652fbf88c4	gh-82626: Emit a warning when bool is used as a file descriptor (GH-111275)	2024-02-05 22:51:11 +02:00
Zackery Spytz	73c9326563	gh-80109: Fix io.TextIOWrapper dropping the internal buffer during write() (GH-22535) io.TextIOWrapper was dropping the internal decoding buffer during read() and write() calls.	2024-01-08 12:33:34 +02:00
Victor Stinner	58a2e09816	gh-62948: IOBase finalizer logs close() errors (#105104)	2023-05-31 11:41:19 +00:00
Nick Drozd	024ac542d7	bpo-45975: Simplify some while-loops with walrus operator (GH-29347)	2022-11-26 14:33:25 -08:00
Nikita Sobolev	2cfcaf5af6	gh-98999: Raise `ValueError` in `_pyio` on closed buffers (gh-99009)	2022-11-03 12:03:12 +09:00
Victor Stinner	6e33ba114f	gh-94169: Remove deprecated io.OpenWrapper (#94170) Remove io.OpenWrapper and _pyio.OpenWrapper, deprecated in Python 3.10: just use :func:`open` instead. The open() (io.open()) function is a built-in function. Since Python 3.10, _pyio.open() is also a static method.	2022-06-24 08:46:53 +02:00
Dong-hee Na	f7fabae75c	gh-93099: Fix _pyio to use locale module properly (gh-93136)	2022-05-24 09:37:01 +09:00
Inada Naoki	0729b31a8b	gh-91952: Make TextIOWrapper.reconfigure() supports "locale" encoding (GH-91982)	2022-05-01 10:44:14 +09:00
Inada Naoki	6fdb62b1fa	gh-91526: io: Remove device encoding support from TextIOWrapper (GH-91529) `TextIOWrapper.__init__()` called `os.device_encoding(file.fileno())` if fileno is 0-2 and encoding=None. But it very rarely works, and was never documented behavior.	2022-04-19 11:44:36 +09:00
Inada Naoki	13b17e2a0a	gh-91156: Fix `encoding="locale"` in UTF-8 mode (GH-70056)	2022-04-14 16:00:35 +09:00
Inada Naoki	4216dce04b	bpo-47000: Make `io.text_encoding()` respects UTF-8 mode (GH-32003) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>	2022-04-04 11:46:57 +09:00
slateny	cedd2473a9	bpo-25415: Remove confusing sentence from IOBase docstrings (PR-31631)	2022-03-04 12:35:52 -05:00
Thomas Grainger	9b12b1b803	bpo-46522: fix concurrent.futures and io AttributeError messages (GH-30887) Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Andrew Svetlov <andrew.svetlov@gmail.com>	2022-02-23 02:25:00 +02:00
Victor Stinner	19ba2122ac	bpo-37330: open() no longer accept 'U' in file mode (GH-28118) open(), io.open(), codecs.open() and fileinput.FileInput no longer accept "U" ("universal newline") in the file mode. This flag was deprecated since Python 3.3.	2021-09-02 12:58:00 +02:00
Victor Stinner	3bc694d5f3	bpo-43680: Deprecate io.OpenWrapper (GH-25357) Deprecate io.OpenWrapper and _pyio.OpenWrapper: use io.open and _pyio.open instead. Until Python 3.9, _pyio.open was not a static method and builtins.open was set to OpenWrapper to not become a bound method when set to a class variable. _io.open is a built-in function whereas _pyio.open is a Python function. In Python 3.10, _pyio.open() is now a static method, and builtins.open() is now io.open().	2021-04-14 03:24:33 +02:00
Victor Stinner	77d668b122	bpo-43680: _pyio.open() becomes a static method (GH-25354) The Python _pyio.open() function becomes a static method to behave as io.open() built-in function: don't become a bound method when stored as a class variable. It becomes possible since static methods are now callable in Python 3.10. Moreover, _pyio.OpenWrapper becomes a simple alias to _pyio.open. init_set_builtins_open() now sets builtins.open to io.open, rather than setting it to io.OpenWrapper, since OpenWrapper is now an alias to open in the io and _pyio modules.	2021-04-12 10:44:53 +02:00
Inada Naoki	cfa176685a	Revert "bpo-43510: PEP 597: Accept `encoding="locale"` in binary mode (GH-25103)" (#25108) This reverts commit `ff3c9739bd`.	2021-03-31 18:49:41 +09:00
Inada Naoki	ff3c9739bd	bpo-43510: PEP 597: Accept `encoding="locale"` in binary mode (GH-25103) It makes `encoding="locale"` usable everywhere `encoding=None` is allowed.	2021-03-31 14:26:08 +09:00
Inada Naoki	4827483f47	bpo-43510: Implement PEP 597 opt-in EncodingWarning. (GH-19481) See [PEP 597](https://www.python.org/dev/peps/pep-0597/). * Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`. * Add EncodingWarning * Add io.text_encoding() * open() and TextIOWrapper() emit EncodingWarning when encoding is omitted and warn_default_encoding is enabled. * _pyio.TextIOWrapper() uses UTF-8 as the fallback default encoding when importing the locale module fails (this happens while building Python). * bz2, configparser, gzip, lzma, pathlib, tempfile modules use io.text_encoding(). * What's new entry	2021-03-29 12:28:14 +09:00
Victor Stinner	942f7a2dea	bpo-39674: Revert "bpo-37330: open() no longer accept 'U' in file mode (GH-16959)" (GH-18767) This reverts commit `e471e72977`. The mode will be removed from Python 3.10.	2020-03-04 18:50:22 +01:00
Berker Peksag	fd5116c0e7	bpo-35950: Raise UnsupportedOperation in BufferedReader.truncate() (GH-18586) The truncate() method of io.BufferedReader() should raise UnsupportedOperation when it is called on a read-only io.BufferedReader() instance. https://bugs.python.org/issue35950 Automerge-Triggered-By: @methane	2020-02-21 09:57:26 -08:00
Benjamin Peterson	74fa9f723f	closes bpo-27805: Ignore ESPIPE in initializing seek of append-mode files. (GH-17112) This change, which follows the behavior of C stdio's fdopen and Python 2's file object, allows pipes to be opened in append mode.	2019-11-12 14:51:34 -08:00
Victor Stinner	e471e72977	bpo-37330: open() no longer accept 'U' in file mode (GH-16959) open(), io.open(), codecs.open() and fileinput.FileInput no longer accept "U" ("universal newline") in the file mode. This flag was deprecated since Python 3.3.	2019-10-28 15:40:08 +01:00
Serhiy Storchaka	1f21eaa15e	bpo-15999: Clean up of handling boolean arguments. (GH-15610) * Use the 'p' format unit instead of manually calling PyObject_IsTrue(). * Pass boolean values instead of 0/1 integers to functions that need a boolean. * Convert some arguments to boolean only once.	2019-09-01 12:16:51 +03:00
Raymond Hettinger	0dac68f1e5	bpo-36743: __get__ is sometimes called without the owner argument (#12992)	2019-08-29 01:27:42 -07:00
Serhiy Storchaka	b235a1b473	bpo-37960: Silence only necessary errors in repr() of buffered and text streams. (GH-15543)	2019-08-29 09:25:22 +03:00
Min ho Kim	c4cacc8c5e	Fix typos in comments, docs and test names (#15018) * Fix typos in comments, docs and test names * Update test_pyparse.py account for change in string length * Apply suggestion: splitable -> splittable Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu> * Apply suggestion: splitable -> splittable Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu> * Apply suggestion: Dealloccte -> Deallocate Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu> * Update posixmodule checksum. * Reverse idlelib changes.	2019-07-30 18:16:13 -04:00
Victor Stinner	22eb689cf3	bpo-37388: Development mode check encoding and errors (GH-14341) In development mode and in debug build, encoding and errors arguments are now checked on string encoding and decoding operations. Examples: open(), str.encode() and bytes.decode(). By default, for best performance, the errors argument is only checked at the first encoding/decoding error, and the encoding argument is sometimes ignored for empty strings.	2019-06-26 00:51:05 +02:00
Victor Stinner	4f6f7c5a61	bpo-18748: Fix _pyio.IOBase destructor (closed case) (GH-13952) _pyio.IOBase destructor now does nothing if getting the closed attribute fails to better mimic _io.IOBase finalizer.	2019-06-11 02:49:06 +02:00
Victor Stinner	a3568417c4	bpo-37054, _pyio: Fix BytesIO and TextIOWrapper __del__() (GH-13601) Fix destructor _pyio.BytesIO and _pyio.TextIOWrapper: initialize their _buffer attribute as soon as possible (in the class body), because it's used by __del__() which calls close().	2019-05-28 01:44:21 +02:00
Steve Dower	b82e17e626	bpo-36842: Implement PEP 578 (GH-12613) Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.	2019-05-23 08:45:22 -07:00
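
The read pattern described in commit 2f5f19e783 (one `fstat()` at open, its size reused to request the whole file plus one byte, then a short read that confirms EOF) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in `_pyio` or `fileio.c`: `read_all` and `GROW_STEP` are hypothetical names, and the fallback step size is an arbitrary choice.

```python
import os
import tempfile

GROW_STEP = 8192  # hypothetical fallback read size if the file outgrew the estimate


def read_all(path):
    """Read a whole file, sizing the first read from a single fstat() call.

    Hypothetical helper for illustration; not the CPython implementation.
    """
    # O_BINARY only exists on Windows; elsewhere the flag is simply 0.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        # The only stat call: st_size serves as the size estimate.
        estimated_size = os.fstat(fd).st_size
        # Ask for one extra byte so that an unchanged file reaches EOF
        # on the following 1-byte read.
        remaining = estimated_size + 1
        chunks = []
        while True:
            chunk = os.read(fd, remaining)
            if not chunk:
                break  # an empty read means EOF
            chunks.append(chunk)
            remaining -= len(chunk)
            if remaining <= 0:
                # The file grew after fstat(); keep reading in fixed steps.
                remaining = GROW_STEP
        return b"".join(chunks)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.rst")
        with open(sample, "wb") as f:
            f.write(b":orphan:\n\n.. sample page\n")
        print(len(read_all(sample)))
```

If the file grows between `fstat()` and the reads, the loop still returns all the data at the cost of extra `read()` calls, which is the first tradeoff the commit message lists.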

195 Commits
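
The clamping described in commit 26800cf25a (returning `max(former_return_value, 0)` so that a buffered reader never reports a negative offset) can be illustrated with a small sketch. The function name and its two parameters are hypothetical stand-ins for the raw stream position and the amount of read-ahead still sitting in the buffer; the real logic lives inside the buffered classes.

```python
def clamped_tell(raw_position, unread_buffered):
    """Return a buffered reader's logical position, never below 0.

    raw_position: what the raw stream's tell() reports (hypothetical input).
    unread_buffered: bytes read ahead into the buffer but not yet consumed.
    """
    return max(raw_position - unread_buffered, 0)


# Regular file: 8192 bytes fetched, 100 consumed, so 8092 are still unread.
print(clamped_tell(8192, 8092))  # prints 100

# Character pseudo-device whose raw tell() always reports 0: without the
# clamp the result would be -8092.
print(clamped_tell(0, 8092))  # prints 0
```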