cpython

Commit Graph

Author	SHA1	Message	Date
Barney Gale	5dbd27db7d	GH-128520: pathlib ABCs: add `JoinablePath.__vfspath__()` (#133437 ) In the abstract interface of `JoinablePath`, replace `__str__()` with `__vfspath__()`. This frees user implementations of `JoinablePath` to implement `__str__()` however they like (or not at all.) Also add `pathlib._os.vfspath()`, which calls `__fspath__()` or `__vfspath__()`.	2025-05-12 19:00:36 +01:00
Serhiy Storchaka	add0ca9ea0	gh-133306: Use \z instead of \Z in fnmatch.translate() and glob.translate() (GH-133338)	2025-05-03 17:58:21 +03:00
Barney Gale	bbd6d17ef8	GH-130614: pathlib ABCs: support alternate separator in `full_match()` (#130991 ) In `pathlib.types._JoinablePath.full_match()`, treat alternate path separators in the path and pattern as if they were primary separators. e.g. if the parser is `ntpath`, then `P(r'foo/bar\baz').full_match(r'*\*/*')` is true.	2025-03-09 16:36:59 +00:00
Barney Gale	5326c27fc6	Revert "GH-116380: Speed up `glob.[i]glob()` by making fewer system calls. (#116392 )" (#130743 ) This broke tests on the 'aarch64 Fedora Stable Clang Installed 3.x' and 'AMD64 Fedora Stable Clang Installed 3.x' build bots. This reverts commit `da4899b94a`.	2025-03-01 20:04:01 +00:00
Barney Gale	da4899b94a	GH-116380: Speed up `glob.[i]glob()` by making fewer system calls. (#116392 ) ## Filtered recursive walk Expanding a recursive `**` segment entails walking the entire directory tree, and so any subsequent pattern segments (except special segments) can be evaluated by filtering the expanded paths through a regex. For example, `glob.glob("foo/**/*.py", recursive=True)` recursively walks `foo/` with `os.scandir()`, and then filters paths through a regex based on `**/*.py`, with no further filesystem access needed. This fixes an issue where `glob()` could return duplicate results. ## Tracking path existence We store a flag alongside each path indicating whether the path is guaranteed to exist. As we process the pattern: - Certain special pattern segments (`""`, `"."` and `".."`) leave the flag unchanged - Literal pattern segments (e.g. `foo/bar`) set the flag to false - Wildcard pattern segments (e.g. `*/*.py`) set the flag to true (because children are found via `os.scandir()`) - Recursive pattern segments (e.g. `**`) leave the flag unchanged for the root path, and set it to true for descendants discovered via `os.scandir()`. If the flag is false at the end, we call `lstat()` on each path to filter out missing paths. ## Minor speed-ups - Exclude paths that don't match a non-terminal non-recursive wildcard pattern _prior_ to calling `is_dir()`. - Use a stack rather than recursion to implement recursive wildcards. - This fixes a recursion error when globbing deep trees. - Pre-compile regular expressions and pre-join literal pattern segments. - Convert to/from `bytes` (a minor use-case) in `iglob()` rather than supporting `bytes` throughout. This particularly simplifies the code needed to handle relative bytes paths with `dir_fd`. - Avoid calling `os.path.join()`; instead we keep paths in a normalized form and append trailing slashes when needed. - Avoid calling `os.path.normcase()`; instead we use case-insensitive regex matching. 
## Implementation notes Much of this functionality is already present in pathlib's implementation of globbing. The specific additions we make are: 1. Support for `dir_fd` 2. Support for `include_hidden` 3. Support for generating paths relative to `root_dir` This unifies the implementations of globbing in the `glob` and `pathlib` modules. Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>	2025-02-28 20:33:51 +00:00
Barney Gale	48c84a400a	GH-125413: pathlib ABCs: use caching `path.info.exists()` when globbing (#130422 ) Call `ReadablePath.info.exists()` rather than `ReadablePath.exists()` when globbing so that we use (or populate) the `info` cache.	2025-02-24 19:07:54 +00:00
Barney Gale	707d066193	GH-129835: Yield path with trailing slash from `ReadablePath.glob('')` (#129836 ) In the private pathlib ABCs, make `ReadablePath.glob('')` yield a path with a trailing slash (if it yields anything at all). As a result, `glob()` works similarly to `joinpath()` when given a non-magic pattern. In the globbing implementation, we preemptively add trailing slashes to intermediate paths if there are pattern parts remaining; this removes the need to check for existing trailing slashes (in the removed `add_slash()` method) at subsequent steps.	2025-02-08 06:47:09 +00:00
Barney Gale	718ab66299	GH-125413: Add `pathlib.Path.info` attribute (#127730 ) Add `pathlib.Path.info` attribute, which stores an object implementing the `pathlib.types.PathInfo` protocol (also new). The object supports querying the file type and internally caching `os.stat()` results. Path objects generated by `Path.iterdir()` are initialised with status information from `os.DirEntry` objects, which is gleaned from scanning the parent directory. The `PathInfo` protocol has four methods: `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.	2025-02-08 01:16:45 +00:00
Bénédikt Tran	78cb377c62	gh-122288: Improve performances of `fnmatch.translate` (#122289 ) Improve performance of this function by a factor of 1.7x. Co-authored-by: Barney Gale <barney.gale@gmail.com>	2024-11-27 16:42:45 +00:00
Barney Gale	68a51e0178	GH-125413: pathlib ABCs: use `scandir()` to speed up `glob()` (#126261 ) Use the new `PathBase.scandir()` method in `PathBase.glob()`, which greatly reduces the number of `PathBase.stat()` calls needed when globbing. There are no user-facing changes, because the pathlib ABCs are still private and `Path.glob()` doesn't use the implementation in its superclass.	2024-11-01 17:48:58 +00:00
Barney Gale	242c7498e5	GH-116380: Move pathlib-specific code from `glob` to `pathlib._abc`. (#120011 ) In `glob._Globber`, move pathlib-specific methods to `pathlib._abc.PathGlobber` and replace them with abstract methods. Rename `glob._Globber` to `glob._GlobberBase`. As a result, the `glob` module is no longer befouled by code that can only ever apply to pathlib. No change of behaviour.	2024-06-07 17:59:34 +01:00
Barney Gale	7ff61f51b6	GH-119169: Implement `pathlib.Path.walk()` using `os.walk()` (#119573 ) For silly reasons, pathlib's generic implementation of `walk()` currently resides in `glob._Globber`. This commit moves it into `pathlib._abc.PathBase.walk()` where it really belongs, and makes `pathlib.Path.walk()` call `os.walk()`.	2024-05-29 20:51:04 +00:00
Barney Gale	fbe6a0988f	GH-101357: Suppress `OSError` from `pathlib.Path.exists()` and `is_*()` (#118243 ) Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()` rather than a selection of more common errors as we do presently. Also adjust the implementations to call `os.path.exists()` etc, which are much faster on Windows thanks to GH-101196.	2024-05-14 17:53:15 +00:00
Barney Gale	b4bdf83cc6	GH-116380: Revert move of pathlib globbing code to `pathlib._glob` (#118678 ) The previous change made the `glob` module slower to import, because it imported `pathlib._glob` and hence the rest of `pathlib`. Reverts `a40f557d7b`.	2024-05-07 00:32:48 +00:00
Barney Gale	a40f557d7b	GH-116380: Move pathlib globbing implementation into `pathlib._glob` (#118562 ) Moving this code under the `pathlib` package makes it quite a lot easier to backport in the `pathlib-abc` PyPI package. It was a bit foolish of me to add it to `glob` in the first place. Also add `translate()` to `__all__` in `glob`. This function is new in 3.13, so there's no NEWS needed.	2024-05-03 20:29:25 +00:00
Barney Gale	0eb52f5f26	GH-115060: Speed up `pathlib.Path.glob()` by not scanning literal parts (#117732 ) Don't bother calling `os.scandir()` to scan for literal pattern segments, like `foo` in `foo/*.py`. Instead, append the segment(s) as-is and call through to the next selector with `exists=False`, which signals that the path might not exist. Subsequent selectors will call `os.scandir()` or `os.lstat()` to filter out missing paths as needed.	2024-04-12 22:19:21 +01:00
Barney Gale	0cc71bde00	GH-117586: Speed up `pathlib.Path.walk()` by working with strings (#117726 ) Move `pathlib.Path.walk()` implementation into `glob._Globber`. The new `glob._Globber.walk()` classmethod works with strings internally, which is a little faster than generating `Path` objects and keeping them normalized. The `pathlib.Path.walk()` method converts the strings back to path objects. In the private pathlib ABCs, our existing subclass of `_Globber` ensures that `PathBase` instances are used throughout. Follow-up to #117589.	2024-04-11 01:26:53 +01:00
Barney Gale	6258844c27	GH-117586: Speed up `pathlib.Path.glob()` by working with strings (#117589 ) Move pathlib globbing implementation into a new private class: `glob._Globber`. This class implements fast string-based globbing. It's called by `pathlib.Path.glob()`, which then converts strings back to path objects. In the private pathlib ABCs, add a `pathlib._abc.Globber` subclass that works with `PathBase` objects rather than strings, and calls user-defined path methods like `PathBase.stat()` rather than `os.stat()`. This sets the stage for two more improvements: - GH-115060: Query non-wildcard segments with `lstat()` - GH-116380: Unify `pathlib` and `glob` implementations of globbing. No change to the implementations of `glob.glob()` and `glob.iglob()`.	2024-04-10 20:43:07 +01:00
Barney Gale	fc8007ee36	GH-117337: Deprecate `glob.glob0()` and `glob.glob1()`. (#117371 ) These undocumented functions are no longer used by `msilib`, so there's no reason to keep them around.	2024-04-01 19:37:41 +00:00
Serhiy Storchaka	5a78f6e798	gh-117134: Microoptimize glob() for include_hidden=True (GH-117135)	2024-03-22 20:03:48 +02:00
Barney Gale	0634201f53	GH-116377: Stop raising `ValueError` from `glob.translate()`. (#116378 ) Stop raising `ValueError` from `glob.translate()` when a `**` sub-string appears in a non-recursive pattern segment. This matches `glob.glob()` behaviour.	2024-03-17 17:09:35 +00:00
Serhiy Storchaka	aeffc7f895	gh-79382: Fix recursive glob() with trailing "**" (GH-115134) Trailing "**" no longer allows to match files and non-existing paths in recursive glob().	2024-02-11 12:24:13 +02:00
Barney Gale	cf67ebfb31	GH-72904: Add `glob.translate()` function (#106703 ) Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`. This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment. In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code. Co-authored-by: Jason R. Coombs <jaraco@jaraco.com> Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>	2023-11-13 17:15:56 +00:00
andrei kulakov	ae36cd1e79	bpo-37578: glob.glob -- added include_hidden parameter (GH-30153) Automerge-Triggered-By: GH:asvetlov	2021-12-18 06:23:34 -08:00
Serhiy Storchaka	5c7940257e	bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843)	2021-06-23 12:53:37 +03:00
Saiyang Gou	a32f8fe713	bpo-43756: Add new audit event for new arguments added to glob.glob (GH-25239)	2021-04-21 23:42:55 +01:00
Serhiy Storchaka	1d3469988e	bpo-38144: Re-add accidentally removed audition for glob. (GH-22805)	2020-10-20 19:45:38 +03:00
Serhiy Storchaka	8a64ceaf98	bpo-38144: Add the root_dir and dir_fd parameters in glob.glob(). (GH-16075)	2020-06-18 22:08:27 +03:00
Serhiy Storchaka	54b4f14712	bpo-38149: Call sys.audit() only once per call for glob.glob(). (GH-18360)	2020-02-06 10:26:37 +02:00
Steve Dower	60419a7e96	bpo-37363: Add audit events for a range of modules (GH-14301)	2019-06-24 08:42:54 -07:00
Serhiy Storchaka	3ae41554c6	Issue #27998 : Removed workarounds for supporting bytes paths on Windows in os.walk() function and glob module since os.scandir() now directly supports them.	2016-10-05 23:17:10 +03:00
Serhiy Storchaka	c98b26a6ac	Issue #25596 : Falls back to listdir in glob for bytes paths on Windows.	2016-09-07 09:49:42 +03:00
Serhiy Storchaka	28ab634fa6	Issue #25596 : Optimized glob() and iglob() functions in the glob module; they are now about 3--6 times faster.	2016-09-06 22:33:41 +03:00
Serhiy Storchaka	04b5700b36	Issue #25584 : Added "escape" to the __all__ list in the glob module. From patch by Xavier de Gaye.	2015-11-09 23:18:19 +02:00
Serhiy Storchaka	735b790fed	Issue #25584 : Fixed recursive glob() with patterns starting with '**'.	2015-11-09 23:12:07 +02:00
Serhiy Storchaka	c2edcdd194	Issue #13968 : The glob module now supports recursive search in subdirectories using the "**" pattern.	2014-09-11 12:17:37 +03:00
Serhiy Storchaka	6f20170762	Issue #17923 : glob() patterns ending with a slash no longer match non-dirs on AIX. Based on patch by Delhallt.	2014-08-12 12:55:12 +03:00
Serhiy Storchaka	fd32fffa5a	Issue #8402 : Added the escape() function to the glob module.	2013-11-18 13:06:43 +02:00
Petri Lehtinen	914ec6f718	Issue #16695 : Document how glob handles filenames starting with a dot	2013-02-23 19:56:15 +01:00
Petri Lehtinen	ee4a20bad6	Issue #16695 : Document how glob handles filenames starting with a dot	2013-02-23 19:53:27 +01:00
Hynek Schlawack	6e5c8f992a	#16618 : Make glob.glob match consistently across strings and bytes Fixes handling of leading dots. Patch by Serhiy Storchaka.	2012-12-27 10:20:38 +01:00
Hynek Schlawack	e26568f812	#16618 : Make glob.glob match consistently across strings and bytes Fixes handling of leading dots. Patch by Serhiy Storchaka.	2012-12-27 10:10:11 +01:00
Andrew Svetlov	ad28c7f9da	Issue #16706 : get rid of os.error	2012-12-18 22:02:39 +02:00
Antoine Pitrou	feb318a37a	Issue #16696 : fix comparison between bytes and string. Also, improve glob tests.	2012-12-16 16:03:57 +01:00
Antoine Pitrou	5461558d1a	Issue #16696 : fix comparison between bytes and string. Also, improve glob tests.	2012-12-16 16:03:01 +01:00
Antoine Pitrou	39a6ee20ac	Issue #16626 : Fix infinite recursion in glob.glob() on Windows when the pattern contains a wildcard in the drive or UNC path. Patch by Serhiy Storchaka.	2012-12-16 13:54:14 +01:00
Antoine Pitrou	3d068b2ecf	Issue #16626 : Fix infinite recursion in glob.glob() on Windows when the pattern contains a wildcard in the drive or UNC path. Patch by Serhiy Storchaka.	2012-12-16 13:49:37 +01:00
Tim Golden	9b3fb0c6a0	Backed out changeset dafca4714298	2012-11-06 15:33:30 +00:00
Tim Golden	8f323d9aca	issue9584: Add {} list expansion to glob. Original patch by Mathieu Bridon	2012-11-06 13:50:42 +00:00
Philip Jenvey	4993cc0a5b	utilize yield from	2012-10-01 12:53:43 -07:00

69 Commits