69 Commits

Author SHA1 Message Date
Barney Gale 5dbd27db7d
GH-128520: pathlib ABCs: add `JoinablePath.__vfspath__()` (#133437)
In the abstract interface of `JoinablePath`, replace `__str__()` with
`__vfspath__()`. This frees user implementations of `JoinablePath` to
implement `__str__()` however they like (or not at all).

Also add `pathlib._os.vfspath()`, which calls `__fspath__()` or
`__vfspath__()`.
2025-05-12 19:00:36 +01:00
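
The dispatch described in 5dbd27db7d can be pictured roughly as follows. This is a minimal sketch, not the code in `pathlib._os`: `vfspath_sketch()` and `MemoryPath` are hypothetical names, and the exact fallback order and error message are assumptions.

```python
import os


def vfspath_sketch(path):
    """Return a string for a real or virtual path object (illustrative only)."""
    try:
        # Real paths: str, bytes and os.PathLike objects go through __fspath__().
        return os.fsdecode(path)
    except TypeError:
        pass
    # Virtual paths: look the hook up on the type, as Python does for
    # other special methods.
    hook = getattr(type(path), "__vfspath__", None)
    if hook is None:
        raise TypeError(
            "expected a path-like or virtual path object, not "
            + type(path).__name__
        )
    return hook(path)


class MemoryPath:
    """Hypothetical virtual path that defines neither __str__() nor __fspath__()."""

    def __init__(self, text):
        self._text = text

    def __vfspath__(self):
        return self._text
```

With this split, `vfspath_sketch(MemoryPath("a/b"))` returns `"a/b"` even though `str()` of the object is the default `object` repr.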
Serhiy Storchaka add0ca9ea0
gh-133306: Use \z instead of \Z in fnmatch.translate() and glob.translate() (GH-133338) 2025-05-03 17:58:21 +03:00
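
As background for add0ca9ea0: the translated pattern must be anchored at the very end of the string, which `$` does not guarantee. The sketch below only relies on `fnmatch.translate()` emitting an end-of-string anchor (spelled `\Z` or `\z` depending on the Python version); the hand-written `loose` regex is an illustrative counter-example, not something either module produces.

```python
import fnmatch
import re

# Regex produced by the standard library for the shell pattern "*.py".
strict = re.compile(fnmatch.translate("*.py"))

# Hand-written variant that ends in "$", which also matches just before
# a trailing newline.
loose = re.compile(r"(?s:.*\.py)$")
```

`strict` rejects `"a.py\n"`, while `loose` accepts it.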
Barney Gale bbd6d17ef8
GH-130614: pathlib ABCs: support alternate separator in `full_match()` (#130991)
In `pathlib.types._JoinablePath.full_match()`, treat alternate path
separators in the path and pattern as if they were primary separators. e.g.
if the parser is `ntpath`, then `P(r'foo/bar\baz').full_match(r'*\*/*')` is
true.
2025-03-09 16:36:59 +00:00
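
The separator handling from bbd6d17ef8 can be approximated with the sketch below. `full_match_sketch()` is a hypothetical helper, not the pathlib implementation. It assumes segment-by-segment matching with `fnmatch.fnmatchcase()`, ignores `**` and the parser's case sensitivity, and only shows the normalisation of alternate separators.

```python
import fnmatch
import ntpath


def full_match_sketch(path, pattern, parser=ntpath):
    """Match every path segment against the pattern segment in the same position."""
    sep, altsep = parser.sep, parser.altsep
    if altsep:
        # Treat the alternate separator as the primary one on both sides.
        path = path.replace(altsep, sep)
        pattern = pattern.replace(altsep, sep)
    path_parts = path.split(sep)
    pattern_parts = pattern.split(sep)
    if len(path_parts) != len(pattern_parts):
        return False
    return all(
        fnmatch.fnmatchcase(part, pat)
        for part, pat in zip(path_parts, pattern_parts)
    )
```

With `ntpath` as the parser, `full_match_sketch(r'foo/bar\baz', r'*\*/*')` is true, which mirrors the example in the commit message.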
Barney Gale 5326c27fc6
Revert "GH-116380: Speed up `glob.[i]glob()` by making fewer system calls. (#116392)" (#130743)
This broke tests on the 'aarch64 Fedora Stable Clang Installed 3.x' and
'AMD64 Fedora Stable Clang Installed 3.x' build bots.

This reverts commit da4899b94a.
2025-03-01 20:04:01 +00:00
Barney Gale da4899b94a
GH-116380: Speed up `glob.[i]glob()` by making fewer system calls. (#116392)
## Filtered recursive walk

Expanding a recursive `**` segment entails walking the entire directory
tree, and so any subsequent pattern segments (except special segments) can
be evaluated by filtering the expanded paths through a regex. For example,
`glob.glob("foo/**/*.py", recursive=True)` recursively walks `foo/` with
`os.scandir()`, and then filters paths through a regex based on `**/*.py`,
with no further filesystem access needed.
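
A rough illustration of this walk-then-filter idea, not the code from the commit: `walk_relative()`, `filtered_recursive_glob()` and the hand-written `PY_TAIL` regex are hypothetical, and the sketch ignores hidden-file rules, symlink loops and error handling.

```python
import os
import re


def walk_relative(root):
    """Yield every entry below root as a '/'-separated relative path."""
    stack = [""]  # relative directory prefixes still to scan
    while stack:
        prefix = stack.pop()
        with os.scandir(os.path.join(root, prefix)) as entries:
            for entry in entries:
                rel = prefix + entry.name
                yield rel
                if entry.is_dir(follow_symlinks=False):
                    stack.append(rel + "/")


def filtered_recursive_glob(root, tail_regex):
    """Walk root once, then filter the paths with a regex only."""
    matches = re.compile(tail_regex).fullmatch
    return [rel for rel in walk_relative(root) if matches(rel)]


# Hand-written stand-in for a regex derived from "**/*.py".
PY_TAIL = r"(?:[^/]+/)*[^/]*\.py"
```

Because each path is produced exactly once by the walk, filtering cannot introduce duplicates.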

This fixes an issue where `glob()` could return duplicate results.

## Tracking path existence

We store a flag alongside each path indicating whether the path is
guaranteed to exist. As we process the pattern:

- Certain special pattern segments (`""`, `"."` and `".."`) leave the flag
  unchanged
- Literal pattern segments (e.g. `foo/bar`) set the flag to false
- Wildcard pattern segments (e.g. `*/*.py`) set the flag to true (because
  children are found via `os.scandir()`)
- Recursive pattern segments (e.g. `**`) leave the flag unchanged for the
  root path, and set it to true for descendants discovered via
  `os.scandir()`.

If the flag is false at the end, we call `lstat()` on each path to filter
out missing paths.
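
The flag bookkeeping above might look something like this. It is a simplified sketch under several assumptions: `glob_with_exists_flag()` is a hypothetical name, recursive `**` segments are left out, the starting path is treated as unverified, and `os.path.lexists()` stands in for the final `lstat()` check.

```python
import fnmatch
import os

SPECIAL = {"", ".", ".."}
MAGIC = set("*?[")


def glob_with_exists_flag(root, segments):
    """Expand pattern segments while tracking whether each path is known to exist."""
    paths = [(root, False)]  # (path, guaranteed_to_exist)
    for segment in segments:
        expanded = []
        for path, exists in paths:
            if segment in SPECIAL:
                # Special segment: flag is left unchanged.
                expanded.append((os.path.join(path, segment), exists))
            elif MAGIC.intersection(segment):
                # Wildcard segment: children come from scandir(), so they exist.
                try:
                    with os.scandir(path) as entries:
                        names = [entry.name for entry in entries]
                except OSError:
                    continue
                expanded.extend(
                    (os.path.join(path, name), True)
                    for name in names
                    if fnmatch.fnmatchcase(name, segment)
                )
            else:
                # Literal segment: appended blindly, so existence is unknown.
                expanded.append((os.path.join(path, segment), False))
        paths = expanded
    # Only paths whose flag is false need a filesystem check at the end.
    return [path for path, exists in paths if exists or os.path.lexists(path)]
```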

## Minor speed-ups

- Exclude paths that don't match a non-terminal non-recursive wildcard
  pattern _prior_ to calling `is_dir()`.
- Use a stack rather than recursion to implement recursive wildcards.
  - This fixes a recursion error when globbing deep trees.
- Pre-compile regular expressions and pre-join literal pattern segments.
- Convert to/from `bytes` (a minor use-case) in `iglob()` rather than
  supporting `bytes` throughout. This particularly simplifies the code
  needed to handle relative bytes paths with `dir_fd`.
- Avoid calling `os.path.join()`; instead we keep paths in a normalized
  form and append trailing slashes when needed.
- Avoid calling `os.path.normcase()`; instead we use case-insensitive regex
  matching.
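
The last point can be illustrated in a few lines. This is only a sketch of the general technique (pre-compiling a regex with `re.IGNORECASE` instead of normalising every name first), using `fnmatch.translate()` as a stand-in for whatever regex the globbing code builds.

```python
import fnmatch
import re

# Compile once, with case-insensitive matching, instead of calling
# os.path.normcase() on every candidate name.
matches_txt = re.compile(fnmatch.translate("*.txt"), re.IGNORECASE).match
```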

## Implementation notes

Much of this functionality is already present in pathlib's implementation
of globbing. The specific additions we make are:

1. Support for `dir_fd`
2. Support for `include_hidden`
3. Support for generating paths relative to `root_dir`

This unifies the implementations of globbing in the `glob` and `pathlib`
modules.

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-02-28 20:33:51 +00:00
Barney Gale 48c84a400a
GH-125413: pathlib ABCs: use caching `path.info.exists()` when globbing (#130422)
Call `ReadablePath.info.exists()` rather than `ReadablePath.exists()` when
globbing so that we use (or populate) the `info` cache.
2025-02-24 19:07:54 +00:00
Barney Gale 707d066193
GH-129835: Yield path with trailing slash from `ReadablePath.glob('')` (#129836)
In the private pathlib ABCs, make `ReadablePath.glob('')` yield a path with
a trailing slash (if it yields anything at all). As a result, `glob()`
works similarly to `joinpath()` when given a non-magic pattern.

In the globbing implementation, we preemptively add trailing slashes to
intermediate paths if there are pattern parts remaining; this removes the
need to check for existing trailing slashes (in the removed `add_slash()`
method) at subsequent steps.
2025-02-08 06:47:09 +00:00
Barney Gale 718ab66299
GH-125413: Add `pathlib.Path.info` attribute (#127730)
Add `pathlib.Path.info` attribute, which stores an object implementing the `pathlib.types.PathInfo` protocol (also new). The object supports querying the file type and internally caching `os.stat()` results. Path objects generated by `Path.iterdir()` are initialised with status information from `os.DirEntry` objects, which is gleaned from scanning the parent directory.

The `PathInfo` protocol has four methods: `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.
2025-02-08 01:16:45 +00:00
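
To make the caching behaviour in 718ab66299 concrete, here is a minimal sketch of an object with the four methods named above. `CachingInfoSketch` is hypothetical; the `follow_symlinks` keyword and the caching strategy are assumptions for illustration rather than a description of pathlib's internals.

```python
import os
import stat


class CachingInfoSketch:
    """Answer file-type queries from cached os.stat() results."""

    def __init__(self, path):
        self._path = path
        self._cache = {}  # follow_symlinks -> os.stat_result, or None if missing

    def _stat(self, follow_symlinks):
        if follow_symlinks not in self._cache:
            try:
                result = os.stat(self._path, follow_symlinks=follow_symlinks)
            except (OSError, ValueError):
                result = None
            self._cache[follow_symlinks] = result
        return self._cache[follow_symlinks]

    def exists(self, *, follow_symlinks=True):
        return self._stat(follow_symlinks) is not None

    def is_dir(self, *, follow_symlinks=True):
        st = self._stat(follow_symlinks)
        return st is not None and stat.S_ISDIR(st.st_mode)

    def is_file(self, *, follow_symlinks=True):
        st = self._stat(follow_symlinks)
        return st is not None and stat.S_ISREG(st.st_mode)

    def is_symlink(self):
        st = self._stat(False)
        return st is not None and stat.S_ISLNK(st.st_mode)
```

A consequence of caching is that answers reflect the first query: if the file is deleted afterwards, the same object keeps reporting that it exists.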
Bénédikt Tran 78cb377c62
gh-122288: Improve performance of `fnmatch.translate` (#122289)
Improve performance of this function by a factor of 1.7x.

Co-authored-by: Barney Gale <barney.gale@gmail.com>
2024-11-27 16:42:45 +00:00
Barney Gale 68a51e0178
GH-125413: pathlib ABCs: use `scandir()` to speed up `glob()` (#126261)
Use the new `PathBase.scandir()` method in `PathBase.glob()`, which greatly
reduces the number of `PathBase.stat()` calls needed when globbing.

There are no user-facing changes, because the pathlib ABCs are still
private and `Path.glob()` doesn't use the implementation in its superclass.
2024-11-01 17:48:58 +00:00
Barney Gale 242c7498e5
GH-116380: Move pathlib-specific code from `glob` to `pathlib._abc`. (#120011)
In `glob._Globber`, move pathlib-specific methods to `pathlib._abc.PathGlobber` and replace them with abstract methods. Rename `glob._Globber` to `glob._GlobberBase`. As a result, the `glob` module is no longer befouled by code that can only ever apply to pathlib.

No change of behaviour.
2024-06-07 17:59:34 +01:00
Barney Gale 7ff61f51b6
GH-119169: Implement `pathlib.Path.walk()` using `os.walk()` (#119573)
For silly reasons, pathlib's generic implementation of `walk()` currently
resides in `glob._Globber`. This commit moves it into
`pathlib._abc.PathBase.walk()` where it really belongs, and makes
`pathlib.Path.walk()` call `os.walk()`.
2024-05-29 20:51:04 +00:00
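
The delegation described in 7ff61f51b6 amounts to wrapping `os.walk()` and converting the directory path back to a path object. The sketch below shows that shape only; `walk_paths()` is a hypothetical function and the parameter names are chosen for illustration.

```python
import os
from pathlib import Path


def walk_paths(root, top_down=True, on_error=None, follow_symlinks=False):
    """Walk with os.walk() on strings, yielding Path objects for directories."""
    for dirpath, dirnames, filenames in os.walk(
        root, topdown=top_down, onerror=on_error, followlinks=follow_symlinks
    ):
        yield Path(dirpath), dirnames, filenames
```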
Barney Gale fbe6a0988f
GH-101357: Suppress `OSError` from `pathlib.Path.exists()` and `is_*()` (#118243)
Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()`
rather than a selection of more common errors as we do presently. Also
adjust the implementations to call `os.path.exists()` etc, which are much
faster on Windows thanks to GH-101196.
2024-05-14 17:53:15 +00:00
Barney Gale b4bdf83cc6
GH-116380: Revert move of pathlib globbing code to `pathlib._glob` (#118678)
The previous change made the `glob` module slower to import, because it
imported `pathlib._glob` and hence the rest of `pathlib`.

Reverts a40f557d7b.
2024-05-07 00:32:48 +00:00
Barney Gale a40f557d7b
GH-116380: Move pathlib globbing implementation into `pathlib._glob` (#118562)
Moving this code under the `pathlib` package makes it quite a lot easier
to backport in the `pathlib-abc` PyPI package. It was a bit foolish of me
to add it to `glob` in the first place.

Also add `translate()` to `__all__` in `glob`. This function is new in
3.13, so there's no NEWS needed.
2024-05-03 20:29:25 +00:00
Barney Gale 0eb52f5f26
GH-115060: Speed up `pathlib.Path.glob()` by not scanning literal parts (#117732)
Don't bother calling `os.scandir()` to scan for literal pattern segments,
like `foo` in `foo/*.py`. Instead, append the segment(s) as-is and call
through to the next selector with `exists=False`, which signals that the
path might not exist. Subsequent selectors will call `os.scandir()` or
`os.lstat()` to filter out missing paths as needed.
2024-04-12 22:19:21 +01:00
Barney Gale 0cc71bde00
GH-117586: Speed up `pathlib.Path.walk()` by working with strings (#117726)
Move `pathlib.Path.walk()` implementation into `glob._Globber`. The new
`glob._Globber.walk()` classmethod works with strings internally, which is
a little faster than generating `Path` objects and keeping them normalized.
The `pathlib.Path.walk()` method converts the strings back to path objects.

In the private pathlib ABCs, our existing subclass of `_Globber` ensures
that `PathBase` instances are used throughout.

Follow-up to #117589.
2024-04-11 01:26:53 +01:00
Barney Gale 6258844c27
GH-117586: Speed up `pathlib.Path.glob()` by working with strings (#117589)
Move pathlib globbing implementation into a new private class: `glob._Globber`. This class implements fast string-based globbing. It's called by `pathlib.Path.glob()`, which then converts strings back to path objects.

In the private pathlib ABCs, add a `pathlib._abc.Globber` subclass that works with `PathBase` objects rather than strings, and calls user-defined path methods like `PathBase.stat()` rather than `os.stat()`.

This sets the stage for two more improvements:

- GH-115060: Query non-wildcard segments with `lstat()`
- GH-116380: Unify `pathlib` and `glob` implementations of globbing.

No change to the implementations of `glob.glob()` and `glob.iglob()`.
2024-04-10 20:43:07 +01:00
Barney Gale fc8007ee36
GH-117337: Deprecate `glob.glob0()` and `glob.glob1()`. (#117371)
These undocumented functions are no longer used by `msilib`, so there's no
reason to keep them around.
2024-04-01 19:37:41 +00:00
Serhiy Storchaka 5a78f6e798
gh-117134: Microoptimize glob() for include_hidden=True (GH-117135) 2024-03-22 20:03:48 +02:00
Barney Gale 0634201f53
GH-116377: Stop raising `ValueError` from `glob.translate()`. (#116378)
Stop raising `ValueError` from `glob.translate()` when a `**` sub-string
appears in a non-recursive pattern segment. This matches `glob.glob()`
behaviour.
2024-03-17 17:09:35 +00:00
Serhiy Storchaka aeffc7f895
gh-79382: Fix recursive glob() with trailing "**" (GH-115134)
A trailing "**" no longer matches files and non-existing paths in
recursive glob().
2024-02-11 12:24:13 +02:00
Barney Gale cf67ebfb31
GH-72904: Add `glob.translate()` function (#106703)
Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`.

This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment.

In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code.

Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2023-11-13 17:15:56 +00:00
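
A short usage sketch for the function added in cf67ebfb31. It assumes a Python version that ships `glob.translate()` (3.13 or later, per the commit history above); `compile_glob()` is a hypothetical wrapper, and `seps="/"` is passed so the behaviour is the same on every platform.

```python
import glob
import re


def compile_glob(pattern, recursive=False):
    """Compile a glob pattern into a regex using glob.translate()."""
    regex = glob.translate(
        pattern, recursive=recursive, include_hidden=True, seps="/"
    )
    return re.compile(regex)
```

Unlike `fnmatch.translate()`, a `*` here stays within one path segment, so `compile_glob("*.txt")` does not match `"foo/baz.txt"`, whereas `compile_glob("**/*.txt", recursive=True)` does.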
andrei kulakov ae36cd1e79
bpo-37578: glob.glob -- added include_hidden parameter (GH-30153)
Automerge-Triggered-By: GH:asvetlov
2021-12-18 06:23:34 -08:00
Serhiy Storchaka 5c7940257e
bpo-44482: Fix very unlikely resource leak in glob in non-CPython implementations (GH-26843) 2021-06-23 12:53:37 +03:00
Saiyang Gou a32f8fe713
bpo-43756: Add new audit event for new arguments added to glob.glob (GH-25239) 2021-04-21 23:42:55 +01:00
Serhiy Storchaka 1d3469988e
bpo-38144: Re-add accidentally removed auditing for glob. (GH-22805) 2020-10-20 19:45:38 +03:00
Serhiy Storchaka 8a64ceaf98
bpo-38144: Add the root_dir and dir_fd parameters in glob.glob(). (GH-16075) 2020-06-18 22:08:27 +03:00
Serhiy Storchaka 54b4f14712
bpo-38149: Call sys.audit() only once per call for glob.glob(). (GH-18360) 2020-02-06 10:26:37 +02:00
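
The audit events mentioned in these commits can be observed with `sys.addaudithook()`. The sketch below is illustrative: `glob_audit_events()` is a hypothetical helper, it records any event whose name starts with `glob.glob`, and because audit hooks cannot be removed it merely deactivates its hook once the call returns.

```python
import glob
import sys


def glob_audit_events(pattern):
    """Run glob.glob(pattern) and return the glob audit events it raised."""
    seen = []
    state = {"active": True}

    def hook(event, args):
        if state["active"] and event.startswith("glob.glob"):
            seen.append(event)

    sys.addaudithook(hook)  # hooks are permanent, hence the 'active' flag
    try:
        glob.glob(pattern)
    finally:
        state["active"] = False
    return seen
```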
Steve Dower 60419a7e96
bpo-37363: Add audit events for a range of modules (GH-14301) 2019-06-24 08:42:54 -07:00
Serhiy Storchaka 3ae41554c6 Issue #27998: Removed workarounds for supporting bytes paths on Windows in
os.walk() function and glob module since os.scandir() now directly supports
them.
2016-10-05 23:17:10 +03:00
Serhiy Storchaka c98b26a6ac Issue #25596: Falls back to listdir in glob for bytes paths on Windows. 2016-09-07 09:49:42 +03:00
Serhiy Storchaka 28ab634fa6 Issue #25596: Optimized glob() and iglob() functions in the
glob module; they are now about 3--6 times faster.
2016-09-06 22:33:41 +03:00
Serhiy Storchaka 04b5700b36 Issue #25584: Added "escape" to the __all__ list in the glob module.
From patch by Xavier de Gaye.
2015-11-09 23:18:19 +02:00
Serhiy Storchaka 735b790fed Issue #25584: Fixed recursive glob() with patterns starting with '**'. 2015-11-09 23:12:07 +02:00
Serhiy Storchaka c2edcdd194 Issue #13968: The glob module now supports recursive search in
subdirectories using the "**" pattern.
2014-09-11 12:17:37 +03:00
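
A brief usage sketch of the recursive pattern introduced in c2edcdd194. `find_python_files()` is a hypothetical helper; note that `**` only has its recursive meaning when `recursive=True` is passed.

```python
import glob
import os


def find_python_files(root):
    """Return every *.py file in root and all of its subdirectories."""
    pattern = os.path.join(glob.escape(root), "**", "*.py")
    return sorted(glob.glob(pattern, recursive=True))
```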
Serhiy Storchaka 6f20170762 Issue #17923: glob() patterns ending with a slash no longer match non-dirs on
AIX.  Based on patch by Delhallt.
2014-08-12 12:55:12 +03:00
Serhiy Storchaka fd32fffa5a Issue #8402: Added the escape() function to the glob module. 2013-11-18 13:06:43 +02:00
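
For context on fd32fffa5a, `glob.escape()` makes it possible to match a name that itself contains wildcard characters. `find_exact()` below is a hypothetical helper built on it.

```python
import glob
import os


def find_exact(directory, name):
    """Return paths in directory whose name is literally `name`."""
    pattern = os.path.join(glob.escape(directory), glob.escape(name))
    return glob.glob(pattern)
```

Without escaping, a pattern such as `data[1].txt` would be read as a character class and match `data1.txt` instead.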
Petri Lehtinen 914ec6f718 Issue #16695: Document how glob handles filenames starting with a dot 2013-02-23 19:56:15 +01:00
Petri Lehtinen ee4a20bad6 Issue #16695: Document how glob handles filenames starting with a dot 2013-02-23 19:53:27 +01:00
Hynek Schlawack 6e5c8f992a #16618: Make glob.glob match consistently across strings and bytes
Fixes handling of leading dots.

Patch by Serhiy Storchaka.
2012-12-27 10:20:38 +01:00
Hynek Schlawack e26568f812 #16618: Make glob.glob match consistently across strings and bytes
Fixes handling of leading dots.

Patch by Serhiy Storchaka.
2012-12-27 10:10:11 +01:00
Andrew Svetlov ad28c7f9da Issue #16706: get rid of os.error 2012-12-18 22:02:39 +02:00
Antoine Pitrou feb318a37a Issue #16696: fix comparison between bytes and string. Also, improve glob tests. 2012-12-16 16:03:57 +01:00
Antoine Pitrou 5461558d1a Issue #16696: fix comparison between bytes and string. Also, improve glob tests. 2012-12-16 16:03:01 +01:00
Antoine Pitrou 39a6ee20ac Issue #16626: Fix infinite recursion in glob.glob() on Windows when the pattern contains a wildcard in the drive or UNC path.
Patch by Serhiy Storchaka.
2012-12-16 13:54:14 +01:00
Antoine Pitrou 3d068b2ecf Issue #16626: Fix infinite recursion in glob.glob() on Windows when the pattern contains a wildcard in the drive or UNC path.
Patch by Serhiy Storchaka.
2012-12-16 13:49:37 +01:00
Tim Golden 9b3fb0c6a0 Backed out changeset dafca4714298 2012-11-06 15:33:30 +00:00
Tim Golden 8f323d9aca issue9584: Add {} list expansion to glob. Original patch by Mathieu Bridon 2012-11-06 13:50:42 +00:00
Philip Jenvey 4993cc0a5b utilize yield from 2012-10-01 12:53:43 -07:00