This method supports file URIs (including variants) as described in RFC 8089, such as URIs generated by `pathlib.Path.as_uri()` and `urllib.request.pathname2url()`.
The method is added to `Path` rather than `PurePath` because it uses `os.fsdecode()`, and so its results vary from system to system. I intend to deprecate `PurePath.as_uri()` and move it to `Path` for the same reason.
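A minimal sketch of the round trip this enables, assuming the new method is exposed as `Path.from_uri()`; the `hasattr` guard and the fallback are only there so the snippet also runs on interpreters that predate this change.

```python
import tempfile
from pathlib import Path

# An absolute path that exists on any platform.
path = Path(tempfile.gettempdir()).resolve()
uri = path.as_uri()  # a "file://..." URI

# Path.from_uri() only exists on interpreters that include this change.
if hasattr(Path, "from_uri"):
    round_tripped = Path.from_uri(uri)
else:
    round_tripped = path  # fallback so the sketch still runs on older versions

print(uri, round_tripped)
```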
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Brings `pathlib.Path.is_dir()` in line with `os.DirEntry.is_dir()`, which
will be important for implementing generic path walking and globbing.
Likewise `is_file()`.
This new exception type is raised instead of `NotImplementedError` when
a path operation is not supported. It can be raised from `Path.readlink()`,
`symlink_to()`, `hardlink_to()`, `owner()` and `group()`. In a future
version of pathlib, it will be raised by `AbstractPath` for these methods
and others, such as `AbstractPath.mkdir()` and `unlink()`.
This commit introduces a 'walk-and-match' strategy for handling glob patterns that include a non-terminal `**` wildcard, such as `**/*.py`. For this example, the previous implementation recursively walked directories using `os.scandir()` when it expanded the `**` component, and then **scanned those same directories again** when it expanded the `*.py` component. This is wasteful.
In the new implementation, any components following a `**` wildcard are used to build a `re.Pattern` object, which is used to filter the results of the recursive walk. A pattern like `**/*.py` uses half the number of `os.scandir()` calls; a pattern like `**/*/*.py` a third, etc.
This new algorithm does not apply if either:
1. The *follow_symlinks* argument is set to `None` (its default), or
2. The pattern contains `..` components.
In these cases we fall back to the old implementation.
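The walk-and-match idea can be illustrated roughly as follows. This is a simplified sketch and not the code in this commit: the helper names `walk()` and `walk_and_match()` are hypothetical, and the regular expression is written by hand as a stand-in for what would be compiled from `**/*.py`.

```python
import os
import re
import tempfile

def walk(top):
    """Yield every entry below *top* as a '/'-joined relative path.

    Each directory is scanned exactly once with os.scandir().
    """
    stack = [""]
    while stack:
        rel = stack.pop()
        with os.scandir(os.path.join(top, rel)) as entries:
            for entry in entries:
                child = f"{rel}/{entry.name}" if rel else entry.name
                yield child
                if entry.is_dir(follow_symlinks=False):
                    stack.append(child)

def walk_and_match(top, tail_regex):
    """Filter the results of a single recursive walk with a compiled pattern."""
    pattern = re.compile(tail_regex)
    return sorted(p for p in walk(top) if pattern.fullmatch(p))

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "pkg", "sub"))
    for name in ("a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/notes.txt"):
        open(os.path.join(top, name), "w").close()
    # Hand-written stand-in for the pattern '**/*.py'.
    found = walk_and_match(top, r"(?:[^/]+/)*[^/]*\.py")

print(found)
```

The point of the sketch is that the `*.py` part never triggers a second directory scan; it only filters paths the walk has already produced.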
This commit also replaces selector classes with selector functions. These generators directly yield results rather than calling through to their successors. A new internal `Path._glob()` method takes care to chain these generators together, which simplifies the lazy algorithm and slightly improves performance. It should also be easier to understand and maintain.
`PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments.
We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
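A rough sketch of the "one part per line" trick, under simplifying assumptions: only `*`, `?` and `**` are handled, the whole path must match (the real `match()` has its own anchoring rules), and the helper names `compile_pattern()` and `matches()` are hypothetical. Because `re.DOTALL` is not set, `.` cannot cross a newline, so a single `*` cannot span two path parts.

```python
import re

def compile_pattern(pattern):
    """Translate a '/'-separated glob into a regex over newline-terminated parts."""
    regex = ""
    for part in pattern.split("/"):
        if part == "**":
            # Any number of whole lines, i.e. any number of path segments.
            regex += r"(?:[^\n]+\n)*"
        else:
            regex += "".join(
                ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
                for ch in part
            ) + r"\n"
    return re.compile(regex)  # note: no re.DOTALL

def matches(path, pattern):
    """Return True if every part of *path* is consumed by *pattern*."""
    text = "".join(part + "\n" for part in path.split("/"))
    return compile_pattern(pattern).fullmatch(text) is not None

print(matches("a/b/c.py", "**/*.py"))  # '**' absorbs 'a' and 'b'
print(matches("a/b/c.py", "*.py"))     # '*' cannot cross a line boundary
```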
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and `rglob()`.
When *follow_symlinks* is `None` (the default), these methods follow symlinks except when evaluating "`**`" wildcards. When set to true or false, symlinks are always or never followed, respectively.
Add `pathlib.PurePath.with_segments()`, which creates a path object from arguments. This method is called whenever a derivative path is created, such as from `pathlib.PurePath.parent`. Subclasses may override this method to share information between path objects.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
This argument allows case-sensitive matching to be enabled on Windows, and
case-insensitive matching to be enabled on Posix.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` are already normalized!
This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.
Automerge-Triggered-By: GH:AlexWaygood
The documentation for `rglob` did not mention what `pattern` actually
is.
Mentioning and linking to `fnmatch` makes this explicit, as the
documentation for `fnmatch` both shows the syntax and some explanation.
The behaviour is fully explained a couple paragraphs above, but it may be useful to have a brief example to cover the behaviour.
Automerge-Triggered-By: GH:hauntsaninja
By default, :meth:`pathlib.PurePath.relative_to` doesn't deal with paths that are not a direct prefix of the other, raising an exception in that instance. This change adds a *walk_up* parameter that can be set to allow for using ``..`` to calculate the relative path.
Example:
```
>>> p = PurePosixPath('/etc/passwd')
>>> p.relative_to('/etc')
PurePosixPath('passwd')
>>> p.relative_to('/usr')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pathlib.py", line 940, in relative_to
raise ValueError(error_message.format(str(self), str(formatted)))
ValueError: '/etc/passwd' does not start with '/usr'
>>> p.relative_to('/usr', walk_up=True)
PurePosixPath('../etc/passwd')
```
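The walk-up calculation can be sketched in pure Python as below. This is an illustration only, not the implementation in this change: `relative_to_walk_up()` is a hypothetical helper, it works on `PurePosixPath` only, and it does not handle `..` segments inside *other*.

```python
from pathlib import PurePosixPath

def relative_to_walk_up(path, other):
    """Express *path* relative to *other*, using '..' to climb out of *other*."""
    path, other = PurePosixPath(path), PurePosixPath(other)
    # Try *other* itself, then each of its ancestors, counting the steps up.
    for steps, ancestor in enumerate([other, *other.parents]):
        if ancestor == path or ancestor in path.parents:
            tail = path.parts[len(ancestor.parts):]
            return PurePosixPath(*([".."] * steps), *tail)
    raise ValueError(f"{str(path)!r} and {str(other)!r} share no common ancestor")

print(relative_to_walk_up("/etc/passwd", "/etc"))
print(relative_to_walk_up("/etc/passwd", "/usr"))
```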
https://bugs.python.org/issue40358
Automerge-Triggered-By: GH:brettcannon
Have `pathlib.WindowsPath.is_mount()` call `ntpath.ismount()`. Previously it raised `NotImplementedError` unconditionally.
https://bugs.python.org/issue42777
The r in `rglob` stands for "recursively", so use the word in the description. Also, glob and rglob can usefully be mentioned as the pathlib equivalent of os.walk.
Automerge-Triggered-By: GH:brettcannon
* Add additional pointers to pathlib's mapping to os.path functions
`os.path.splitext` has a somewhat quirky signature, since it mixes the path and filename components, but I wanted the documentation to mention `PurePath.stem` as the natural counterpart to `PurePath.suffix` for the common use of `os.path.splitext` to turn "file.py" into "file" and ".py".
Technically this could have some discussion of how to handle the parent directory hierarchy but that seems a bit out of keeping with the spirit of this table so I omitted mentioning `PurePath.parents` here.
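For illustration, a short comparison of the two APIs on a path that includes a directory; the file name is made up for the example.

```python
import os.path
from pathlib import PurePosixPath

# os.path.splitext keeps the directory in the first element.
root, ext = os.path.splitext("pkg/file.py")

# pathlib splits only the final component.
p = PurePosixPath("pkg/file.py")
stem, suffix = p.stem, p.suffix

# The closest pathlib spelling of the splitext "root" keeps the parent too.
without_suffix = p.with_suffix("")

print(root, ext, stem, suffix, without_suffix)
```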
* Update Doc/library/pathlib.rst
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Documentation for `pathlib` says:
> Spurious slashes and single dots are collapsed, but double dots ('..') are not, since this would change the meaning of a path in the face of symbolic links:
However, it omits that initial double slashes also aren't collapsed.
Later, in documentation of `PurePath.drive`, `PurePath.root`, and `PurePath.name` it mentions UNC but:
- this abbreviation says nothing to a person who is unaware of the existence of UNC (Wikipedia doesn't help either by [giving a disambiguation page](https://en.wikipedia.org/wiki/UNC))
- it shows up only if a person needs to use a specific property or decides to fully learn what the module provides.
For context, see the BPO entry.
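A small demonstration of the behaviour in question, using pure paths so it runs on any platform; the path strings are arbitrary examples.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Spurious slashes and single dots are collapsed.
collapsed = PurePosixPath("a//b/./c")

# Exactly two leading slashes are preserved on POSIX; three or more collapse.
two_slashes = PurePosixPath("//a/b")
three_slashes = PurePosixPath("///a/b")

# On Windows, a leading double slash introduces a UNC share, which becomes the drive.
unc = PureWindowsPath("//server/share/file.txt")

print(collapsed, two_slashes, three_slashes, unc.drive, unc.root, unc.name)
```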
These are currently broken as they refer to :meth:`Path.relative_to` rather than :meth:`PurePath.relative_to`, and `relative_to` is a method on `PurePath`.
We could try to remedy this by taking a slice, but we then run into an issue where the empty string will match altsep on POSIX. That rabbit hole could keep getting deeper.
A proper fix for the original issue involves making pathlib's path normalisation more configurable - in this case we want to retain trailing slashes, but in other cases we might want to preserve `./` prefixes, or elide `../` segments when we're sure we won't encounter symlinks.
This reverts commit ea2f5bcda1.
The argument order of `link_to()` is reversed compared to what one may expect, so:
a.link_to(b)
might be expected to create *a* as a link to *b*, but in fact it creates *b* as a link to *a*, making it function more like a "link from". This matches neither `symlink_to()` nor the documentation, and doesn't seem to be the original author's intent.
This PR deprecates `link_to()` and introduces `hardlink_to()`, which has the same argument order as `symlink_to()`.
This makes `ntpath.expanduser()` match `pathlib.Path.expanduser()` in this regard, and is more in line with `posixpath.expanduser()`'s cautious approach.
Also remove the near-duplicate implementation of `expanduser()` in pathlib, and by doing so fix a bug where KeyError could be raised when expanding another user's home directory.
This commit also fixes up some of the overlapping documentation changed
in bpo-35498, which added support for indexing with slices.
Fixes bpo-21041.
https://bugs.python.org/issue21041
Co-authored-by: Paul Ganssle <p.ganssle@gmail.com>
Co-authored-by: Rémi Lapeyre <remi.lapeyre@henki.fr>
Added slice support to the `pathlib.Path.parents` sequence. For a `Path` `p`, slices of `p.parents` should return the same thing as slices of `tuple(p.parents)`.
* Add _newline_ parameter to `pathlib.Path.write_text()`
* Update documentation of `pathlib.Path.write_text()`
* Add test case for `pathlib.Path.write_text()` calls with _newline_ parameter passed
Automerge-Triggered-By: GH:methane
Fixes Issue39285
The example incorrectly returned True for match.
Furthermore the example is ambiguous in its usage of PureWindowsPath.
Windows is case-insensitive, however the underlying match functionality
utilizes fnmatch.fnmatchcase.
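The difference is easiest to see by naming the flavour explicitly rather than relying on the platform default; the file name here is an arbitrary example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# POSIX flavour: matching is case-sensitive.
posix_match = PurePosixPath("b.py").match("*.PY")

# Windows flavour: matching is case-insensitive.
windows_match = PureWindowsPath("b.py").match("*.PY")

print(posix_match, windows_match)
```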
Automerge-Triggered-By: @pitrou
This adds a "readlink" method to pathlib.Path objects that calls through
to os.readlink.
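A minimal usage sketch, assuming a platform where the current user may create symlinks (on Windows this can require extra privileges); the file names are placeholders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "target.txt")
    target.write_text("data", encoding="utf-8")

    link = Path(tmp, "link.txt")
    link.symlink_to(target)

    # readlink() returns the stored link target as a path object.
    resolved = link.readlink()
    matches_target = resolved == target

print(matches_target)
```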
https://bugs.python.org/issue30618
Automerge-Triggered-By: @gpshead
Similarly to how several pathlib file creation functions have an "exist_ok" parameter, we should introduce "missing_ok" that makes removal functions not raise an exception when a file or directory is already absent. IMHO, this should cover Path.unlink and Path.rmdir. Note, Path.resolve() has a "strict" parameter since 3.6 that does the same thing. The naming of this new parameter tries to be consistent with the "exist_ok" parameter as that is more explicit about what it does (as opposed to "strict").
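A sketch of the proposed behaviour, shown for `Path.unlink()` only; the file name is a placeholder for a file that does not exist.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    ghost = Path(tmp, "ghost.txt")  # never created

    # With missing_ok=True, removing an absent file is a silent no-op.
    ghost.unlink(missing_ok=True)

    # Without it, the usual FileNotFoundError is raised.
    try:
        ghost.unlink()
    except FileNotFoundError:
        raised = True
    else:
        raised = False

print(raised)
```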
https://bugs.python.org/issue33123
Such functions as os.path.exists(), os.path.lexists(), os.path.isdir(),
os.path.isfile(), os.path.islink(), and os.path.ismount() now return False
instead of raising ValueError or its subclasses UnicodeEncodeError
and UnicodeDecodeError for paths that contain characters or bytes
unrepresentable at the OS level.
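For instance, a path with an embedded null character cannot be passed to the OS at all. The sketch below contrasts the predicate functions with `os.stat()`, which still raises; the path string is made up.

```python
import os
import os.path

bad = "bad\x00name"  # embedded null: not representable at the OS level

# The predicates report False instead of raising.
exists = os.path.exists(bad)
is_dir = os.path.isdir(bad)

# Lower-level calls such as os.stat() still raise ValueError.
try:
    os.stat(bad)
except ValueError:
    stat_raised = True
else:
    stat_raised = False

print(exists, is_dir, stat_raised)
```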
This adds support both to pathlib.PurePath's constructor as well as
implementing __fspath__(). This removes the provisional status for
pathlib.
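Both directions can be illustrated briefly. The `Config` class is a hypothetical path-like object invented for this example.

```python
import os
from pathlib import PurePosixPath

class Config:
    """Hypothetical object that exposes its location via the fspath protocol."""

    def __init__(self, location):
        self.location = location

    def __fspath__(self):
        return self.location

# The constructor accepts any os.PathLike object...
p = PurePosixPath(Config("/etc/app"), "settings.ini")

# ...and path objects implement __fspath__() themselves.
as_str = os.fspath(p)

print(as_str)
```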
Initial patch by Dusty Phillips.
These added a path attribute to pathlib.Path objects, and docs.
Instead, we're going to use PEP 519.
(Starting in the 3.4 branch and merging forward from there since that's what I did originally.)