Move pathlib's private `CopyReader`, `LocalCopyReader`, `CopyWriter` and
`LocalCopyWriter` classes into `pathlib._os`, where they can live alongside
the low-level copying functions (`copyfileobj()` etc) and high-level path
querying interface (`PathInfo`).
This sets the stage for merging `LocalCopyReader` into `PathInfo`.
No change of behaviour; just moving some code around.
In the private pathlib ABCs, make `ReadablePath.glob('')` yield a path with
a trailing slash (if it yields anything at all). As a result, `glob()`
works similarly to `joinpath()` when given a non-magic pattern.
In the globbing implementation, we preemptively add trailing slashes to
intermediate paths if there are pattern parts remaining; this removes the
need to check for existing trailing slashes (in the removed `add_slash()`
method) at subsequent steps.
Add `pathlib.Path.info` attribute, which stores an object implementing the `pathlib.types.PathInfo` protocol (also new). The object supports querying the file type and internally caching `os.stat()` results. Path objects generated by `Path.iterdir()` are initialised with status information from `os.DirEntry` objects, which is gleaned from scanning the parent directory.
The `PathInfo` protocol has four methods: `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.
Unlike `ReadablePath.[r]glob()` and `JoinablePath.full_match()`, the
`JoinablePath.match()` method doesn't support the recursive wildcard `**`,
and matches from the right when a fully relative pattern is given. These
quirks means its probably unsuitable for inclusion in the pathlib ABCs,
especially given `full_match()` handles the same use case.
In the private pathlib ABCs, support write-only virtual filesystems by
making `WritablePath` inherit directly from `JoinablePath`, rather than
subclassing `ReadablePath`.
There are two complications:
- `ReadablePath.open()` applies to both reading and writing
- `ReadablePath.copy` is secretly an object that supports the *read* side
of copying, whereas `WritablePath.copy` is a different kind of object
supporting the *write* side
We untangle these as follow:
- A new `pathlib._abc.magic_open()` function replaces the `open()` method,
which is dropped from the ABCs but remains in `pathlib.Path`. The
function works like `io.open()`, but additionally accepts objects with
`__open_rb__()` or `__open_wb__()` methods as appropriate for the mode.
These new dunders are made abstract methods of `ReadablePath` and
`WritablePath` respectively. If the pathlib ABCs are made public, we
could consider blessing an "openable" protocol and supporting it in
`io.open()`, removing the need for `pathlib._abc.magic_open()`.
- `ReadablePath.copy` becomes a true method, whereas `WritablePath.copy` is
deleted. A new `ReadablePath._copy_reader` property provides a
`CopyReader` object, and similarly `WritablePath._copy_writer` is a
`CopyWriter` object. Once GH-125413 is resolved, we'll be able to move
the `CopyReader` functionality into `ReadablePath.info` and eliminate
`ReadablePath._copy_reader`.
In the private pathlib ABCs, rename `PurePathBase` to `JoinablePath`, and
split `PathBase` into `ReadablePath` and `WritablePath`. This improves the
API fit for read-only virtual filesystems.
The split of `PathBase` entails a similar split of `CopyWorker` (implements
copying) and the test cases in `test_pathlib_abc`.
In a later patch, we'll make `WritablePath` inherit directly from
`JoinablePath` rather than `ReadablePath`. For a couple of reasons,
this isn't quite possible yet.
These methods combine `_delete()` and `copy()`, but `_delete()` isn't part
of the public interface, and it's unlikely to be added until the pathlib
ABCs are made official, or perhaps even later.
Remove `PurePathBase.relative_to()` and `is_relative_to()` because they
don't account for *other* being an entirely different kind of path, and
they can't use `__eq__()` because it's not on the `PurePathBase` interface.
Remove `PurePathBase.drive`, `root`, `is_absolute()` and `as_posix()`.
These are all too specific to local filesystems.
Remove the `PathBase.stat()` method. Its use of the `os.stat_result` API,
with its 10 mandatory fields and low-level types, makes it an awkward fit
for virtual filesystems.
We'll look to add a `PathBase.info` attribute later - see GH-125413.
Move 9 private `PathBase` attributes and methods into a new `CopyWorker`
class. Change `PathBase.copy` from a method to a `CopyWorker` instance.
The methods remain private in the `CopyWorker` class. In future we might
make some/all of them public so that user subclasses of `PathBase` can
customize the copying process (in particular reading/writing of metadata,)
but we'd need to make `PathBase` public first.
From `PurePathBase` delete `_globber`, `_stack` and `_pattern_str`, and
from `PathBase` delete `_glob_selector`. This helps avoid an unpleasant
surprise for a users who try to use these names.
Remove the `PurePathBase` initializer, and make `with_segments()` and
`__str__()` abstract. This allows us to drop the `_raw_paths` attribute,
and also the `Parser.join()` protocol method.
This method helped us customise the `UnsupportedOperation` message
depending on the type. But we're aiming to make `PathBase` a proper ABC
soon, so `NotImplementedError` is the right exception to raise there.
Remove the following methods from `pathlib._abc.PathBase`:
- `expanduser()`
- `hardlink_to()`
- `touch()`
- `chmod()`
- `lchmod()`
- `owner()`
- `group()`
- `from_uri()`
- `as_uri()`
These operations aren't regularly supported in virtual filesystems, so they
don't win a place in the `PathBase` interface. (Some of them probably don't
deserve a place in `Path` :P.) They're quasi-abstract (except `lchmod()`),
and they're not called by other `PathBase` methods.
Remove `PathBase.samefile()`, which is fairly specific to the local FS, and
relies on `stat()`, which we're aiming to remove from `PathBase`.
Also remove `PathBase.is_mount()`, `is_junction()`, `is_block_device()`,
`is_char_device()`, `is_fifo()` and `is_socket()`. These rely on POSIX
file type numbers that we're aiming to remove from the `PathBase` API.
Virtual filesystems don't always make a distinction between deleting files
and empty directories, and sometimes support deleting non-empty directories
in a single operation. Here we remove `PathBase.unlink()` and `rmdir()`,
leaving `_delete()` as the sole deletion method, now made abstract. I hope
to drop the underscore prefix later on.
Remove documentation for `pathlib.Path.scandir()`, and rename the method to
`_scandir()`. In the private pathlib ABCs, make `iterdir()` abstract and
call it from `_scandir()`.
It's not worthwhile to add this method at the moment - see discussion:
https://discuss.python.org/t/ergonomics-of-new-pathlib-path-scandir/71721
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
These classmethods presume that the user has retained the original
`__init__()` signature, which may not be the case. Also, many virtual
filesystems don't provide current or home directories.
Defer joining of path segments in the private `PurePathBase` ABC. The new
behaviour matches how the public `PurePath` class handles path segments.
This removes a hard-to-grok difference between the ABCs and the main
classes. It also slightly reduces the size of `PurePath` objects by
eliminating a `_raw_path` slot.
The implementation of `Path.glob()` does rather a hacky thing: it calls
`self.with_segments()` to convert the given pattern to a `Path` object, and
then peeks at the private `_raw_path` attribute to see if pathlib removed a
trailing slash from the pattern.
In this patch, we make `glob()` use a new `_parse_pattern()` classmethod
that splits the pattern into parts while preserving information about any
trailing slash. This skips the cost of creating a `Path` object, and avoids
some path anchor normalization, which makes `Path.glob()` slightly faster.
But mostly it's about making the code less naughty.
Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
Add `pathlib.Path.scandir()` as a trivial wrapper of `os.scandir()`. This
will be used to implement several `PathBase` methods more efficiently,
including methods that provide `Path.copy()`.
`PurePath.__init__()` incorrectly uses the `_raw_paths` of a given
`PurePath` object with a different flavour, even though the procedure to
join path segments can differ between flavours.
This change makes the `_raw_paths`-enabled deferred joining apply _only_
when the path flavours match.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Per feedback from Paul Moore on GH-123158, it's better to defer making
`Path.delete()` public than ship it with under-designed error handling
capabilities.
We leave a remnant `_delete()` method, which is used by `move()`. Any
functionality not needed by `move()` is deleted.
Rename `pathlib.Path.copy()` to `_copy_file()` (i.e. make it private.)
Rename `pathlib.Path.copytree()` to `copy()`, and add support for copying
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `delete()` methods (which will also
accept any type of file.)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Rename `pathlib.Path.rmtree()` to `delete()`, and add support for deleting
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `copy()` methods (which will also
accept any type of file.)
Add a `Path.rmtree()` method that removes an entire directory tree, like
`shutil.rmtree()`. The signature of the optional *on_error* argument
matches the `Path.walk()` argument of the same name, but differs from the
*onexc* and *onerror* arguments to `shutil.rmtree()`. Consistency within
pathlib is probably more important.
In the private pathlib ABCs, we add an implementation based on `walk()`.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Add *preserve_metadata* keyword-only argument to `pathlib.Path.copy()`, defaulting to false. When set to true, we copy timestamps, permissions, extended attributes and flags where available, like `shutil.copystat()`. The argument has no effect on Windows, where metadata is always copied.
Internally (in the pathlib ABCs), path types gain `_readable_metadata` and `_writable_metadata` attributes. These sets of strings describe what kinds of metadata can be retrieved and stored. We take an intersection of `source._readable_metadata` and `target._writable_metadata` to minimise reads/writes. A new `_read_metadata()` method accepts a set of metadata keys and returns a dict with those keys, and a new `_write_metadata()` method accepts a dict of metadata. We *might* make these public in future, but it's hard to justify while the ABCs are still private.
Check for `ERROR_INVALID_PARAMETER` when calling `_winapi.CopyFile2()` and
raise `UnsupportedOperation`. In `Path.copy()`, handle this exception and
fall back to the `PathBase.copy()` implementation.
Add support for not following symlinks in `pathlib.Path.copy()`.
On Windows we add the `COPY_FILE_COPY_SYMLINK` flag is following symlinks is disabled. If the source is symlink to a directory, this call will fail with `ERROR_ACCESS_DENIED`. In this case we add `COPY_FILE_DIRECTORY` to the flags and retry. This can fail on old Windowses, which we note in the docs.
No news as `copy()` was only just added.
Add a `Path.copy()` method that copies the content of one file to another.
This method is similar to `shutil.copyfile()` but differs in the following ways:
- Uses `fcntl.FICLONE` where available (see GH-81338)
- Uses `os.copy_file_range` where available (see GH-81340)
- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.
The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.
Incorporates code from GH-81338 and GH-93152.
Co-authored-by: Eryk Sun <eryksun@gmail.com>
For silly reasons, pathlib's generic implementation of `walk()` currently
resides in `glob._Globber`. This commit moves it into
`pathlib._abc.PathBase.walk()` where it really belongs, and makes
`pathlib.Path.walk()` call `os.walk()`.
pathlib now treats "`.`" as a valid file extension (suffix). This brings
it in line with `os.path.splitext()`.
In the (private) pathlib ABCs, we add a new `ParserBase.splitext()` method
that splits a path into a `(root, ext)` pair, like `os.path.splitext()`.
This method is called by `PurePathBase.stem`, `suffix`, etc. In a future
version of pathlib, we might make these base classes public, and so users
will be able to define their own `splitext()` method to control file
extension splitting.
In `pathlib.PurePath` we add optimised `stem`, `suffix` and `suffixes`
properties that don't use `splitext()`, which avoids computing the path
base name twice.
Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()`
rather than a selection of more common errors as we do presently. Also
adjust the implementations to call `os.path.exists()` etc, which are much
faster on Windows thanks to GH-101196.
Remove support for supplying additional positional arguments to
`PurePath.relative_to()` and `is_relative_to()`. This has been deprecated
since Python 3.12.