Commit Graph

82 Commits

Author SHA1 Message Date
Serhiy Storchaka 14305a83d3
gh-133677: Fix tests when running in non-UTF-8 locale (GH-133865) 2025-05-12 19:09:11 +03:00
Daniel Hollas cae660d6dc
gh-118761: Add test_lazy_import for more modules (#133057)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2025-05-05 22:46:05 +00:00
Barney Gale 8e08ac9f32
GH-123599: `url2pathname()`: don't call `gethostbyname()` by default (#132610)
Follow-up to 66cdb2bd8a.

Add *resolve_host* keyword-only argument to `url2pathname()`, defaulting to
false. When set to true, we call `socket.gethostbyname()` to resolve the
URL hostname.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2025-05-05 17:03:42 +00:00
Barney Gale ccad61e35d
GH-125866: Support complete "file:" URLs in urllib (#132378)
Add optional *add_scheme* argument to `urllib.request.pathname2url()`; when
set to true, a complete URL is returned. Likewise add optional
*require_scheme* argument to `url2pathname()`; when set to true, a complete
URL is accepted.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-04-14 01:49:02 +01:00
Karolina Surma 3e1a47bdb4
gh-132356: Find the correct group name in test_group_no_follow_symlinks (#132357)
Find the correct group name in test_group_no_follow_symlinks
2025-04-11 15:58:39 +01:00
Barney Gale 66cdb2bd8a
GH-123599: `url2pathname()`: handle authority section in file URL (#126844)
In `urllib.request.url2pathname()`, if the authority resolves to the
current host, discard it. If an authority is present but resolves somewhere
else, then on Windows we return a UNC path (as before), and on other
platforms we raise `URLError`.

Affects `pathlib.Path.from_uri()` in the same way.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-04-10 19:58:04 +00:00
Barney Gale f141e8ec2a
GH-123599: Deprecate duplicate `pathname2url()` implementation (#127380)
Call `urllib.request.pathname2url()` from `pathlib.Path.as_uri()`, and
deprecate the duplicate implementation in `PurePath`.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2025-03-20 00:54:36 +00:00
Bénédikt Tran 3185e3115c
gh-131277: allow `EnvironmentVarGuard` to unset more than one environment variable at once (#131280)
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
2025-03-16 14:09:33 +01:00
Barney Gale 633942e348
GH-130614: pathlib ABCs: delete vestigial `test_pathlib_abc` module (#131215)
Remove the `test.test_pathlib.test_pathlib_abc` test module, which was
hollowed out in previous commits. Its few remaining tests are most relevant
to `PurePath` and `Path`, so we move them into `test_pathlib`.
2025-03-14 20:04:07 +00:00
Victor Stinner b2ca26875a
gh-131152: Remove unused imports from tests (part 2) (#131154) 2025-03-13 10:57:40 +01:00
Barney Gale ad90c5fabc
GH-130614: pathlib ABCs: revise test suite for readable paths (#131018)
Test `pathlib.types._ReadablePath` in a dedicated test module. These tests
cover `ReadableZipPath`, `ReadableLocalPath` and `Path`, where the former
two classes are implementations of `_ReadablePath` for use in tests.
2025-03-11 20:54:22 +00:00
Barney Gale 3569e4a670
GH-130614: pathlib ABCs: revise test suite for Windows path joining (#131016)
Test Windows-flavoured `pathlib.types._JoinablePath` in a dedicated test
module. These tests cover `LexicalWindowsPath`, `PureWindowsPath` and
`WindowsPath`, where `LexicalWindowsPath` is a simple implementation of
`_JoinablePath` for use in tests.
2025-03-11 18:01:43 +00:00
Barney Gale 93fc3d34f9
GH-127381: pathlib ABCs: remove `case_sensitive` argument (#131024)
Remove the *case_sensitive* argument from `_JoinablePath.full_match()` and
`_ReadablePath.glob()`. Using a non-native case sensitivity forces the use
of "case-pedantic" globbing, where we `iterdir()` even for non-wildcard
pattern segments. But it's hard to know when to enable this mode, as
case-sensitivity can vary by directory, so `_PathParser.normcase()` doesn't
always give the full picture. The `Path.glob()` implementation is forced to
make an educated guess, but we can avoid the issue in the ABCs by dropping
the *case_sensitive* argument.

(I probably shouldn't have added these arguments in `PurePath` and `Path`
in the first place!)

Also drop support for `_ReadablePath.glob(recurse_symlinks=False)`, which
makes recursive globbing much slower.
2025-03-10 17:50:48 +00:00
Barney Gale d0eb01c9de
GH-128520: Merge `pathlib._abc` into `pathlib.types` (#130747)
There used to be a meaningful distinction between these modules: `pathlib`
imported `pathlib._abc` but not `pathlib.types`. This is no longer the
case (neither module is imported), so we move the ABCs as follows:

- `pathlib._abc.JoinablePath` --> `pathlib.types._JoinablePath`
- `pathlib._abc.ReadablePath` --> `pathlib.types._ReadablePath`
- `pathlib._abc.WritablePath` --> `pathlib.types._WritablePath`
2025-03-03 17:56:57 +00:00
Barney Gale a55dffd66d
GH-127381: pathlib ABCs: remove `ReadablePath.exists()` and `is_*()` (#130520)
Remove `ReadablePath` methods duplicated by `ReadablePath.info`. To be
specific, we remove `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.

The public `Path` class retains these methods.
2025-03-01 21:24:19 +00:00
Barney Gale 6f07016bf0
GH-127381: pathlib ABCs: remove `ReadablePath.rglob()` (#130207)
Remove `ReadablePath.rglob()` from the private pathlib ABCs. This method is
a trivial wrapper around `glob()` and easily replaced.
2025-02-17 19:15:59 +00:00
Barney Gale 7fcace99bb
GH-125413: Add private metadata methods to `pathlib.Path.info` (#129897)
Add the following private methods to `pathlib.Path.info`:

- `_posix_permissions()`: the POSIX file permissions (`S_IMODE(st_mode)`)
- `_file_id()`: the file ID (`(st_dev, st_ino)`)
- `_access_time_ns()`: the access time in nanoseconds (`st_atime_ns`)
- `_mod_time_ns()`: the modify time in nanoseconds (`st_mtime_ns`)
- `_bsd_flags()`: the BSD file flags (`st_flags`)
- `_xattrs()`: the file extended attributes as a list of key, value pairs,
  or an empty list if `listxattr()` or `getxattr()` fail in an ignorable 
  way.

These methods replace `LocalCopyReader.read_metadata()`, and so we can
delete the `CopyReader` and `LocalCopyReader` classes. Rather than reading
metadata via `source._copy_reader.read_metadata()`, we instead call
`source.info._posix_permissions()`, `_access_time_ns()`, etc.

Preserving metadata is only supported for local-to-local copies at the
moment. To support copying metadata between arbitrary `ReadablePath` and
`WritablePath` objects, we'd need to make the new methods public and
documented.

Co-authored-by: Petr Viktorin <encukou@gmail.com>
2025-02-17 19:15:25 +00:00
Barney Gale a7d41a8947
GH-128520: Subclass `abc.ABC` in `pathlib._abc` (#128745)
Convert `JoinablePath`, `ReadablePath` and `WritablePath` to real ABCs
derived from `abc.ABC`.

Make `JoinablePath.parser` abstract, rather than defaulting to `posixpath`.

Register `PurePath` and `Path` as virtual subclasses of the ABCs rather
than deriving. This avoids a hit to path object instantiation performance.

No change of behaviour in the public (non-abstract) classes.
2025-02-16 00:37:26 +00:00
Barney Gale 718ab66299
GH-125413: Add `pathlib.Path.info` attribute (#127730)
Add `pathlib.Path.info` attribute, which stores an object implementing the `pathlib.types.PathInfo` protocol (also new). The object supports querying the file type and internally caching `os.stat()` results. Path objects generated by `Path.iterdir()` are initialised with status information from `os.DirEntry` objects, which is gleaned from scanning the parent directory.

The `PathInfo` protocol has four methods: `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.
2025-02-08 01:16:45 +00:00
Barney Gale a4459c34ea
GH-127381: pathlib ABCs: remove `JoinablePath.match()` (#129147)
Unlike `ReadablePath.[r]glob()` and `JoinablePath.full_match()`, the
`JoinablePath.match()` method doesn't support the recursive wildcard `**`,
and matches from the right when a fully relative pattern is given. These
quirks means its probably unsuitable for inclusion in the pathlib ABCs,
especially given `full_match()` handles the same use case.
2025-01-28 20:22:55 +00:00
Barney Gale 01d91500ca
GH-128520: Make `pathlib._abc.WritablePath` a sibling of `ReadablePath` (#129014)
In the private pathlib ABCs, support write-only virtual filesystems by
making `WritablePath` inherit directly from `JoinablePath`, rather than
subclassing `ReadablePath`.

There are two complications:

- `ReadablePath.open()` applies to both reading and writing
- `ReadablePath.copy` is secretly an object that supports the *read* side
  of copying, whereas `WritablePath.copy` is a different kind of object
  supporting the *write* side

We untangle these as follow:

- A new `pathlib._abc.magic_open()` function replaces the `open()` method,
  which is dropped from the ABCs but remains in `pathlib.Path`. The
  function works like `io.open()`, but additionally accepts objects with
  `__open_rb__()` or `__open_wb__()` methods as appropriate for the mode.
  These new dunders are made abstract methods of `ReadablePath` and
  `WritablePath` respectively.  If the pathlib ABCs are made public, we
  could consider blessing an "openable" protocol and supporting it in
  `io.open()`, removing the need for `pathlib._abc.magic_open()`.
- `ReadablePath.copy` becomes a true method, whereas `WritablePath.copy` is
  deleted. A new `ReadablePath._copy_reader` property provides a
  `CopyReader` object, and similarly `WritablePath._copy_writer` is a
  `CopyWriter` object. Once GH-125413 is resolved, we'll be able to move
  the `CopyReader` functionality into `ReadablePath.info` and eliminate
  `ReadablePath._copy_reader`.
2025-01-21 18:35:37 +00:00
Barney Gale 22a442181d
GH-128520: Divide pathlib ABCs into three classes (#128523)
In the private pathlib ABCs, rename `PurePathBase` to `JoinablePath`, and
split `PathBase` into `ReadablePath` and `WritablePath`. This improves the
API fit for read-only virtual filesystems.

The split of `PathBase` entails a similar split of `CopyWorker` (implements
copying) and the test cases in `test_pathlib_abc`.

In a later patch, we'll make `WritablePath` inherit directly from
`JoinablePath` rather than `ReadablePath`. For a couple of reasons,
this isn't quite possible yet.
2025-01-11 19:27:47 +00:00
Barney Gale fd94c6a803
pathlib tests: create `walk()` test hierarchy without using class under test (#128338)
In the tests for `pathlib.Path.walk()`, avoid using the path class under
test (`self.cls`) in test setup. Instead we use `os` functions in
`test_pathlib`, and direct manipulation of `DummyPath` internal data in
`test_pathlib_abc`.
2025-01-04 15:45:24 +00:00
Barney Gale 95352dcb93
GH-127381: pathlib ABCs: remove `PathBase.move()` and `move_into()` (#128337)
These methods combine `_delete()` and `copy()`, but `_delete()` isn't part
of the public interface, and it's unlikely to be added until the pathlib
ABCs are made official, or perhaps even later.
2025-01-04 12:53:51 +00:00
Barney Gale ef63cca494
GH-127381: pathlib ABCs: remove uncommon `PurePathBase` methods (#127853)
Remove `PurePathBase.relative_to()` and `is_relative_to()` because they
don't account for *other* being an entirely different kind of path, and
they can't use `__eq__()` because it's not on the `PurePathBase` interface.

Remove `PurePathBase.drive`, `root`, `is_absolute()` and `as_posix()`.
These are all too specific to local filesystems.
2024-12-29 22:07:12 +00:00
Barney Gale c78729f2df
GH-127381: pathlib ABCs: remove `PathBase.stat()` (#128334)
Remove the `PathBase.stat()` method. Its use of the `os.stat_result` API,
with its 10 mandatory fields and low-level types, makes it an awkward fit
for virtual filesystems.

We'll look to add a `PathBase.info` attribute later - see GH-125413.
2024-12-29 21:42:07 +00:00
Barney Gale d61542b5ff
pathlib tests: create test hierarchy without using class under test (#128200)
In the pathlib tests, avoid using the path class under test (`self.cls`) in
test setup. Instead we use `os` functions in `test_pathlib`, and direct
manipulation of `DummyPath` internal data in `test_pathlib_abc`.
2024-12-23 17:22:15 +00:00
Barney Gale a959ea1b0a
GH-127807: pathlib ABCs: remove `PurePathBase._raw_paths` (#127883)
Remove the `PurePathBase` initializer, and make `with_segments()` and
`__str__()` abstract. This allows us to drop the `_raw_paths` attribute,
and also the `Parser.join()` protocol method.
2024-12-22 01:17:59 +00:00
Hood Chatham 1183e4ce2f
gh-127146: Emscripten clean up test suite (#127984)
Removed test skips that are no longer required as a result of Emscripten updates.
2024-12-17 07:48:23 +00:00
Barney Gale 7146f18946
GH-127807: pathlib ABCs: remove `PathBase._unsupported_msg()` (#127855)
This method helped us customise the `UnsupportedOperation` message
depending on the type. But we're aiming to make `PathBase` a proper ABC
soon, so `NotImplementedError` is the right exception to raise there.
2024-12-12 17:39:24 +00:00
Barney Gale 12b4f1a5a1
GH-127381: pathlib ABCs: remove `PathBase.samefile()` and rarer `is_*()` (#127709)
Remove `PathBase.samefile()`, which is fairly specific to the local FS, and
relies on `stat()`, which we're aiming to remove from `PathBase`.

Also remove `PathBase.is_mount()`, `is_junction()`, `is_block_device()`,
`is_char_device()`, `is_fifo()` and `is_socket()`. These rely on POSIX
file type numbers that we're aiming to remove from the `PathBase` API.
2024-12-11 00:09:55 +00:00
Barney Gale 7f8ec52302
GH-127381: pathlib ABCs: remove `PathBase.unlink()` and `rmdir()` (#127736)
Virtual filesystems don't always make a distinction between deleting files
and empty directories, and sometimes support deleting non-empty directories
in a single operation. Here we remove `PathBase.unlink()` and `rmdir()`,
leaving `_delete()` as the sole deletion method, now made abstract. I hope
to drop the underscore prefix later on.
2024-12-08 18:45:09 +00:00
Barney Gale 31c9f3ced2
GH-127381: pathlib ABCs: remove `PathBase.resolve()` and `absolute()` (#127707)
Remove our implementation of POSIX path resolution in `PathBase.resolve()`.
This functionality is rather fragile and isn't necessary in most cases. It
depends on `PathBase.stat()`, which we're looking to remove.

Also remove `PathBase.absolute()`. Many legitimate virtual filesystems lack
the notion of a 'current directory', so it's wrong to include in the basic
interface.
2024-12-06 21:39:45 +00:00
Barney Gale 38264a060a
GH-127381: pathlib ABCs: remove `PathBase.lstat()` (#127382)
Remove the `PathBase.lstat()` method, which is a trivial variation of
`stat()`.

No user-facing changes because the pathlib ABCs are still private.
2024-11-29 21:03:39 +00:00
Barney Gale 4ea71278ca
pathlib tests: move `walk()` tests into their own classes (GH-126651)
Move tests for Path.walk() into a new PathWalkTest class, and apply a similar change in tests for the ABCs. This allows us to properly tear down the walk test hierarchy in tearDown(), rather than leaving it to os_helper.rmtree().
2024-11-23 18:31:00 -08:00
Barney Gale 266328552e
pathlib ABCs: tighten up `resolve()` and `absolute()` (#126611)
In `PathBase.resolve()`, raise `UnsupportedOperation` if a non-POSIX path
parser is used (our implementation uses `posixpath._realpath()`, which
produces incorrect results for non-POSIX path flavours.) Also tweak code to
call `self.absolute()` upfront rather than supplying an emulated `getcwd()`
function.

Adjust `PathBase.absolute()` to work somewhat like `resolve()`. If a POSIX
path parser is used, we treat the root directory as the current directory.
This is the simplest useful behaviour for concrete path types without a
current directory cursor.
2024-11-09 18:47:49 +00:00
Barney Gale cb8e5995d8
GH-125069: Fix inconsistent joining in `WindowsPath(PosixPath(...))` (#125156)
`PurePath.__init__()` incorrectly uses the `_raw_paths` of a given
`PurePath` object with a different flavour, even though the procedure to
join path segments can differ between flavours.

This change makes the `_raw_paths`-enabled deferred joining apply _only_
when the path flavours match.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-10-13 17:46:10 +00:00
Barney Gale 5002f17794
GH-119518: Stop interning strings in pathlib GH-123356)
Remove `sys.intern(str(x))` calls when normalizing a path in pathlib. This
speeds up `str(Path('foo/bar'))` by about 10%.
2024-09-02 18:14:09 +02:00
Barney Gale 033d537cd4
GH-73991: Make `pathlib.Path.delete()` private. (#123315)
Per feedback from Paul Moore on GH-123158, it's better to defer making
`Path.delete()` public than ship it with under-designed error handling
capabilities.

We leave a remnant `_delete()` method, which is used by `move()`. Any
functionality not needed by `move()` is deleted.
2024-08-26 16:26:34 +01:00
Barney Gale c68a93c582
GH-73991: Add `pathlib.Path.copy_into()` and `move_into()` (#123314)
These two methods accept an *existing* directory path, onto which we join
the source path's base name to form the final target path.

A possible alternative implementation is to check for directories in
`copy()` and `move()` and adjust the target path, which is done in several
`shutil` functions. This behaviour is helpful in a shell context, but
less so in a stored program that explicitly specifies destinations. For
example, a user that calls `Path('foo.py').copy('bar.py')` might not
imagine that `bar.py/foo.py` would be created, but under the alternative
implementation this will happen if `bar.py` is an existing directory.
2024-08-26 14:14:23 +01:00
Barney Gale 625d0705b9
GH-73991: Add `pathlib.Path.move()` (#122073)
Add a `Path.move()` method that moves a file or directory tree, and returns a new `Path` instance pointing to the target.

This method is similar to `shutil.move()`, except that it doesn't accept a *copy_function* argument, and it doesn't check whether the destination is an existing directory.
2024-08-25 16:51:51 +01:00
Barney Gale c4ee4e756a
GH-122890: Fix low-level error handling in `pathlib.Path.copy()` (#122897)
Give unique names to our low-level FD copying functions, and try each one
in turn. Handle errors appropriately for each implementation:

- `fcntl.FICLONE`: suppress `EBADF`, `EOPNOTSUPP`, `ETXTBSY`, `EXDEV`
- `posix._fcopyfile`: suppress `EBADF`, `ENOTSUP`
- `os.copy_file_range`: suppress `ETXTBSY`, `EXDEV`
- `os.sendfile`: suppress `ENOTSOCK`
2024-08-24 15:11:39 +01:00
Barney Gale 5f68511522
GH-85633: Fix pathlib test failures on filesystems without world-write. (#122883)
Replace `umask(0)` with `umask(0o002)` so the created files are not
world-writable, and replace `umask(0o022)` with `umask(0o026)` to check
that permissions for 'others' can still be set.
2024-08-13 17:09:21 +00:00
Barney Gale a6644d4464
GH-73991: Rework `pathlib.Path.copytree()` into `copy()` (#122369)
Rename `pathlib.Path.copy()` to `_copy_file()` (i.e. make it private.)

Rename `pathlib.Path.copytree()` to `copy()`, and add support for copying
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `delete()` methods (which will also
accept any type of file.)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2024-08-11 22:43:18 +01:00
Barney Gale 98dba73010
GH-73991: Rework `pathlib.Path.rmtree()` into `delete()` (#122368)
Rename `pathlib.Path.rmtree()` to `delete()`, and add support for deleting
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `copy()` methods (which will also
accept any type of file.)
2024-08-07 01:34:44 +01:00
Kirill Podoprigora 5901d92739
gh-122096: Remove accidentally left debugging prints (#122097) 2024-07-21 20:48:39 +01:00
Barney Gale c4c7097e64
GH-73991: Support preserving metadata in `pathlib.Path.copytree()` (#121438)
Add *preserve_metadata* keyword-only argument to `pathlib.Path.copytree()`,
defaulting to false. When set to true, we copy timestamps, permissions,
extended attributes and flags where available, like `shutil.copystat()`.
2024-07-20 23:32:52 +01:00
Barney Gale 094375b9b7
GH-73991: Add `pathlib.Path.rmtree()` (#119060)
Add a `Path.rmtree()` method that removes an entire directory tree, like
`shutil.rmtree()`. The signature of the optional *on_error* argument
matches the `Path.walk()` argument of the same name, but differs from the
*onexc* and *onerror* arguments to `shutil.rmtree()`. Consistency within
pathlib is probably more important.

In the private pathlib ABCs, we add an implementation based on `walk()`.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-07-20 20:14:13 +00:00
Barney Gale b765e4adf8
GH-73991: Fix "Operation not supported" on Fedora buildbot. (#121444)
Follow-up to #120806. Use `os_helper.skip_unless_xattr` to skip testing
xattr preservation when unsupported.
2024-07-07 17:27:52 +01:00
Barney Gale 88fc0655d4
GH-73991: Support preserving metadata in `pathlib.Path.copy()` (#120806)
Add *preserve_metadata* keyword-only argument to `pathlib.Path.copy()`, defaulting to false. When set to true, we copy timestamps, permissions, extended attributes and flags where available, like `shutil.copystat()`. The argument has no effect on Windows, where metadata is always copied.

Internally (in the pathlib ABCs), path types gain `_readable_metadata` and `_writable_metadata` attributes. These sets of strings describe what kinds of metadata can be retrieved and stored. We take an intersection of `source._readable_metadata` and `target._writable_metadata` to minimise reads/writes. A new `_read_metadata()` method accepts a set of metadata keys and returns a dict with those keys, and a new `_write_metadata()` method accepts a dict of metadata. We *might* make these public in future, but it's hard to justify while the ABCs are still private.
2024-07-06 17:18:39 +01:00