Add optional *add_scheme* argument to `urllib.request.pathname2url()`; when
set to true, a complete URL is returned. Likewise add optional
*require_scheme* argument to `url2pathname()`; when set to true, a complete
URL is accepted.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
In `urllib.request.url2pathname()`, if the authority resolves to the
current host, discard it. If an authority is present but resolves somewhere
else, then on Windows we return a UNC path (as before), and on other
platforms we raise `URLError`.
Affects `pathlib.Path.from_uri()` in the same way.
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
The canonical `file:` URL (as generated by `pathname2url()`) is now used as the `url` attribute of the returned `addinfourl` object. The `addinfourl.url` attribute reflects the resolved URL for both `file:` or `http[s]:` URLs now.
When handed an absolute Windows path such as `C:\foo` or `//server/share`,
the `urllib.request.pathname2url()` function returns a URL with an
authority section, such as `///C:/foo` or `//server/share` (or before
GH-126205, `////server/share`). Only the `file:` prefix is omitted.
But when handed an absolute POSIX path such as `/etc/hosts`, or a Windows
path of the same form (rooted but lacking a drive), the function returns a
URL without an authority section, such as `/etc/hosts`.
This patch corrects the discrepancy by adding a `//` prefix before
drive-less, rooted paths when generating URLs.
Stop converting Windows drive letters to uppercase in
`urllib.request.pathname2url()` and `url2pathname()`. This behaviour is
unnecessary and inconsistent with pathlib's file URI implementation.
Decode a file URI like `file://///server/share` as a UNC path like
`\\server\share`. This form of file URI is created by software the simply
prepends `file:///` to any absolute Windows path.
Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
Adjust `pathname2url()` to encode embedded colon characters in Windows
paths, rather than bailing out with an `OSError`.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Adjust `urllib.request.url2pathname()` and `pathname2url()` to use the
filesystem encoding when quoting and unquoting file URIs, rather than
forcing use of UTF-8.
No changes are needed in the `nturl2path` module because Windows always
uses UTF-8, per PEP 529.
Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
Adjust `urllib.request.pathname2url()` and `url2pathname()` so that they
don't remove slashes from Windows DOS drive paths and URLs. There was no
basis for this behaviour, and it conflicts with how UNC and POSIX paths are
handled.
Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test
only the Windows-specific implementations from `nturl2path`) into the main
`Pathname_Tests` test case for these functions.
Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`.
`urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result depending on the input provided. As Python objects are relatively large, this could consume a lot of ram.
This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations.
Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast anyways so not a big deal. The slowdown scales consistently linear with input size as expected.
Memory usage observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unittesting memory consumption is difficult and does not seem worthwhile.
Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
- WASI's ``gethostname()`` is a stub that always fails with OSError
``ENOTSUP``
- skip mailcap ``test`` if subprocess is not available
- WASI process_time clock does not work.
Add methods enterContext() and enterClassContext() in TestCase.
Add method enterAsyncContext() in IsolatedAsyncioTestCase.
Add function enterModuleContext().
test_urllib commented since 2007:
commit d9880d07fc
Author: Facundo Batista <facundobatista@gmail.com>
Date: Fri May 25 04:20:22 2007 +0000
Commenting out the tests until find out who can test them in
one of the problematic enviroments.
pynche code commented since 1998 and 2001:
commit ef30092207
Author: Barry Warsaw <barry@python.org>
Date: Tue Dec 15 01:04:38 1998 +0000
Added most of the mechanism to change the strips from color variations
to color constants (i.e. red constant, green constant, blue
constant). But I haven't hooked this up yet because the UI gets more
crowded and the arrows don't reflect the correct values.
Added "Go to Black" and "Go to White" buttons.
commit 741eae0b31
Author: Barry Warsaw <barry@python.org>
Date: Wed Apr 18 03:51:55 2001 +0000
StripWidget.__init__(), update_yourself(): Removed some unused local
variables reported by PyChecker.
__togglegentype(): PyChecker accurately reported that the variable
__gentypevar was unused -- actually this whole method is currently
unused so comment it out.
urllib.request tests now call urlcleanup() to remove temporary files
created by urlretrieve() tests and to clear the _opener global
variable set by urlopen() and functions calling indirectly urlopen().
regrtest now checks if urllib.request._url_tempfiles and
urllib.request._opener are changed by tests.
CVE-2019-9948: Avoid file reading as disallowing the unnecessary URL
scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.
Co-Authored-By: SH <push0ebp@gmail.com>
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected.
* bpo-16285: Update urllib quoting to RFC 3986
urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.
Patch by Christian Theune and Ratnadeep Debnath.
In urllib.request, suffixes in no_proxy environment variable with
leading dots could match related hostnames again (e.g. .b.c matches a.b.c).
Patch by Milan Oberkirch.
The deprecation include manual creation of SSLSocket and certfile/keyfile
(or similar) in ftplib, httplib, imaplib, smtplib, poplib and urllib.
ssl.wrap_socket() is not marked as deprecated yet.
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.
Issue #27568 Reported and patch contributed by Rémi Rampin.