207 Commits

Author SHA1 Message Date
Barney Gale f3192dac66
GH-90812: Add test for `urlopen()` of file URI for UNC path (#132489) 2025-04-15 19:16:34 +01:00
Serhiy Storchaka f98b9b4cbb
gh-71339: Use new assertion methods in the urllib tests (GH-129056) 2025-04-14 09:24:41 +03:00
Barney Gale ccad61e35d
GH-125866: Support complete "file:" URLs in urllib (#132378)
Add optional *add_scheme* argument to `urllib.request.pathname2url()`; when
set to true, a complete URL is returned. Likewise add optional
*require_scheme* argument to `url2pathname()`; when set to true, a complete
URL is accepted.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-04-14 01:49:02 +01:00
Barney Gale 66cdb2bd8a
GH-123599: `url2pathname()`: handle authority section in file URL (#126844)
In `urllib.request.url2pathname()`, if the authority resolves to the
current host, discard it. If an authority is present but resolves somewhere
else, then on Windows we return a UNC path (as before), and on other
platforms we raise `URLError`.

Affects `pathlib.Path.from_uri()` in the same way.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-04-10 19:58:04 +00:00
Barney Gale d783d7b51d
GH-126367: `url2pathname()`: handle NTFS alternate data streams (#131428)
Adjust `url2pathname()` to decode embedded colon characters in Windows
URIs, rather than bailing out with an `OSError`.
2025-03-18 23:37:12 +00:00
Serhiy Storchaka 5ace71713b
gh-128734: Fix ResourceWarning in urllib tests (GH-128735) 2025-01-12 12:53:17 +02:00
Barney Gale 79b7cab50a
GH-127090: Fix `urllib.response.addinfourl.url` value for opened `file:` URIs (#127091)
The canonical `file:` URL (as generated by `pathname2url()`) is now used as the `url` attribute of the returned `addinfourl` object. The `addinfourl.url` attribute now reflects the resolved URL for both `file:` and `http[s]:` URLs.
2024-12-07 17:58:42 +00:00
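A minimal sketch of how the `url` attribute can be inspected when opening a `file:` URL. The helper name `read_via_file_url` is hypothetical, and the exact string reported in `url` depends on the Python version, so only the scheme is checked here.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_via_file_url(data: bytes):
    """Write data to a temporary file and read it back through a file: URL.

    Hypothetical helper for illustration; returns (body, reported_url).
    """
    fd, name = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Path.as_uri() builds a complete file: URL for an absolute path.
        uri = Path(name).as_uri()
        with urlopen(uri) as response:
            return response.read(), response.url
    finally:
        os.unlink(name)


body, reported_url = read_via_file_url(b"hello")
print(body)
print(reported_url)  # a file: URL; the exact form varies between versions
```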
Barney Gale 5bb059fe60
GH-127236: `pathname2url()`: generate RFC 1738 URL for absolute POSIX path (#127194)
When handed an absolute Windows path such as `C:\foo` or `//server/share`,
the `urllib.request.pathname2url()` function returns a URL with an
authority section, such as `///C:/foo` or `//server/share` (or before
GH-126205, `////server/share`). Only the `file:` prefix is omitted.

But when handed an absolute POSIX path such as `/etc/hosts`, or a Windows
path of the same form (rooted but lacking a drive), the function returns a
URL without an authority section, such as `/etc/hosts`.

This patch corrects the discrepancy by adding a `//` prefix before
drive-less, rooted paths when generating URLs.
2024-11-25 19:59:20 +00:00
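The following sketch illustrates the behaviour described above without pinning the result to one release: interpreters that include this change should return a URL with a leading `//` authority marker, while older ones return the bare path. The sample path is an assumption chosen for illustration.

```python
import os
from urllib.request import pathname2url, url2pathname

path = "/tmp/my file.txt"  # illustrative absolute POSIX-style path

# Depending on the Python version this is either '/tmp/my%20file.txt'
# (older behaviour) or '///tmp/my%20file.txt' (with the authority prefix).
url_path = pathname2url(path)
print(url_path)

# Within a single interpreter the two functions are expected to round-trip.
round_trip = url2pathname(url_path)
if os.name == "posix":
    print(round_trip == path)
```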
Serhiy Storchaka 97b2ceaaaf
gh-127217: Fix pathname2url() for paths starting with multiple slashes on Posix (GH-127218) 2024-11-24 19:30:29 +02:00
Barney Gale cc813e10ff
GH-125866: Preserve Windows drive letter case in file URIs (#127138)
Stop converting Windows drive letters to uppercase in
`urllib.request.pathname2url()` and `url2pathname()`. This behaviour is
unnecessary and inconsistent with pathlib's file URI implementation.
2024-11-23 10:41:39 +00:00
Barney Gale 8c98ed846a
GH-127078: `url2pathname()`: handle extra slash before UNC drive in URL path (#127132)
Decode a file URI like `file://///server/share` as a UNC path like
`\\server\share`. This form of file URI is created by software that simply
prepends `file:///` to any absolute Windows path.
2024-11-22 04:12:50 +00:00
Barney Gale ebf564a1d3
GH-126766: `url2pathname()`: handle 'localhost' authority (#127129)
Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
2024-11-22 03:17:06 +00:00
Barney Gale fd133d4f21
GH-126601: `pathname2url()`: handle NTFS alternate data streams (#126760)
Adjust `pathname2url()` to encode embedded colon characters in Windows
paths, rather than bailing out with an `OSError`.

Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2024-11-22 00:29:05 +00:00
Barney Gale c9b399fbdb
GH-85168: Use filesystem encoding when converting to/from `file` URIs (#126852)
Adjust `urllib.request.url2pathname()` and `pathname2url()` to use the
filesystem encoding when quoting and unquoting file URIs, rather than
forcing use of UTF-8.

No changes are needed in the `nturl2path` module because Windows always
uses UTF-8, per PEP 529.
2024-11-19 21:19:30 +00:00
Barney Gale 4d771977b1
GH-84850: Remove `urllib.request.URLopener` and `FancyURLopener` (#125739) 2024-11-19 16:01:49 +02:00
Barney Gale cae9d9d20f
GH-126766: `url2pathname()`: handle empty authority section. (#126767)
Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
2024-11-14 20:22:14 +00:00
Barney Gale bf224bd7ce
GH-120423: `pathname2url()`: handle forward slashes in Windows paths (#126593)
Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
2024-11-12 19:52:30 +00:00
Barney Gale 54c63a32d0
GH-126212: Fix removal of slashes in file URIs on Windows (#126214)
Adjust `urllib.request.pathname2url()` and `url2pathname()` so that they
don't remove slashes from Windows DOS drive paths and URLs. There was no
basis for this behaviour, and it conflicts with how UNC and POSIX paths are
handled.
2024-11-08 16:47:51 +00:00
Barney Gale 951cb2c369
GH-126205: Fix conversion of UNC paths to file URIs (#126208)
File URIs for Windows UNC paths should begin with two slashes, not four.
2024-10-30 22:56:58 +00:00
Barney Gale 6742f14dfd
GH-125866: Improve tests for `pathname2url()` and `url2pathname()` (#125993)
Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test
only the Windows-specific implementations from `nturl2path`) into the main
`Pathname_Tests` test case for these functions.

Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`.
2024-10-29 20:44:57 +00:00
Victor Stinner 2587b9f64e
gh-105382: Remove urllib.request cafile parameter (#105384)
Remove cafile, capath and cadefault parameters of the
urllib.request.urlopen() function, deprecated in Python 3.6.
2023-06-06 21:17:45 +00:00
Gregory P. Smith 2e279e85fe
gh-88500: Reduce memory use of `urllib.unquote` (#96763)
`urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result, depending on the input provided. As Python objects are relatively large, this could consume a lot of RAM.

This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations.

Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for `unquote` and `unquote_to_bytes`, and no different for typical inputs that are short or lack much Unicode or % escaping. The functions are already quite fast, so this is not a big deal. The slowdown scales linearly with input size, as expected.

Memory usage was observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unit-testing memory consumption is difficult and does not seem worthwhile.

Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
2022-12-10 16:17:39 -08:00
Christian Heimes 760ec8940a
gh-90473: WASI: skip gethostname tests (GH-93092)
- WASI's ``gethostname()`` is a stub that always fails with OSError
  ``ENOTSUP``
- skip mailcap ``test`` if subprocess is not available
- WASI process_time clock does not work.
2022-05-23 10:39:57 +02:00
Serhiy Storchaka 086c6b1b0f
bpo-45046: Support context managers in unittest (GH-28045)
Add methods enterContext() and enterClassContext() in TestCase.
Add method enterAsyncContext() in IsolatedAsyncioTestCase.
Add function enterModuleContext().
2022-05-08 17:49:09 +03:00
Steve Dower 3513d55a61
bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539) 2021-04-23 18:02:47 +01:00
Hai Shi 3ddc634cd5
bpo-40275: Use new test.support helper submodules in tests (GH-21219) 2020-06-30 15:46:06 +02:00
Serhiy Storchaka 700cfa8c90
bpo-41069: Make TESTFN and the CWD for tests containing non-ascii characters. (GH-21035) 2020-06-25 17:56:31 +03:00
Ashwin Ramaswami 9165addc22
bpo-38576: Disallow control characters in hostnames in http.client (GH-18995)
Add host validation for control characters for more CVE-2019-18348 protection.
2020-03-14 11:56:06 -07:00
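A small sketch of the host validation, assuming a Python version that includes this fix. The helper `host_is_rejected` and the `.invalid` hostnames are hypothetical; constructing an `HTTPConnection` does not open a socket.

```python
import http.client


def host_is_rejected(host: str) -> bool:
    """Return True if http.client refuses the host as an invalid URL part."""
    try:
        # No network traffic happens here; the host is validated on construction.
        http.client.HTTPConnection(host)
    except http.client.InvalidURL:
        return True
    return False


print(host_is_rejected("example.invalid"))     # ordinary hostname
print(host_is_rejected("exam\nple.invalid"))   # embedded newline
```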
Serhiy Storchaka 6a265f0d0c
bpo-39057: Fix urllib.request.proxy_bypass_environment(). (GH-17619)
Ignore leading dots and no longer ignore a trailing newline.
2020-01-05 14:14:31 +02:00
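The leading-dot handling can be illustrated by passing an explicit proxies mapping instead of reading the environment. This is a sketch; the hostnames are made up, and the `"no"` key is how `getproxies_environment()` represents `no_proxy`.

```python
from urllib.request import proxy_bypass_environment

# Equivalent to no_proxy=".b.c" in the environment.
proxies = {"no": ".b.c"}

# A host under the suffix bypasses the proxy.
bypass_subdomain = bool(proxy_bypass_environment("a.b.c", proxies))

# A host that merely ends with the same characters does not.
bypass_lookalike = bool(proxy_bypass_environment("notb.c", proxies))

print(bypass_subdomain, bypass_lookalike)
```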
Victor Stinner ae7aa42774
Remove code commented for more than 10 years (GH-16965)
test_urllib commented since 2007:

commit d9880d07fc
Author: Facundo Batista <facundobatista@gmail.com>
Date:   Fri May 25 04:20:22 2007 +0000

    Commenting out the tests until find out who can test them in
    one of the problematic enviroments.

pynche code commented since 1998 and 2001:

commit ef30092207
Author: Barry Warsaw <barry@python.org>
Date:   Tue Dec 15 01:04:38 1998 +0000

    Added most of the mechanism to change the strips from color variations
    to color constants (i.e. red constant, green constant, blue
    constant).  But I haven't hooked this up yet because the UI gets more
    crowded and the arrows don't reflect the correct values.

    Added "Go to Black" and "Go to White" buttons.

commit 741eae0b31
Author: Barry Warsaw <barry@python.org>
Date:   Wed Apr 18 03:51:55 2001 +0000

    StripWidget.__init__(), update_yourself(): Removed some unused local
    variables reported by PyChecker.

    __togglegentype(): PyChecker accurately reported that the variable
    __gentypevar was unused -- actually this whole method is currently
    unused so comment it out.
2019-10-28 22:35:31 +01:00
Stein Karlsen aad2ee0156 bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768) 2019-10-14 13:36:29 +03:00
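A brief sketch of the two directions, assuming a Python version that includes this change (the sample strings are illustrative):

```python
from urllib.parse import unquote, unquote_to_bytes

# bytes in, str out: percent-escapes are decoded as UTF-8 by default.
text = unquote(b"caf%C3%A9%20au%20lait")

# str in, bytes out: the raw octets are returned without decoding.
raw = unquote_to_bytes("caf%C3%A9")

print(text)
print(raw)
```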
Ashwin Ramaswami ff2e182865 bpo-12707: deprecate info(), geturl(), getcode() methods in favor of headers, url, and status properties for HTTPResponse and addinfourl (GH-11447)
Co-Authored-By: epicfaace <aramaswamis@gmail.com>
2019-09-13 12:40:07 +01:00
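A sketch of the attribute-based access that replaces the deprecated methods. It constructs an `addinfourl` directly from an in-memory buffer, so the URL, header, and status values are illustrative rather than the result of a real request.

```python
import io
from email.message import Message
from urllib.response import addinfourl

headers = Message()
headers["Content-Type"] = "text/plain"

# addinfourl(fp, headers, url, code) wraps a file-like object.
response = addinfourl(
    io.BytesIO(b"payload"), headers, "http://example.invalid/", 200
)

content_type = response.headers["Content-Type"]  # instead of info()
url = response.url                               # instead of geturl()
status = response.status                         # instead of getcode()
body = response.read()
response.close()

print(content_type, url, status, body)
```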
Victor Stinner 7cb9204ee1
bpo-37421: urllib.request tests call urlcleanup() (GH-14529)
urllib.request tests now call urlcleanup() to remove temporary files
created by urlretrieve() tests and to clear the _opener global
variable set by urlopen() and functions calling indirectly urlopen().

regrtest now checks if urllib.request._url_tempfiles and
urllib.request._opener are changed by tests.
2019-07-02 14:50:19 +02:00
Victor Stinner eb976e47e2
bpo-36918: Fix "Exception ignored in" in test_urllib (GH-13996)
Mock the HTTPConnection.close() method in a few unit tests to avoid
logging "Exception ignored in: ..." messages.
2019-06-12 04:07:38 +02:00
Victor Stinner 0c2b6a3943
bpo-35907, CVE-2019-9948: urllib rejects local_file:// scheme (GH-13474)
CVE-2019-9948: Prevent file reading by disallowing the unnecessary
`local_file://` URL scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.

Co-Authored-By: SH <push0ebp@gmail.com>
2019-05-22 22:15:01 +02:00
Berker Peksag 2725cb01d7
bpo-36948: Fix test_urlopener_retrieve_file on Windows (GH-13476) 2019-05-22 02:00:35 +03:00
Xtreak c661b30f89 bpo-36948: Fix NameError in urllib.request.URLopener.retrieve (GH-13389) 2019-05-19 16:40:05 +03:00
Gregory P. Smith b7378d7728
bpo-30458: Use InvalidURL instead of ValueError. (GH-13044)
Use http.client.InvalidURL instead of ValueError as the new error case's exception.
2019-05-01 16:39:21 -04:00
Xtreak 2fc936ed24 bpo-30458: Disable https related urllib tests on a build without ssl (GH-13032)
These tests require an SSL-enabled build. Skip these tests when Python is built without SSL to fix test failures.


https://bugs.python.org/issue30458
2019-05-01 04:59:48 -07:00
Gregory P. Smith c4e671eec2
bpo-30458: Disallow control chars in http URLs. (GH-12755)
Disallow control chars in http URLs in urllib.urlopen.  This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected.
2019-04-30 19:12:21 -07:00
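The check can be observed without any network access by calling `putrequest()` on an unconnected `HTTPConnection`, which only buffers the request line. This is a sketch assuming a Python version with this fix and the later switch to `InvalidURL`; `path_is_rejected` and the host are hypothetical.

```python
import http.client


def path_is_rejected(path: str) -> bool:
    """Return True if http.client refuses the request target."""
    conn = http.client.HTTPConnection("example.invalid")  # not connected
    try:
        # putrequest() validates the path before buffering the request line.
        conn.putrequest("GET", path)
    except http.client.InvalidURL:
        return True
    return False


print(path_is_rejected("/index.html"))
print(path_is_rejected("/x HTTP/1.1\r\nInjected: 1"))
```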
Stéphane Wirtel a40681dd5d bpo-36019: Use pythontest.net instead of example.com in network tests (GH-11941) 2019-02-22 14:45:36 +01:00
Senthil Kumaran efbd4ea65d Minor spell fix and formatting fixes in urllib tests. (#959) 2017-04-01 23:47:35 -07:00
Ratnadeep Debnath 21024f0662 bpo-16285: Update urllib quoting to RFC 3986 (#173)
* bpo-16285: Update urllib quoting to RFC 3986

urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.

Patch by Christian Theune and Ratnadeep Debnath.
2017-02-25 19:00:28 +10:00
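For illustration, on a Python version that includes this change, `~` passes through unescaped while a space is still percent-encoded (the sample path is made up):

```python
from urllib.parse import quote

quoted = quote("/~user/my file.txt")
print(quoted)  # '/~user/my%20file.txt'
```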
Xiang Zhang c44d58a77a Issue #29142: Merge 3.5. 2017-01-09 11:50:02 +08:00
Xiang Zhang 959ff7f1c6 Issue #29142: Fix suffixes in no_proxy handling in urllib.
In urllib.request, suffixes in no_proxy environment variable with
leading dots could match related hostnames again (e.g. .b.c matches a.b.c).
Patch by Milan Oberkirch.
2017-01-09 11:47:55 +08:00
Christian Heimes d04863771b Issue #28022: Deprecate ssl-related arguments in favor of SSLContext.
The deprecation includes manual creation of SSLSocket and certfile/keyfile
(or similar) in ftplib, httplib, imaplib, smtplib, poplib and urllib.

ssl.wrap_socket() is not marked as deprecated yet.
2016-09-10 23:23:33 +02:00
Martin Panter 0be894b2f6 Issue #27895: Spelling fixes (Contributed by Ville Skyttä). 2016-09-07 12:03:06 +00:00
R David Murray 44b548dda8 #27364: fix "incorrect" uses of escape character in the stdlib.
And most of the tools.

Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
2016-09-08 13:59:53 -04:00
Raymond Hettinger 15f44ab043 Issue #27895: Spelling fixes (Contributed by Ville Skyttä). 2016-08-30 10:47:49 -07:00
Senthil Kumaran 17742f2d45 [merge from 3.4] - Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:39:06 -07:00
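A sketch of the mitigation as seen through `urllib.request.getproxies_environment()`, using a temporarily replaced environment. The helper `proxies_for` and the proxy address are hypothetical.

```python
import os
from unittest import mock
from urllib.request import getproxies_environment


def proxies_for(env: dict) -> dict:
    """Return the proxies urllib would detect under the given environment."""
    # clear=True replaces the whole environment for the duration of the call.
    with mock.patch.dict(os.environ, env, clear=True):
        return getproxies_environment()


proxy = "http://proxy.invalid:3128"

# Outside CGI mode, the upper-case HTTP_PROXY variable is honoured.
plain = proxies_for({"HTTP_PROXY": proxy})

# With REQUEST_METHOD set (CGI mode), HTTP_PROXY is ignored, because a
# client could set it through a "Proxy:" request header.
cgi = proxies_for({"HTTP_PROXY": proxy, "REQUEST_METHOD": "GET"})

print(plain.get("http"), cgi.get("http"))
```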