Commit Graph

118 Commits

Author SHA1 Message Date
Miss Islington (bot) ff4e5c2566
[3.9] gh-105704: Disallow square brackets (`[` and `]`) in domain names for parsed URLs (GH-129418) (#129530)
(cherry picked from commit d89a5f6a6e)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2025-02-19 14:36:40 +01:00
Victor Stinner ddca295319
[3.9] gh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (#103849) (#126976)
Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit 29f348e232)

Co-authored-by: JohnJamesUtley <81572567+JohnJamesUtley@users.noreply.github.com>
2024-12-02 13:36:46 +01:00
Serhiy Storchaka a5798d0cc7
[3.9] gh-67693: Fix urlunparse() and urlunsplit() for URIs with path starting with multiple slashes and no authority (GH-113563) (#119027)
(cherry picked from commit e237b25a4f)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2024-09-05 14:05:43 +02:00
Miss Islington (bot) d7f8a5fe07
[3.9] gh-102153: Start stripping C0 control and space chars in `urlsplit` (GH-102508) (GH-104575) (GH-104592) (#104593)
gh-102153: Start stripping C0 control and space chars in `urlsplit` (GH-102508)

`urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit GH-25595.

This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/GH-url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).

I simplified the docs by eliding the state of the world explanatory
paragraph in this security release only backport.  (people will see
that in the mainline /3/ docs)

(cherry picked from commit 2f630e1ce1)
(cherry picked from commit 610cc0ab1b)
(cherry picked from commit f48a96a280)

Co-authored-by: Illia Volochii <illia.volochii@gmail.com>
Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
2023-05-22 12:42:37 +02:00
Senthil Kumaran 8a595744e6
[3.9] bpo-43882 Remove the newline, and tab early. From query and fragments. (#25853)
* Remove the newline, and tab early. From query and fragments.
2021-05-03 12:08:59 -07:00
Miss Islington (bot) 491fde0161
[3.9] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) (GH-25725)
* bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 76cd81d603)
Co-authored-by: Senthil Kumaran <skumaran@gatech.edu>
2021-04-29 10:57:31 -07:00
Miss Islington (bot) 6ec2fb42f9
bpo-42967: coerce bytes separator to string in urllib.parse_qs(l) (GH-24818)
* coerce bytes separator to string

* Add news

* Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst
(cherry picked from commit b38601d496)

Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
2021-04-11 06:49:35 -07:00
Senthil Kumaran c9f07813ab
[3.9] bpo-42967: only use '&' as a query string separator (GH-24297) (#24528)
(cherry picked from commit fcbe0cb04d)

* [3.9] bpo-42967: only use '&' as a query string separator (GH-24297)

bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.

Co-authored-by: Éric Araujo <merwok@netwok.org>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com>
2021-02-15 10:03:31 -08:00
Batuhan Taşkaya 0361556537
bpo-39481: PEP 585 for a variety of modules (GH-19423)
- concurrent.futures
- ctypes
- http.cookies
- multiprocessing
- queue
- tempfile
- unittest.case
- urllib.parse
2020-04-10 07:46:36 -07:00
idomic c33bdbb20c
bpo-37970: update and improve urlparse and urlsplit doc-strings (GH-16458) 2020-02-16 21:17:58 +02:00
Serhiy Storchaka 6a265f0d0c
bpo-39057: Fix urllib.request.proxy_bypass_environment(). (GH-17619)
Ignore leading dots and no longer ignore a trailing newline.
2020-01-05 14:14:31 +02:00
Tim Graham 5a88d50ff0 bpo-27657: Fix urlparse() with numeric paths (#661)
* bpo-27657: Fix urlparse() with numeric paths

Revert parsing decision from bpo-754016 in favor of the documented
consensus in bpo-16932 of how to treat strings without a // to
designate the netloc.

* bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
2019-10-18 06:07:20 -07:00
Stein Karlsen aad2ee0156 bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768) 2019-10-14 13:36:29 +03:00
Steve Dower 8d0ef0b5ed bpo-36742: Corrects fix to handle decomposition in usernames (#13812) 2019-06-04 17:55:29 +02:00
Rémi Lapeyre 674ee12600 bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481) 2019-05-27 09:43:45 -04:00
Steve Dower d537ab0ff9
bpo-36742: Fixes handling of pre-normalization characters in urlsplit() (GH-13017) 2019-04-30 12:03:02 +00:00
Jörn Hees 750d74fac5 bpo-12910: update and correct quote docstring (#2568)
Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
2019-04-09 17:31:18 -07:00
Steve Dower 16e6f7dee7
bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) 2019-03-07 08:02:26 -08:00
matthewbelisle-wf 209144831b bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660)
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
2018-10-19 03:52:59 -07:00
Cheryl Sabella 0250de4819 bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205) 2018-04-25 16:51:54 -07:00
Matt Eaton 2cb4661707 bpo-33034: Improve exception message when cast fails for {Parse,Split}Result.port (GH-6078) 2018-03-20 09:41:37 +03:00
Коренберг Марк fbd605151f bpo-32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value (#4867) 2017-12-21 14:16:17 +02:00
Oren Milman 8df44ee8e0 remove a redundant lower in urllib.parse.urlsplit (#3008) 2017-09-02 21:51:39 -07:00
postmasters 90e01e50ef urllib: Simplify splithost by calling into urlparse. (#1849)
The current regex based splitting produces a wrong result. For example::

  http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
2017-06-20 15:02:44 +02:00
Senthil Kumaran 906f5330b9 bpo-29976: urllib.parse clarify '' in scheme values. (GH-984) 2017-05-17 21:48:59 -07:00
Senthil Kumaran 257b980b31 correct parse_qs and parse_qsl test case descriptions. (#968)
* correct parse_qs and parse_qsl test case descriptions.
2017-04-04 21:19:43 -07:00
Ratnadeep Debnath 21024f0662 bpo-16285: Update urllib quoting to RFC 3986 (#173)
* bpo-16285: Update urllib quoting to RFC 3986

urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.

Patch by Christian Theune and Ratnadeep Debnath.
2017-02-25 19:00:28 +10:00
Serhiy Storchaka 8cbd3df3ce Issue #28992: Use bytes.fromhex(). 2016-12-21 12:59:28 +02:00
Berker Peksag f8479eeb34 Issue #25895: Merge from 3.5 2016-09-16 14:45:15 +03:00
Berker Peksag f676748a05 Issue #25895: Enable WebSocket URL schemes in urllib.parse.urljoin
Patch by Gergely Imreh and Markus Holtermann.
2016-09-16 14:43:58 +03:00
Senthil Kumaran 0b57f0adde merge from 3.5
Remove unnecessary test case comment in urllib.parse.py. These are asserted as test cases.
2016-01-25 18:54:37 -08:00
Senthil Kumaran d4e51f45a9 Remove unnecessary test case comment in urllib.parse.py. These are asserted as test cases. 2016-01-25 18:53:34 -08:00
Senthil Kumaran 86f7109dad Issue #25822: Add docstrings to the fields of urllib.parse results.
Patch contributed by Swati Jaiswal.
2016-01-14 00:11:39 -08:00
Robert Collins dfa95c9a8f Issue #20059: urllib.parse raises ValueError on all invalid ports.
Patch by Martin Panter.
2015-08-10 09:53:30 +12:00
R David Murray c17686f071 Issue #13866: add *quote_via* argument to urlencode.
Patch by samwyse, completed by Arnon Yaari, and reviewed by
Martin Panter.
2015-05-17 20:44:50 -04:00
Berker Peksag 20416f7994 Issue #23703: Fix a regression in urljoin() introduced in 901e4e52b20a.
Patch by Demian Brecht.
2015-04-16 02:31:14 +03:00
Serhiy Storchaka 1515450440 Issue #23411: Added DefragResult, ParseResult, SplitResult, DefragResultBytes,
ParseResultBytes, and SplitResultBytes to urllib.parse.__all__.
Patch by Martin Panter.
2015-04-07 19:09:01 +03:00
Serhiy Storchaka 44eceb6e2a Issue #23563: Optimized utility functions in urllib.parse. 2015-03-03 20:21:35 +02:00
R David Murray 3ab6ba4744 Merge: #23040: Clarify treatment of encoding and errors when component is bytes. 2014-12-24 21:24:07 -05:00
R David Murray 8c4e112afc #23040: Clarify treatment of encoding and errors when component is bytes.
Patch by Wojtek Ruszczewski.
2014-12-24 21:23:18 -05:00
Senthil Kumaran a66e3885fb Issue #22278: Fix urljoin problem with relative urls, a regression observed
after changes to issue22118 were submitted.

Patch contributed by Demian Brecht and reviewed by Antoine Pitrou.
2014-09-22 15:49:16 +08:00
Antoine Pitrou 55ac5b3f7b Issue #22118: Switch urllib.parse to use RFC 3986 semantics for the resolution of relative URLs, rather than RFCs 1808 and 2396.
Patch by Demian Brecht.
2014-08-21 19:16:17 -04:00
Serhiy Storchaka 465e60e654 Issue #22033: Reprs of most Python implemened classes now contain actual
class name instead of hardcoded one.
2014-07-25 23:36:00 +03:00
Victor Stinner d6a91a7ab6 Issue #20879: Delay the initialization of encoding and decoding tables for
base32, ascii85 and base85 codecs in the base64 module, and delay the
initialization of the unquote_to_bytes() table of the urllib.parse module, to
not waste memory if these modules are not used.
2014-03-17 22:38:41 +01:00
Serhiy Storchaka 5d83d1a814 Issue #20270: urllib.urlparse now supports empty ports. 2014-01-18 18:31:41 +02:00
Serhiy Storchaka ff97b08d00 Issue #20270: urllib.urlparse now supports empty ports. 2014-01-18 18:30:33 +02:00
Senthil Kumaran d80f7be580 merge from 3.3
Improve urlencode docstring. Patch by Brian Brazil.
Closes issue #15350
2013-09-05 21:43:53 -07:00
Senthil Kumaran 324ae385fe Improve urlencode docstring. Patch by Brian Brazil. 2013-09-05 21:42:38 -07:00
Raymond Hettinger 56b0a3d89a Remove redundant imports 2013-04-06 20:53:12 -07:00
Serhiy Storchaka 8ea4616f16 Issue #1285086: Get rid of the refcounting hack and speed up
urllib.parse.unquote() and urllib.parse.unquote_to_bytes().
2013-03-14 21:31:37 +02:00