gh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (GH-103849)
* Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format
---------
(cherry picked from commit 29f348e232)
Co-authored-by: JohnJamesUtley <81572567+JohnJamesUtley@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.
RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`
The WHATWG URL spec defines a scheme like this:
`"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
(cherry picked from commit 439b9cfaf4)
Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
Switch to lru_cache in urllib.parse.
urllib.parse now uses functool.lru_cache for its internal URL splitting and
quoting caches instead of rolling its own like its the 90s.
The undocumented internal Quoted class API is now deprecated
as it had no reason to be public and no existing OSS users were found.
The clear_cache() API remains undocumented but gets an explicit test as it
is used in a few projects' (twisted, gevent) tests as well as our own regrtest.
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().
urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.
Co-authored-by: Éric Araujo <merwok@netwok.org>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Éric Araujo <merwok@netwok.org>
* bpo-27657: Fix urlparse() with numeric paths
Revert parsing decision from bpo-754016 in favor of the documented
consensus in bpo-16932 of how to treat strings without a // to
designate the netloc.
* bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
The current regex based splitting produces a wrong result. For example::
http://abc#@def
Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
* bpo-16285: Update urllib quoting to RFC 3986
urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.
Patch by Christian Theune and Ratnadeep Debnath.
base32, ascii85 and base85 codecs in the base64 module, and delay the
initialization of the unquote_to_bytes() table of the urllib.parse module, to
not waste memory if these modules are not used.