Commit Graph

15 Commits

Author SHA1 Message Date
Petr Viktorin 0976339818
gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)
## Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.


## Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.


Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-07-31 00:19:48 +02:00
Matthieu Caneill cecaceea31
gh-120930: Remove extra blank occuring in wrapped encoded words in email headers (GH-121747) 2024-07-18 14:48:05 +02:00
Toshio Kuratomi a6fdb31b67
gh-92081: Fix for email.generator.Generator with whitespace between encoded words. (#92281)
* Fix for email.generator.Generator with whitespace between encoded words.

email.generator.Generator currently does not handle whitespace between
encoded words correctly when the encoded words span multiple lines.  The
current generator will create an encoded word for each line.  If the end
of the line happens to correspond with the end real word in the
plaintext, the generator will place an unencoded space at the start of
the subsequent lines to represent the whitespace between the plaintext
words.

A compliant decoder will strip all the whitespace from between two
encoded words which leads to missing spaces in the round-tripped
output.

The fix for this is to make sure that whitespace between two encoded
words ends up inside of one or the other of the encoded words.  This
fix places the space inside of the second encoded word.

A second problem happens with continuation lines.  A continuation line that
starts with whitespace and is followed by a non-encoded word is fine because
the newline between such continuation lines is defined as condensing to
a single space character.  When the continuation line starts with whitespace
followed by an encoded word, however, the RFCs specify that the word is run
together with the encoded word on the previous line.  This is because normal
words are filded on syntactic breaks by encoded words are not.

The solution to this is to add the whitespace to the start of the encoded word
on the continuation line.

Test cases are from #92081

* Rename a variable so it's not confused with the final variable.
2024-05-20 19:10:47 +00:00
Serhiy Storchaka aec1dac4ef
gh-117313: Fix re-folding email messages containing non-standard line separators (GH-117369)
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email
messages.  Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e'
and Unicode line separators '\x85', '\u2028' and '\u2029' as is.
2024-04-17 13:00:25 +03:00
Jens Troeger 45b2f8893c bpo-34424: Handle different policy.linesep lengths correctly. (#8803) 2019-05-13 21:07:39 -04:00
R. David Murray 85d5c18c9d
bpo-27240 Rewrite the email header folding algorithm. (#3488)
The original algorithm tried to delegate the folding to the tokens so
that those tokens whose folding rules differed could specify the
differences.  However, this resulted in a lot of duplicated code because
most of the rules were the same.

The new algorithm moves all folding logic into a set of functions
external to the token classes, but puts the information about which
tokens can be folded in which ways on the tokens...with the exception of
mime-parameters, which are a special case (which was not even
implemented in the old folder).

This algorithm can still probably be improved and hopefully simplified
somewhat.

Note that some of the test expectations are changed.  I believe the
changes are toward more desirable and consistent behavior: in general
when (re) folding a line the canonical version of the tokens is
generated, rather than preserving errors or extra whitespace.
2017-12-03 18:51:41 -05:00
mircea-cosbuc b459f74826 [email] bpo-29478: Fix passing max_line_length=None from Compat32 policy (GH-595)
If max_line_length=None is specified while using the Compat32 policy,
it is no longer ignored.
2017-06-11 23:43:41 -07:00
Raymond Hettinger 15f44ab043 Issue #27895: Spelling fixes (Contributed by Ville Skyttä). 2016-08-30 10:47:49 -07:00
R David Murray fdb23c2fe5 #20098: add mangle_from_ policy option.
This defaults to True in the compat32 policy for backward compatibility,
but to False for all new policies.

Patch by Milan Oberkirch, with a few tweaks.
2015-05-17 14:24:33 -04:00
R David Murray 224ef3ec3b #24211: Add RFC6532 support to the email library.
This could use more edge case tests, but the basic functionality is tested.
(Note that this changeset does not add tailored support for the RFC 6532
message/global MIME type, but the email package generic facilities will handle
it.)

Reviewed by Maciej Szulik.
2015-05-17 11:29:21 -04:00
R David Murray 1be413e366 Don't use metaclasses when class decorators can do the job.
Thanks to Nick Coghlan for pointing out that I'd forgotten about class
decorators.
2012-05-31 18:00:45 -04:00
R David Murray 56517e5cb9 Make parameterized tests in email less hackish.
Or perhaps more hackish, depending on your perspective.  But at least this
way it is now possible to run the individual tests using the unittest CLI.
2012-05-30 21:53:40 -04:00
R David Murray 0b6f6c82b5 #12586: add provisional email policy with new header parsing and folding.
When the new policies are used (and only when the new policies are explicitly
used) headers turn into objects that have attributes based on their parsed
values, and can be set using objects that encapsulate the values, as well as
set directly from unicode strings.  The folding algorithm then takes care of
encoding unicode where needed, and folding according to the highest level
syntactic objects.

With this patch only date and time headers are parsed as anything other than
unstructured, but that is all the helper methods in the existing API handle.
I do plan to add more parsers, and complete the set specified in the RFC
before the package becomes stable.
2012-05-25 18:42:14 -04:00
R David Murray c27e52265b #14731: refactor email policy framework.
This patch primarily does two things: (1) it adds some internal-interface
methods to Policy that allow for Policy to control the parsing and folding of
headers in such a way that we can construct a backward compatibility policy
that is 100% compatible with the 3.2 API, while allowing a new policy to
implement the email6 API.  (2) it adds that backward compatibility policy and
refactors the test suite so that the only differences between the 3.2
test_email.py file and the 3.3 test_email.py file is some small changes in
test framework and the addition of tests for bugs fixed that apply to the 3.2
API.

There are some additional teaks, such as moving just the code needed for the
compatibility policy into _policybase, so that the library code can import
only _policybase.  That way the new code that will be added for email6
will only get imported when a non-compatibility policy is imported.
2012-05-25 15:01:48 -04:00
R David Murray 3edd22ac95 #11731: simplify/enhance parser/generator API by introducing policy objects.
This new interface will also allow for future planned enhancements
in control over the parser/generator without requiring any additional
complexity in the parser/generator API.

Patch reviewed by Éric Araujo and Barry Warsaw.
2011-04-18 13:59:37 -04:00