cpython

Commit Graph

Author	SHA1	Message	Date
Petr Viktorin	0976339818	gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233) ## Encode header parts that contain newlines Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. ## Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2024-07-31 00:19:48 +02:00
Matthieu Caneill	cecaceea31	gh-120930: Remove extra blank occurring in wrapped encoded words in email headers (GH-121747)	2024-07-18 14:48:05 +02:00
Toshio Kuratomi	a6fdb31b67	gh-92081: Fix for email.generator.Generator with whitespace between encoded words. (#92281) * Fix for email.generator.Generator with whitespace between encoded words. email.generator.Generator currently does not handle whitespace between encoded words correctly when the encoded words span multiple lines. The current generator will create an encoded word for each line. If the end of the line happens to correspond with the end of a real word in the plaintext, the generator will place an unencoded space at the start of the subsequent lines to represent the whitespace between the plaintext words. A compliant decoder will strip all the whitespace from between two encoded words, which leads to missing spaces in the round-tripped output. The fix for this is to make sure that whitespace between two encoded words ends up inside of one or the other of the encoded words. This fix places the space inside of the second encoded word. A second problem happens with continuation lines. A continuation line that starts with whitespace and is followed by a non-encoded word is fine because the newline between such continuation lines is defined as condensing to a single space character. When the continuation line starts with whitespace followed by an encoded word, however, the RFCs specify that the word is run together with the encoded word on the previous line. This is because normal words are folded on syntactic breaks but encoded words are not. The solution to this is to add the whitespace to the start of the encoded word on the continuation line. Test cases are from #92081 * Rename a variable so it's not confused with the final variable.	2024-05-20 19:10:47 +00:00
Serhiy Storchaka	aec1dac4ef	gh-117313: Fix re-folding email messages containing non-standard line separators (GH-117369) Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email messages. Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e' and Unicode line separators '\x85', '\u2028' and '\u2029' as is.	2024-04-17 13:00:25 +03:00
Jens Troeger	45b2f8893c	bpo-34424: Handle different policy.linesep lengths correctly. (#8803)	2019-05-13 21:07:39 -04:00
R. David Murray	85d5c18c9d	bpo-27240 Rewrite the email header folding algorithm. (#3488) The original algorithm tried to delegate the folding to the tokens so that those tokens whose folding rules differed could specify the differences. However, this resulted in a lot of duplicated code because most of the rules were the same. The new algorithm moves all folding logic into a set of functions external to the token classes, but puts the information about which tokens can be folded in which ways on the tokens...with the exception of mime-parameters, which are a special case (which was not even implemented in the old folder). This algorithm can still probably be improved and hopefully simplified somewhat. Note that some of the test expectations are changed. I believe the changes are toward more desirable and consistent behavior: in general when (re)folding a line the canonical version of the tokens is generated, rather than preserving errors or extra whitespace.	2017-12-03 18:51:41 -05:00
mircea-cosbuc	b459f74826	[email] bpo-29478: Fix passing max_line_length=None from Compat32 policy (GH-595) If max_line_length=None is specified while using the Compat32 policy, it is no longer ignored.	2017-06-11 23:43:41 -07:00
Raymond Hettinger	15f44ab043	Issue #27895: Spelling fixes (Contributed by Ville Skyttä).	2016-08-30 10:47:49 -07:00
R David Murray	fdb23c2fe5	#20098: add mangle_from_ policy option. This defaults to True in the compat32 policy for backward compatibility, but to False for all new policies. Patch by Milan Oberkirch, with a few tweaks.	2015-05-17 14:24:33 -04:00
R David Murray	224ef3ec3b	#24211: Add RFC6532 support to the email library. This could use more edge case tests, but the basic functionality is tested. (Note that this changeset does not add tailored support for the RFC 6532 message/global MIME type, but the email package generic facilities will handle it.) Reviewed by Maciej Szulik.	2015-05-17 11:29:21 -04:00
R David Murray	1be413e366	Don't use metaclasses when class decorators can do the job. Thanks to Nick Coghlan for pointing out that I'd forgotten about class decorators.	2012-05-31 18:00:45 -04:00
R David Murray	56517e5cb9	Make parameterized tests in email less hackish. Or perhaps more hackish, depending on your perspective. But at least this way it is now possible to run the individual tests using the unittest CLI.	2012-05-30 21:53:40 -04:00
R David Murray	0b6f6c82b5	#12586: add provisional email policy with new header parsing and folding. When the new policies are used (and only when the new policies are explicitly used) headers turn into objects that have attributes based on their parsed values, and can be set using objects that encapsulate the values, as well as set directly from unicode strings. The folding algorithm then takes care of encoding unicode where needed, and folding according to the highest level syntactic objects. With this patch only date and time headers are parsed as anything other than unstructured, but that is all the helper methods in the existing API handle. I do plan to add more parsers, and complete the set specified in the RFC before the package becomes stable.	2012-05-25 18:42:14 -04:00
R David Murray	c27e52265b	#14731: refactor email policy framework. This patch primarily does two things: (1) it adds some internal-interface methods to Policy that allow for Policy to control the parsing and folding of headers in such a way that we can construct a backward compatibility policy that is 100% compatible with the 3.2 API, while allowing a new policy to implement the email6 API. (2) it adds that backward compatibility policy and refactors the test suite so that the only differences between the 3.2 test_email.py file and the 3.3 test_email.py file are some small changes in test framework and the addition of tests for bugs fixed that apply to the 3.2 API. There are some additional tweaks, such as moving just the code needed for the compatibility policy into _policybase, so that the library code can import only _policybase. That way the new code that will be added for email6 will only get imported when a non-compatibility policy is imported.	2012-05-25 15:01:48 -04:00
R David Murray	3edd22ac95	#11731: simplify/enhance parser/generator API by introducing policy objects. This new interface will also allow for future planned enhancements in control over the parser/generator without requiring any additional complexity in the parser/generator API. Patch reviewed by Éric Araujo and Barry Warsaw.	2011-04-18 13:59:37 -04:00
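
The whitespace problem described in gh-92081 above can be seen from the decoding side. The following is a minimal sketch, not code from that commit: the `decode` helper and the sample header values are made up for illustration, and only the standard `email.header` functions are used. It shows that a space left between two encoded words is dropped on decoding, while a space placed inside the second encoded word survives.

```python
from email.header import decode_header, make_header


def decode(raw):
    """Decode an RFC 2047 header value to a plain string (illustrative helper)."""
    return str(make_header(decode_header(raw)))


# A space *between* two encoded words is discarded by a compliant decoder.
lost = decode("=?utf-8?q?caf=C3=A9?= =?utf-8?q?bar?=")

# A space *inside* the second encoded word is kept
# ("_" stands for a space in the "q" encoding).
kept = decode("=?utf-8?q?caf=C3=A9?= =?utf-8?q?_bar?=")

print(repr(lost))
print(repr(kept))
```

This is why the fix moves the separating space into the second encoded word rather than leaving it between the two.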

15 Commits
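
The line-separator distinction in gh-117313 above can be illustrated with a short sketch. This is not the code from that commit: the `EMAIL_LINESEP` pattern and the `split_email_lines` helper are hypothetical names chosen for this example. It contrasts `str.splitlines()`, which also breaks on characters such as '\v' and '\u2028', with a split that treats only '\n', '\r' and '\r\n' as line separators.

```python
import re

# Hypothetical pattern: only the three separators named in the commit message.
EMAIL_LINESEP = re.compile(r"\r\n|\r|\n")


def split_email_lines(text):
    """Split on '\\r\\n', '\\r' and '\\n' only; keep other control characters."""
    return EMAIL_LINESEP.split(text)


value = "a\x0bb\nc\u2028d\r\ne"

# str.splitlines() also splits on '\x0b' (vertical tab) and '\u2028'.
broad = value.splitlines()

# The narrower split preserves those characters inside the lines.
narrow = split_email_lines(value)

print(broad)
print(narrow)
```

With the narrower split, the vertical tab and the Unicode line separator stay in the text instead of being turned into fold points.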