the locale encoding. If the LANG (and LC_ALL and LC_CTYPE) environment variable
is not set, the locale encoding is ISO-8859-1, whereas most programs (including
Python) expect UTF-8. Python already uses UTF-8 for the filesystem encoding and
to encode command line arguments on this OS.
_Py_char2wchar() callers usually need the result size in characters. Since it's
trivial to compute it in _Py_char2wchar() (O(1) whereas wcslen() is O(n)), add
an option to get it.
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
Py_FileSystemDefaultEncoding is NULL
* redecode_filenames() functions and _Py_code_object_list (issue #9630)
are no more needed: remove them
makeunicodedata.py: download all data files from unicode.org,
switch to extracting Unihan data from zip file.
Read linebreakprops and derivednormalizationprops even for
old versions, even though they are not used in delta records.
test:unicode.py: U+11000 is now assigned, use U+14000 instead.
Redecode the filenames of:
- all modules: __file__ and __path__ attributes
- all code objects: co_filename attribute
- sys.path
- sys.meta_path
- sys.executable
- sys.path_importer_cache (keys)
Keep weak references to all code objects until initfsencoding() is called, to
be able to redecode co_filename attribute of all code objects.
if the format string is not empty. Manually merge r79596 and r84772
from 2.x.
Also, apparently test_format() from test_builtin never made it into
3.x. I've added it as well. It tests the basic format()
infrastructure.
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).
The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
now return the correct value for large code points
- repr() may consider more characters as printable.
... to get the filename as a unicode object, instead of a byte string. Function
needed to support unencodable filenames. Deprecate PyModule_GetFilename() in
favor on the new function.
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.
* Don't specify surrogateescape error handler in the comments nor the
documentation, but PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_EncodeFSDefault() because these functions use strict error handler
for the mbcs encoding (on Windows).
* Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
inconsistency with unicodeobject.h.
output with no type specifier failed to match the str output:
- format(complex(-0.0, 2.0), '-') omitted the real part from the output,
- format(complex(0.0, 2.0), '-') included a sign and parentheses.
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81465 | georg.brandl | 2010-05-22 06:29:19 -0500 (Sat, 22 May 2010) | 2 lines
Issue #3924: Ignore cookies with invalid "version" field in cookielib.
........
r81466 | georg.brandl | 2010-05-22 06:31:16 -0500 (Sat, 22 May 2010) | 1 line
Underscore the name of an internal utility function.
........
r81468 | georg.brandl | 2010-05-22 06:43:25 -0500 (Sat, 22 May 2010) | 1 line
#8635: document enumerate() start parameter in docstring.
........
r81679 | benjamin.peterson | 2010-06-03 16:21:03 -0500 (Thu, 03 Jun 2010) | 1 line
use a set for membership testing
........
r81735 | michael.foord | 2010-06-05 06:46:59 -0500 (Sat, 05 Jun 2010) | 1 line
Extract error message truncating into a method (unittest.TestCase._truncateMessage).
........
r81760 | michael.foord | 2010-06-05 14:38:42 -0500 (Sat, 05 Jun 2010) | 1 line
Issue 8302. SkipTest exception is setUpClass or setUpModule is now reported as a skip rather than an error.
........
r81868 | benjamin.peterson | 2010-06-09 14:45:04 -0500 (Wed, 09 Jun 2010) | 1 line
fix code formatting
........
r82183 | benjamin.peterson | 2010-06-23 15:29:26 -0500 (Wed, 23 Jun 2010) | 1 line
cpython only gc tests
........