subinterpreter to fix a bootstrap issue with codecs implemented in Python, as
the ISO-8859-15 codec.
Add fscodec_initialized attribute to the PyInterpreterState structure.
subinterpreter to fix a bootstrap issue with codecs implemented in Python, as
the ISO-8859-15 codec.
Add fscodec_initialized attribute to the PyInterpreterState structure.
When a string is encoded to UTF-8 in strict mode, the result is cached into the
object. Examples: str.encode(), str.encode('utf-8'), PyUnicode_AsUTF8String()
and PyUnicode_AsEncodedString(unicode, "utf-8", NULL).
Issue #11187: Remove bootstrap code (use ASCII) of
PyUnicode_AsEncodedString(), it was replaced by a better fallback (use
the locale encoding) in PyUnicode_EncodeFSDefault().
Prepare also empty sections in NEWS.
types. Added a new API function, PyUnicode_TransformDecimalToASCII(),
which transforms non-ASCII decimal digits in a Unicode string to their
ASCII equivalents.
* Specify the default encoding: write 'utf-8' instead of
sys.getdefaultencoding(), because the default encoding is now constant
* Specify the default errors value
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r86277 | eric.smith | 2010-11-06 15:27:37 -0400 (Sat, 06 Nov 2010) | 1 line
Added more to docstrings for str.format, format_map, and __format__.
........
the locale encoding. If the LANG (and LC_ALL and LC_CTYPE) environment variable
is not set, the locale encoding is ISO-8859-1, whereas most programs (including
Python) expect UTF-8. Python already uses UTF-8 for the filesystem encoding and
to encode command line arguments on this OS.
_Py_char2wchar() callers usually need the result size in characters. Since it's
trivial to compute it in _Py_char2wchar() (O(1) whereas wcslen() is O(n)), add
an option to get it.
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
Py_FileSystemDefaultEncoding is NULL
* redecode_filenames() functions and _Py_code_object_list (issue #9630)
are no more needed: remove them
Redecode the filenames of:
- all modules: __file__ and __path__ attributes
- all code objects: co_filename attribute
- sys.path
- sys.meta_path
- sys.executable
- sys.path_importer_cache (keys)
Keep weak references to all code objects until initfsencoding() is called, to
be able to redecode co_filename attribute of all code objects.
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.
* Don't specify surrogateescape error handler in the comments nor the
documentation, but PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_EncodeFSDefault() because these functions use strict error handler
for the mbcs encoding (on Windows).
* Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
inconsistency with unicodeobject.h.
svn+ssh://svn.python.org/python/branches/py3k
........
r83444 | georg.brandl | 2010-08-01 22:51:02 +0200 (So, 01 Aug 2010) | 1 line
Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway.
........
svn+ssh://svn.python.org/python/branches/py3k
........
r83395 | georg.brandl | 2010-08-01 10:49:18 +0200 (So, 01 Aug 2010) | 1 line
#8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character.
........
r83417 | georg.brandl | 2010-08-01 20:38:26 +0200 (So, 01 Aug 2010) | 1 line
#5776: fix mistakes in python specfile. (Nobody probably uses it anyway.)
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r82413 | ezio.melotti | 2010-07-01 10:32:02 +0300 (Thu, 01 Jul 2010) | 13 lines
Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
........
r82468 | ezio.melotti | 2010-07-03 07:52:19 +0300 (Sat, 03 Jul 2010) | 1 line
Update comment about surrogates.
........
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
mode raises unicode errors. The encoder only supports "strict" and "replace"
error handlers, the decoder only supports "strict" and "ignore" error handlers.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (ven., 11 juin 2010) | 5 lines
Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
the interpreter with characters outside the Basic Multilingual Plane
(higher than 0x10000).
........
object to Py_FileSystemDefaultEncoding with the "surrogateescape" error
handler, return a bytes object. If Py_FileSystemDefaultEncoding is not set,
fall back to UTF-8.
This function is only used to decode Python module filenames, but Python
doesn't support surrogates in modules filenames yet. So nobody noticed this
minor bug.