#16127: merge with 3.3.

2012-10-05 03:34:02 +03:00 · 2012-10-05 03:34:02 +03:00 · 080a2c087e
parent b176203dda e7f90375b1
commit 080a2c087e
4 changed files with 6 additions and 17 deletions
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -1083,8 +1083,6 @@ These are the UTF-32 codec APIs:
   After completion, *\*byteorder* is set to the current byte order at the end
   of input data.
-   In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.
   If *byteorder* is *NULL*, the codec starts in native order mode.
   Return *NULL* if an exception was raised by the codec.
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -538,9 +538,7 @@ Notes:
   this escape sequence.  Exactly four hex digits are required.
 (6)
-   Any Unicode character can be encoded this way, but characters outside the Basic
+   Any Unicode character can be encoded this way.  Exactly eight hex digits
-   Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
-   compiled to use 16-bit code units (the default).  Exactly eight hex digits
   are required.
--- a/Include/unicodeobject.h
+++ b/Include/unicodeobject.h
@@ -1022,8 +1022,7 @@ PyAPI_FUNC(void*) _PyUnicode_AsKind(PyObject *s, unsigned int kind);
 /* Create a Unicode Object from the given Unicode code point ordinal.
-   The ordinal must be in range(0x10000) on narrow Python builds
+   The ordinal must be in range(0x110000). A ValueError is
-   (UCS2), and range(0x110000) on wide builds (UCS4). A ValueError is
   raised in case it is not.
 */
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -5800,18 +5800,12 @@ PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
    void *data;
    Py_ssize_t expandsize = 0;
-    /* Initial allocation is based on the longest-possible unichr
+    /* Initial allocation is based on the longest-possible character
       escape.
-       In wide (UTF-32) builds '\U00xxxxxx' is 10 chars per source
+       For UCS1 strings it's '\xxx', 4 bytes per source character.
-       unichr, so in this case it's the longest unichr escape. In
+       For UCS2 strings it's '\uxxxx', 6 bytes per source character.
-       narrow (UTF-16) builds this is five chars per source unichr
+       For UCS4 strings it's '\U00xxxxxx', 10 bytes per source character.
-       since there are two unichrs in the surrogate pair, so in narrow
-       (UTF-16) builds it's not the longest unichr escape.
-       In wide or narrow builds '\uxxxx' is 6 chars per source unichr,
-       so in the narrow (UTF-16) build case it's the longest unichr
-       escape.
    */
    if (!PyUnicode_Check(unicode)) {