Victor Stinner
1c8d0c76a1
Fix resize_inplace(): update shared utf8 pointer
2011-10-03 12:11:00 +02:00
Victor Stinner
ca4f7a4298
Disable unicode_resize() optimization on Windows (16-bit wchar_t)
2011-10-03 04:18:04 +02:00
Victor Stinner
126c559d05
_PyUnicode_Ready() for 16-bit wchar_t
2011-10-03 04:17:10 +02:00
Victor Stinner
2fd82278cb
Fix compilation error on Windows
...
Fix also a compiler warning.
2011-10-03 04:06:05 +02:00
Victor Stinner
a3be613a56
Use PyUnicode_WCHAR_KIND to check if a string is a wstr string
...
Simplify the test in wstr pointer in unicode_sizeof().
2011-10-03 02:16:37 +02:00
Victor Stinner
910337b42e
Add _PyUnicode_CheckConsistency() macro to help debugging
...
* Document Unicode string states
* Use _PyUnicode_CheckConsistency() to ensure that objects are always
consistent.
2011-10-03 03:20:16 +02:00
Victor Stinner
4fae54cb0e
In release mode, PyUnicode_InternInPlace() does nothing if the input is NULL or
not a unicode, instead of failing with a fatal error.
Use assertions in debug mode (provide better error messages).
2011-10-03 02:01:52 +02:00
Victor Stinner
23e5668214
PyUnicode_Append() now works in-place when it's possible
2011-10-03 03:54:37 +02:00
Victor Stinner
fe226c0d37
Rewrite PyUnicode_Resize()
...
* Rename _PyUnicode_Resize() to unicode_resize()
* unicode_resize() creates a copy if the string cannot be resized instead
of failing
* Optimize resize_copy() for wstr strings
* Temporarily disable resize_inplace()
2011-10-03 03:52:20 +02:00
Victor Stinner
829c0adca9
Add _PyUnicode_HAS_UTF8_MEMORY() macro
2011-10-03 01:08:02 +02:00
Victor Stinner
fe0c155c4f
Write _PyUnicode_Dump() to help debugging
2011-10-03 02:59:31 +02:00
Victor Stinner
f42dc448e0
PyUnicode_CopyCharacters() fails when copying latin1 into ascii
2011-10-02 23:33:16 +02:00
Victor Stinner
c53be96c54
unicode_convert_wchar_to_ucs4() cannot fail
2011-10-02 21:33:54 +02:00
Victor Stinner
c3c7415639
Add _PyUnicode_DATA_ANY(op) private macro
2011-10-02 20:39:55 +02:00
Victor Stinner
a464fc141d
unicode_empty and unicode_latin1 are PyObject* objects, not PyUnicodeObject*
2011-10-02 20:39:30 +02:00
Victor Stinner
267aa24365
PyUnicode_FindChar() raises an IndexError on invalid index
2011-10-02 01:08:37 +02:00
Victor Stinner
bc603d12b7
Optimize _PyUnicode_AsKind() for UCS1->UCS4 and UCS2->UCS4
...
* Ensure that the input string is ready
* Raise a ValueError instead of a fatal error
2011-10-02 01:00:40 +02:00
Victor Stinner
5a706cf8c0
Fix usage of PyUnicode_READY() in PyUnicode_GetLength()
2011-10-02 00:36:53 +02:00
Victor Stinner
cd9950fd09
PyUnicode_WriteChar() raises IndexError on invalid index
...
PyUnicode_WriteChar() also raises a ValueError if the string has more than 1
reference.
2011-10-02 00:34:53 +02:00
Victor Stinner
2fe5ced752
PyUnicode_ReadChar() raises an IndexError if the index is invalid
...
unicode_getitem() reuses PyUnicode_ReadChar()
2011-10-02 00:25:40 +02:00
Victor Stinner
202b62bd90
PyUnicode_FromKindAndData() raises a ValueError if the kind is unknown
2011-10-01 23:48:37 +02:00
Victor Stinner
07ac3ebd7b
Optimize unicode_subtype_new(): don't encode to wchar_t and decode from wchar_t
...
Rewrite unicode_subtype_new(): allocate directly the right type.
2011-10-01 16:16:43 +02:00
Victor Stinner
e90fe6a8f4
Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros
...
* Rename existing _PyUnicode_UTF8() macro to PyUnicode_UTF8()
* Rename existing _PyUnicode_UTF8_LENGTH() macro to PyUnicode_UTF8_LENGTH()
* PyUnicode_UTF8() and PyUnicode_UTF8_LENGTH() are more strict
2011-10-01 16:48:13 +02:00
Martin v. Löwis
0b1d348990
Issue 13085: Fix some memory leaks. Patch by Stefan Krah.
2011-10-01 16:35:40 +02:00
Benjamin Peterson
5c0fb00ad8
merge heads
2011-10-01 00:12:20 -04:00
Benjamin Peterson
31616ea2ff
remove reference to non-existent file
2011-10-01 00:11:09 -04:00
Victor Stinner
de636f3c34
PyUnicode_Substring() now accepts end bigger than string length
...
Fix also a bug: call PyUnicode_READY() before reading string length.
2011-10-01 03:55:54 +02:00
Victor Stinner
c759f3e7ec
Ooops, avoid a division by zero in unicode_repeat()
2011-10-01 03:09:58 +02:00
Victor Stinner
d3a83d5eb3
PyUnicode_FromObject() ensures that its output is a ready string
2011-10-01 03:09:33 +02:00
Victor Stinner
67ca64ce54
I want a super fast 'a' * n!
...
* Optimize unicode_repeat() for a special case with memset()
* Simplify integer overflow checking; remove the second check because
PyUnicode_New() already does it and uses a smaller limit (Py_ssize_t vs
size_t)
2011-10-01 02:47:29 +02:00
Victor Stinner
e9a2935c1f
Fix usage of PyUnicode_READY in unicodeobject.c
2011-10-01 02:14:59 +02:00
Victor Stinner
12bab6dace
Remove private substring() function, reuse public PyUnicode_Substring()
...
* PyUnicode_Substring() now fails if start or end is invalid
* PyUnicode_Substring() reuses PyUnicode_Copy() for non-exact strings
2011-10-01 01:53:49 +02:00
Victor Stinner
c841e7db1f
Optimize PyUnicode_Copy(): don't recompute maximum character
2011-10-01 01:34:32 +02:00
Victor Stinner
2219e0a37e
PyUnicode_FromObject() reuses PyUnicode_Copy()
...
* PyUnicode_Copy() is faster than substring()
* Fix also a compiler warning
2011-10-01 01:16:59 +02:00
Victor Stinner
034f6cf10c
Add PyUnicode_Copy() function, include it in the public API
2011-09-30 02:26:44 +02:00
Victor Stinner
b153615008
PyUnicode_CopyCharacters() uses exceptions instead of assertions
...
Call PyErr_BadInternalCall() if inputs are not unicode strings.
2011-09-30 02:26:10 +02:00
Victor Stinner
d8f6510acc
_PyUnicode_Ready() cannot be used on ready strings anymore
...
* Change its prototype: PyObject* instead of PyUnicodeObject*.
* Remove an old assertion, the result of PyUnicode_READY (_PyUnicode_Ready)
must be checked instead
2011-09-29 19:43:17 +02:00
Victor Stinner
bc8b81bc4e
Move _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() outside unicodeobject.h
...
Move these macros to unicodeobject.c
2011-09-29 19:31:34 +02:00
Victor Stinner
a0702ab1fe
Add a note in PyUnicode_CopyCharacters() doc: it doesn't write null character
...
Cleanup also the code (avoid the goto).
2011-09-29 14:14:38 +02:00
Victor Stinner
639418812f
Use the new Py_ARRAY_LENGTH macro
2011-09-29 00:42:28 +02:00
Victor Stinner
b9dcffb51e
Fix 'c' format of PyUnicode_Format()
...
formatbuf is now an array of Py_UCS4, not of Py_UNICODE
2011-09-29 00:39:24 +02:00
Victor Stinner
c17f540b7a
Oops, fix my previous commit: unicode => to
2011-09-29 00:16:58 +02:00
Victor Stinner
b15d4d899c
PyUnicode_CopyCharacters() marks the string as dirty (reset the hash)
2011-09-28 23:59:20 +02:00
Victor Stinner
f5ca1a21a5
PyUnicode_CopyCharacters() fails if 'to' has more than 1 reference
2011-09-28 23:54:59 +02:00
Ezio Melotti
2aa2b3b4d5
Clean up a few tabs that went in with PEP393.
2011-09-29 00:58:57 +03:00
Ezio Melotti
48a2f8fd97
#13054 : sys.maxunicode is now always 0x10FFFF.
2011-09-29 00:18:19 +03:00
Victor Stinner
506f592769
Check size of wchar_t using the preprocessor
2011-09-28 22:34:18 +02:00
Victor Stinner
73f01c65c8
PyUnicode_CopyCharacters() initializes overflow
2011-09-28 22:28:04 +02:00
Victor Stinner
e57b1c0da1
Mark PyUnicode_FromUCS[124] as private
2011-09-28 22:20:48 +02:00
Victor Stinner
ff9e50fd04
Oops, fix Py_MIN/Py_MAX case
2011-09-28 22:17:19 +02:00
Victor Stinner
17222160e7
Mark _PyUnicode_FindMaxCharAndNumSurrogatePairs() as private
2011-09-28 22:15:37 +02:00
Victor Stinner
157f83fcfc
Strip trailing spaces in unicodeobject.[ch]
2011-09-28 21:41:31 +02:00
Victor Stinner
6c7a52a46f
Check for PyUnicode_CopyCharacters() failure
2011-09-28 21:39:17 +02:00
Victor Stinner
be78eaf2de
PyUnicode_CopyCharacters() checks for buffer and character overflow
...
It now returns the number of written characters on success.
2011-09-28 21:37:03 +02:00
Victor Stinner
fb5f5f2420
Mark PyUnicode_CONVERT_BYTES as private
2011-09-28 21:39:49 +02:00
Georg Brandl
4cb0de246c
Rename new macros to conform to naming rules (function macros have "Py" prefix, not "PY").
2011-09-28 21:49:49 +02:00
Benjamin Peterson
9c6e6a0c7f
don't check that the first character is XID_Continue
...
Currently, XID_Continue is a superset of XID_Start, but that may change someday.
2011-09-28 08:09:05 -04:00
Martin v. Löwis
d63a3b8beb
Implement PEP 393.
2011-09-28 07:41:54 +02:00
Mark Dickinson
57e683e53e
Issue #1621 : Fix undefined behaviour in bytes.__hash__, str.__hash__, tuple.__hash__, frozenset.__hash__ and set indexing operations.
2011-09-24 18:18:40 +01:00
Mark Dickinson
0d5f6adbb3
Issue #13012 : Allow 'keepends' to be passed as a keyword argument in str.splitlines, bytes.splitlines and bytearray.splitlines.
2011-09-24 09:14:39 +01:00
Victor Stinner
f955eb210f
Merge 3.2: Fix PyUnicode_AsWideCharString() doc
...
- Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null
character
- Fix spelling of the null character
2011-09-06 02:01:29 +02:00
Victor Stinner
d88d9836c5
Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character
...
Fix also spelling of the null character.
2011-09-06 02:00:05 +02:00
Ezio Melotti
6f2a683a0c
#9200 : merge with 3.2.
2011-08-22 20:31:11 +03:00
Ezio Melotti
93e7afc5d9
#9200 : The str.is* methods now work with strings that contain non-BMP characters even in narrow Unicode builds.
2011-08-22 14:08:38 +03:00
Benjamin Peterson
e518d4c18a
merge 3.2
2011-08-18 13:52:19 -05:00
Benjamin Peterson
7a6b44ab62
the name of the character is actually NUL
2011-08-18 13:51:47 -05:00
Benjamin Peterson
020340f284
merge 3.2
2011-08-18 10:49:16 -05:00
Benjamin Peterson
5ad517a7d9
NUL -> NULL
2011-08-18 10:48:50 -05:00
Ezio Melotti
269e3ee3db
#12266 : merge with 3.2.
2011-08-15 09:26:28 +03:00
Ezio Melotti
ee8d998ecf
#12266 : Fix str.capitalize() to correctly uppercase/lowercase titlecased and cased non-letter characters.
2011-08-15 09:09:57 +03:00
Benjamin Peterson
f8e7543df9
merge 3.2 ( #12732 )
2011-08-12 22:18:19 -05:00
Benjamin Peterson
f413b80806
in narrow builds, make sure to test codepoints as identifier characters ( closes #12732 )
...
This fixes the use of Unicode identifiers outside the BMP in narrow builds.
2011-08-12 22:17:18 -05:00
Brian Curtin
dfc80e3d97
Replace Py_NotImplemented returns with the macro form Py_RETURN_NOTIMPLEMENTED.
...
The macro was introduced in #12724 .
2011-08-10 20:28:54 -05:00
Senthil Kumaran
fcdaaa9011
merge from 3.2 - Fix closes Issue12621 - Fix docstrings of find and rfind methods of bytes/bytearray/unicodeobject.
2011-07-27 23:34:29 +08:00
Senthil Kumaran
53516a82df
Fix closes Issue12621 - Fix docstrings of find and rfind methods of bytes/bytearray/unicodeobject.
2011-07-27 23:33:54 +08:00
Victor Stinner
99b9538636
Issue #9642 : Uniformize the tests on the availability of the mbcs codec
...
Add a new HAVE_MBCS define.
2011-07-04 14:23:54 +02:00
Senthil Kumaran
bc9d8f838b
merge from 3.2
2011-07-03 21:05:25 -07:00
Senthil Kumaran
9ebe08d2f6
Fix closes issue12471 - wrong TypeError message when '%i' format spec was used.
2011-07-03 21:03:16 -07:00
Victor Stinner
3cbf14bfb1
Issue #10914 : Initialize correctly the filesystem codec when creating a new
subinterpreter to fix a bootstrap issue with codecs implemented in Python, such as
the ISO-8859-15 codec.
Add fscodec_initialized attribute to the PyInterpreterState structure.
2011-04-27 00:24:21 +02:00
Victor Stinner
793b531756
Issue #10914 : Initialize correctly the filesystem codec when creating a new
subinterpreter to fix a bootstrap issue with codecs implemented in Python, such as
the ISO-8859-15 codec.
Add fscodec_initialized attribute to the PyInterpreterState structure.
2011-04-27 00:24:21 +02:00
Ezio Melotti
bf1253b25a
#6780 : merge with 3.2.
2011-04-26 06:45:24 +03:00
Ezio Melotti
f2b3f780a1
#6780 : merge with 3.1.
2011-04-26 06:40:59 +03:00
Ezio Melotti
ba42fd5801
#6780 : fix starts/endswith error message to mention that tuples are accepted too.
2011-04-26 06:09:45 +03:00
Jesus Cea
c1ceb64e41
MERGE: startswith and endswith don't accept None as slice index. Patch by Torsten Becker. ( closes #11828 )
2011-04-20 17:59:29 +02:00
Jesus Cea
6159ee3cf5
MERGE: startswith and endswith don't accept None as slice index. Patch by Torsten Becker. ( closes #11828 )
2011-04-20 17:42:50 +02:00
Jesus Cea
ac4515063c
startswith and endswith don't accept None as slice index. Patch by Torsten Becker. ( closes #11828 )
2011-04-20 17:09:23 +02:00
Benjamin Peterson
5fd4bd3796
avoid casting with this nice macro
2011-03-06 09:06:34 -06:00
Victor Stinner
2f283c2c19
Fix my previous commit (r88709) for str.encode(errors=...)
2011-03-02 01:21:46 +00:00
Victor Stinner
a5c68c3cb7
Issue #8923 : cache str.encode() result
...
When a string is encoded to UTF-8 in strict mode, the result is cached into the
object. Examples: str.encode(), str.encode('utf-8'), PyUnicode_AsUTF8String()
and PyUnicode_AsEncodedString(unicode, "utf-8", NULL).
2011-03-02 01:03:14 +00:00
Victor Stinner
f3fd733f92
Remove useless argument of _PyUnicode_AsDefaultEncodedString()
2011-03-02 01:03:11 +00:00
Victor Stinner
6d970f4713
Issue #10831 : PyUnicode_FromFormat() supports %li, %lli and %zi formats
2011-03-02 00:04:25 +00:00
Victor Stinner
e7faec1aa9
Fix my previous commit (r88702): initialize size_tflag in parse_format_flags()
2011-03-02 00:01:53 +00:00
Victor Stinner
968654515f
Issue #10829 : Refactor PyUnicode_FromFormat()
...
* Use the same function to parse the format string in the 3 steps
* Fix crashes on invalid format strings
2011-03-01 23:44:09 +00:00
Victor Stinner
2b574a2332
Merged revisions 88697 via svnmerge from
...
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r88697 | victor.stinner | 2011-03-01 23:46:52 +0100 (mar., 01 mars 2011) | 4 lines
Issue #11246 : Fix PyUnicode_FromFormat("%V")
Decode the byte string from UTF-8 (with replace error handler) instead of
ISO-8859-1 (in strict mode). Patch written by Ray Allen.
........
2011-03-01 22:48:49 +00:00
Victor Stinner
2512a8b62e
Issue #11246 : Fix PyUnicode_FromFormat("%V")
...
Decode the byte string from UTF-8 (with replace error handler) instead of
ISO-8859-1 (in strict mode). Patch written by Ray Allen.
2011-03-01 22:46:52 +00:00
Alexander Belopolsky
4001847a98
PEP 7 conformance changes (whitespace only).
2011-02-26 01:02:56 +00:00
Alexander Belopolsky
1d52146a25
Issue #11303 : Added shortcuts for utf8 and latin1 encodings.
...
Documented the list of optimized encodings as CPython implementation
detail.
2011-02-25 19:19:57 +00:00
Victor Stinner
659eb84457
Merged revisions 88481 via svnmerge from
...
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r88481 | victor.stinner | 2011-02-21 22:13:44 +0100 (lun., 21 févr. 2011) | 4 lines
Fix PyUnicode_FromFormatV("%c") for non-BMP char
Issue #10830 : Fix PyUnicode_FromFormatV("%c") for non-BMP characters on
narrow build.
........
2011-02-23 12:14:22 +00:00
Brett Cannon
b94767ff44
Issue #8914 : fix various warnings from the Clang static analyzer v254.
2011-02-22 20:15:44 +00:00
Victor Stinner
5ed8b2c737
Fix PyUnicode_FromFormatV("%c") for non-BMP char
...
Issue #10830 : Fix PyUnicode_FromFormatV("%c") for non-BMP characters on
narrow build.
2011-02-21 21:13:44 +00:00