Antoine Pitrou
6d5ad227a5
Issue #16215 : Fix potential double memory free in str.replace().
Patch by Serhiy Storchaka.
2012-11-17 23:28:17 +01:00
Victor Stinner
0d92c4f667
Issue #16416 : Fix error handling in _Py_wchar2char() and _Py_char2wchar() functions
2012-11-12 23:32:21 +01:00
Victor Stinner
fc009eff9e
Close #16311 : Use the _PyUnicodeWriter API in text decoders
* Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare()
* Remove unicode_putchar(): replaced with
_PyUnicodeWriter_Prepare() + PyUnicode_WRITE()
* When handling a decoding error, only overallocate the buffer by +25%
instead of +100%
2012-11-07 00:36:38 +01:00
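The trade-off behind the +25% versus +100% overallocation in the commit above can be sketched with a small model. This is only an illustration under stated assumptions: count_resizes() and its growth rule (needed size plus a percentage of it) are hypothetical and do not reproduce the actual _PyUnicodeWriter code.

```python
def count_resizes(total_chars, overalloc_percent, start=16):
    """Count buffer resizes needed to append total_chars one at a time.

    Hypothetical model: when the buffer is full, it grows to the needed
    size plus overalloc_percent percent of that size.
    """
    capacity = start
    resizes = 0
    for length in range(1, total_chars + 1):
        if length > capacity:
            capacity = length + length * overalloc_percent // 100
            resizes += 1
    return resizes, capacity


for percent in (25, 100):
    resizes, capacity = count_resizes(1000, percent)
    print(f"+{percent}%: {resizes} resizes, final capacity {capacity}")
```

In this model a smaller overallocation wastes less memory at the end but pays for it with more resizes.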
Ezio Melotti
cfa9636404
#8271 : merge with 3.3.
2012-11-04 23:23:09 +02:00
Ezio Melotti
f7ed5d111b
#8271 : the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.
2012-11-04 23:21:38 +02:00
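The behavior described in #8271 can be observed from Python. This is a minimal sketch that assumes a CPython version containing the fix; the byte strings are made up for the example.

```python
# Each ill-formed byte sequence is replaced by one U+FFFD, and valid
# bytes that follow an invalid lead byte are still decoded.
samples = {
    b"a\xffb": "a\ufffdb",             # 0xFF can never appear in UTF-8
    b"\xe2\x28\xa1": "\ufffd(\ufffd",  # bad lead byte, "(", stray continuation byte
}

for raw, expected in samples.items():
    decoded = raw.decode("utf-8", "replace")
    print(raw, "->", ascii(decoded), decoded == expected)
```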
Benjamin Peterson
7ff2094bc7
merge 3.3 ( #16369 )
2012-10-30 23:31:12 -04:00
Benjamin Peterson
e8ea97fffb
merge 3.2 ( #16369 )
2012-10-30 23:27:52 -04:00
Benjamin Peterson
c43112823b
initialize more global type objects ( closes #16369 )
2012-10-30 23:21:10 -04:00
Victor Stinner
e64322e034
Close #14625 : Rewrite the UTF-32 decoder. It is now 3x to 4x faster
Patch written by Serhiy Storchaka.
2012-10-30 23:12:47 +01:00
Victor Stinner
76df43de30
Issue #16330 : Use surrogate-related macros
Patch written by Serhiy Storchaka.
2012-10-30 01:42:39 +01:00
Mark Dickinson
fb90c0934c
Issue #14700 : Fix buggy overflow checks for large precision and width in new-style and old-style formatting.
2012-10-28 10:18:03 +00:00
Victor Stinner
c6cf1ba29e
Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy()
2012-10-23 02:54:47 +02:00
Victor Stinner
fe75fb4b3e
Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains()
2012-10-23 02:52:18 +02:00
Victor Stinner
6fa627578a
Inline raise_translate_exception(): it is only used once
2012-10-23 02:51:50 +02:00
Victor Stinner
e5567ad236
Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp()
2012-10-23 02:48:49 +02:00
Christian Heimes
743e0cd6b5
Issue #16166 : Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
endianness detection and handling.
2012-10-17 23:52:17 +02:00
Chris Jerdonek
4a7df9aba9
Issue #14783 : Merge changes from 3.3.
2012-10-07 15:02:16 -07:00
Chris Jerdonek
042fa653ab
Issue #14783 : Merge changes from 3.2.
2012-10-07 14:56:27 -07:00
Chris Jerdonek
83fe2e1c22
Issue #14783 : Improve int() docstring and also str(), range(), and slice().
This commit rewrites the docstring for int() to incorporate the documentation
changes made in issue #16036 . It also switches the docstrings for int(),
str(), range(), and slice() to use multi-line signatures.
2012-10-07 14:48:36 -07:00
Victor Stinner
4c63a972d1
Cleanup PyUnicode_FromFormatV() for zero padding
Skip the "0" instead of parsing it twice: once to detect zero padding and then
again as a digit of the width.
2012-10-06 23:55:33 +02:00
Victor Stinner
15a1136547
Issue #16147 : PyUnicode_FromFormatV() no longer needs to allocate a buffer
on the heap to format numbers.
2012-10-06 23:48:20 +02:00
Victor Stinner
ff5a848db5
Issue #16147 : PyUnicode_FromFormatV() now raises an error if the argument of
'%c' is not in the range(0x110000).
2012-10-06 23:05:45 +02:00
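PyUnicode_FromFormatV() is a C API function, so the following is only a Python-level analogue: the '%c' conversion of str % args enforces the same range(0x110000) bound. The helper format_char() is hypothetical and exists only for this illustration.

```python
MAX_CODE_POINT = 0x10FFFF  # last valid Unicode code point


def format_char(code_point):
    """Format an integer with '%c'; return None if it is out of range."""
    try:
        return "%c" % code_point
    except OverflowError:
        # Raised by str % args when the value is not in range(0x110000).
        return None


print(ascii(format_char(0x41)))
print(ascii(format_char(MAX_CODE_POINT)))
print(format_char(MAX_CODE_POINT + 1))
```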
Victor Stinner
3921e90c5a
Issue #16147 : PyUnicode_FromFormatV() now detects integer overflow when parsing
width and precision
2012-10-06 23:05:00 +02:00
Victor Stinner
e215d960be
Issue #16147 : Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
* Simplify the code: replace 4 steps with one unique step using the
_PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids
storing intermediate results, which required allocating an array of pointers on
the heap.
* Use the _PyUnicodeWriter API for speed (and its convenient API):
overallocate the buffer to reduce the number of realloc() calls
* Implement "width" and "precision" in Python instead of relying on sprintf(). It
avoids the need for a temporary buffer allocated on the heap: only a small
buffer allocated on the stack is used.
* Add _PyUnicodeWriter_WriteCstr() function
* Split PyUnicode_FromFormatV() into two functions: add
unicode_fromformat_arg().
* Inline parse_format_flags(): the format of an argument is now only parsed
once, so a subfunction is no longer needed.
* Optimize PyUnicode_FromFormatV() for characters between two "%" arguments:
search the next "%" and copy the substring in one chunk, instead of copying
character by character.
2012-10-06 23:03:36 +02:00
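The last point of the commit above (copying the text between two "%" arguments in one chunk) can be sketched in Python. This is an illustration only: split_format() is a hypothetical helper that treats any two-character "%x" sequence as a specifier, which is far less than PyUnicode_FromFormatV() supports.

```python
def split_format(fmt):
    """Split a format string into literal chunks and two-character specifiers."""
    parts = []
    pos = 0
    while pos < len(fmt):
        if fmt[pos] == "%":
            if pos + 1 >= len(fmt):
                raise ValueError("incomplete format specifier")
            parts.append(("spec", fmt[pos:pos + 2]))
            pos += 2
        else:
            # Search for the next '%' and take the whole literal run in
            # one slice instead of appending character by character.
            end = fmt.find("%", pos)
            if end == -1:
                end = len(fmt)
            parts.append(("literal", fmt[pos:end]))
            pos = end
    return parts


print(split_format("size=%d bytes, name=%s"))
```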
Mark Dickinson
ff9c54aca2
Issue #16096 : Merge fixes from 3.3.
2012-10-06 18:05:14 +01:00
Mark Dickinson
c04ddff290
Issue #16096 : Fix several occurrences of potential signed integer overflow. Thanks Serhiy Storchaka.
2012-10-06 18:04:49 +01:00
Victor Stinner
8c6db45d3e
In debug mode, unicode_write_cstr() now checks that non-ASCII characters are
not written into an ASCII string
2012-10-06 00:40:45 +02:00
Ezio Melotti
080a2c087e
#16127 : merge with 3.3.
2012-10-05 03:34:02 +03:00
Ezio Melotti
e7f90375b1
#16127 : remove outdated references to narrow builds. Patch by Serhiy Storchaka.
2012-10-05 03:33:31 +03:00
Victor Stinner
1929407406
Fix PyUnicode_Format(): return NULL if PyUnicode_READY(uformat) failed
This error cannot occur in practice: PyUnicode_FromObject() always returns
a "ready" string.
2012-10-05 00:09:33 +02:00
Victor Stinner
770e19e0cc
Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings
2012-10-04 22:59:45 +02:00
Victor Stinner
90db9c47dc
Also enable the ptr==ptr optimization in PyUnicode_Compare()
It was already implemented in PyUnicode_RichCompare()
2012-10-04 21:53:50 +02:00
Victor Stinner
aa7712711d
unicode_result_wchar(): move the assert() to the "#ifdef Py_DEBUG" block
2012-10-04 02:32:58 +02:00
Victor Stinner
a4708231e6
Split the huge PyUnicode_Format() function (+540 lines) into subfunctions
2012-10-04 02:19:54 +02:00
Victor Stinner
a049443fab
PyUnicode_Format(): disable overallocation when we are writing the last part
of the output string
2012-10-03 23:03:46 +02:00
Victor Stinner
afffce489b
Unicode: resize_compact() and resize_inplace() now also fill the Unicode strings
with invalid bytes in debug mode, as done by PyUnicode_New()
2012-10-03 23:03:17 +02:00
Victor Stinner
c89d28fdfc
Issue #15609 : Fix refleak introduced by my last optimization
2012-10-02 12:54:07 +02:00
Victor Stinner
621ef3d84f
Issue #15609 : Optimize str%args for integer argument
- Use _PyLong_FormatWriter() instead of formatlong() when possible, to avoid
a temporary buffer
- Enable the fast path when width is smaller than or equal to the length,
and when the precision is greater than or equal to the length
- Add unit tests!
- formatlong() uses PyUnicode_Resize() instead of _PyUnicode_FromASCII()
to resize the output string
2012-10-02 00:33:47 +02:00
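As a reminder of what "width" and "precision" mean for integer conversions in str%args, here is a short Python illustration. It only shows the results that any fast path has to preserve; it does not show which code path CPython takes for each case.

```python
value = 42

plain = "%d" % value      # no width or precision
padded = "%5d" % value    # width: pad with spaces on the left
zeroed = "%.5d" % value   # precision: pad with leading zeros
hexed = "%x" % 255        # hexadecimal conversion

for text in (plain, padded, zeroed, hexed):
    print(repr(text))
```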
Antoine Pitrou
a1f7655fa7
Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 20:00:04 +02:00
Antoine Pitrou
6f80f5d444
Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 19:55:21 +02:00
Antoine Pitrou
ca8aa4acf6
Issue #15144 : Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
Patch by Serhiy Storchaka.
2012-09-20 20:56:47 +02:00
Christian Heimes
5f520f4fed
Issue #15900 : Fixed reference leak in PyUnicode_TranslateCharmap()
2012-09-11 14:03:25 +02:00
Christian Heimes
f4f9939a96
Fixed memory leak in error branch of formatfloat(). CID 719687
2012-09-10 11:48:41 +02:00
Antoine Pitrou
057119b0b7
Fix C++-style comment (xlc compilation failure)
2012-09-02 17:56:33 +02:00
Benjamin Peterson
59043f96ea
merge 3.2 ( #15801 )
2012-08-28 18:01:45 -04:00
Benjamin Peterson
28a6cfaefc
use the stricter PyMapping_Check ( closes #15801 )
2012-08-28 17:55:35 -04:00
Stefan Krah
8528c3145e
Issue #15728 : Fix leak in PyUnicode_AsWideCharString(). Found by Coverity.
2012-08-19 21:52:43 +02:00
Nick Coghlan
0e41628d35
Merge str docstring fix from 3.2
2012-08-16 14:14:30 +10:00
Nick Coghlan
573b1fd779
Fix str docstring
2012-08-16 14:13:07 +10:00
Antoine Pitrou
b4bbee25b1
Issue #14579 : Fix CVE-2012-2135: vulnerability in the utf-16 decoder after error handling.
Patch by Serhiy Storchaka.
2012-07-21 00:45:14 +02:00
Mark Dickinson
01ac8b6ab1
Use correct types for ASCII_CHAR_MASK integer constants.
2012-07-07 14:08:48 +02:00
Antoine Pitrou
aaefac76dd
Issue #14874 : Restore charmap decoding speed to pre-PEP 393 levels.
Patch by Serhiy Storchaka.
2012-06-16 22:48:21 +02:00
Victor Stinner
f185226244
_copy_characters(): move debug code to the top to avoid noisy #ifdef
And don't use assert() anymore if check_maxchar is set: return -1 on error
instead.
2012-06-16 16:38:26 +02:00
Victor Stinner
07621338fb
Fix PyUnicode_GetSize(): Don't replace _PyUnicode_Ready() exception
2012-06-16 04:53:46 +02:00
Victor Stinner
8a8b3eaabe
Fix a compiler warning in _copy_characters() and remove debug code
2012-06-16 04:53:25 +02:00
Victor Stinner
24e403bbee
Oops, fix my previous change on _copy_characters()
2012-06-16 04:53:00 +02:00
Victor Stinner
ca439eecea
Fix unicode_adjust_maxchar(): catch PyUnicode_New() failure
2012-06-16 03:17:34 +02:00
Victor Stinner
184252ad3f
Fix "%f" format of str%args if the result is not an ASCII or latin1 string
2012-06-16 02:57:41 +02:00
Victor Stinner
9a77770add
Remove debug code
2012-06-16 02:44:43 +02:00
Victor Stinner
c9d369f1bf
Optimize _PyUnicode_FastCopyCharacters() when maxchar(from) > maxchar(to)
2012-06-16 02:22:37 +02:00
Victor Stinner
f05e17ece9
unicodeobject.c: Remove debug code
2012-06-16 01:53:04 +02:00
Antoine Pitrou
27f6a3b0bf
Issue #15026 : utf-16 encoding is now significantly faster (up to 10x).
...
Patch by Serhiy Storchaka.
2012-06-15 22:15:23 +02:00
Kristján Valur Jónsson
55e5dc8371
Rearrange code to beat an optimizer bug affecting Release x64 on windows
with VS2010sp1
2012-06-06 21:58:08 +00:00
Victor Stinner
d7b7c7472b
Issue #14993 : Use standard "unsigned char" instead of an unsigned char bitfield
2012-06-04 22:52:12 +02:00
Kristjan Valur Jonsson
85634d7a2e
Issue #14909 : A number of places were using PyMem_Realloc() APIs and
PyObject_GC_Resize() with incorrect error handling. In case of errors,
the original object would be leaked. This checkin fixes those cases.
2012-05-31 09:37:31 +00:00
Victor Stinner
3a7d096f2f
Issue #14744 : Fix compilation on Windows (part 2)
2012-05-29 18:53:56 +02:00
Victor Stinner
d3f0882dfb
Issue #14744 : Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args)
* Formatting string, int, float and complex use the _PyUnicodeWriter API. It
avoids a temporary buffer in most cases.
* Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
keep a reference to the string if the output is only composed of one string
* Disable overallocation when formatting the last argument of str%args and
str.format(args)
* Overallocation allocates at least 100 characters: add min_length attribute
to the _PyUnicodeWriter structure
* Add new private functions: _PyUnicode_FastCopyCharacters(),
_PyUnicode_FastFill() and _PyUnicode_FromASCII()
The speedup is around 20% on average.
2012-05-29 12:57:52 +02:00
Antoine Pitrou
63065d761e
Issue #14624 : UTF-16 decoding is now 3x to 4x faster on various inputs.
Patch by Serhiy Storchaka.
2012-05-15 23:48:04 +02:00
Martin v. Löwis
b05c0738d8
Silence VS 2010 signed/unsigned warnings.
2012-05-15 13:45:49 +02:00
Antoine Pitrou
758153badb
Fix refleaks introduced by 83da67651687.
2012-05-12 15:51:51 +02:00
Antoine Pitrou
e45c0c5cef
Fix logic error introduced by 83da67651687.
2012-05-12 15:49:07 +02:00
Benjamin Peterson
1ff2e35e84
simplify by shortcutting when the kind of the needle is larger than the haystack
2012-05-11 17:41:20 -05:00
Antoine Pitrou
ca5f91b888
Issue #14738 : Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka.
2012-05-10 16:36:02 +02:00
Victor Stinner
3b1a74a9c3
Rename unicode_write_t structure and its methods to "_PyUnicodeWriter"
2012-05-09 22:25:00 +02:00
Victor Stinner
ee4544c920
Issue #14744 : Inline unicode_writer_write_char() and unicode_write_str()
Also optimize PyUnicode_Format(): call unicode_writer_prepare() only once
per argument.
2012-05-09 22:24:08 +02:00
Victor Stinner
f59c28c930
unicode_writer_finish() checks string consistency
2012-05-09 03:24:14 +02:00
Victor Stinner
106802547c
Backout ab500b297900: the check for integer overflow is wrong
Issue #14716 : Change integer overflow check in unicode_writer_prepare()
to compute the limit at compile time instead of runtime. Patch written by Serhiy
Storchaka.
2012-05-07 23:50:05 +02:00
Victor Stinner
0576f9b4cf
Issue #14716 : Change integer overflow check in unicode_writer_prepare()
to compute the limit at compile time instead of runtime. Patch written by Serhiy
Storchaka.
2012-05-07 13:02:44 +02:00
Victor Stinner
202fdca133
Close #14716 : str.format() now uses the new "unicode writer" API instead of the
PyAccu API. For example, it makes str.format() 25% to 30% faster on Linux.
2012-05-07 12:47:02 +02:00
Mark Dickinson
99e2e5552a
Issue #14700 : Fix two broken and undefined-behaviour-inducing overflow checks in old-style string formatting. Thanks Serhiy Storchaka for report and original patch.
2012-05-07 11:20:50 +01:00
Victor Stinner
d0dba6eee8
unicode_writer: don't force inline when it is not necessary
Keep inline for performance-critical functions (functions used in loops)
2012-05-04 01:19:15 +02:00
Benjamin Peterson
b63f49f2b4
if the kind of the string to count is larger than the string to search, shortcut to 0
2012-05-03 18:31:07 -04:00
Victor Stinner
a7b654be30
unicode_writer: add finish() method and assertions to write_str() method
* The write_str() method does nothing if the length is zero.
* Replace "struct unicode_writer_t" with "unicode_writer_t"
2012-05-03 23:58:55 +02:00
Victor Stinner
bf4e266397
Issue #14687 : Remove redundant length attribute of unicode_write_t
The length can be read directly from the buffer
2012-05-03 19:27:14 +02:00
Victor Stinner
7989157e49
Issue #14687 : Cleanup unicode_writer_prepare()
"Inline" PyUnicode_Resize(): call resize_compact() directly
2012-05-03 13:43:07 +02:00
Victor Stinner
f2c76aa6cb
Issue #14687 : str%tuple now uses an optimistic "unicode writer" instead of an
accumulator. Directly write characters into the output (don't use a temporary
list): resize and widen the string on demand.
2012-05-03 13:10:40 +02:00
Victor Stinner
1b487b467b
Issue #14624 , #14687 : Optimize unicode_widen()
Don't convert uninitialized characters. Patch written by Serhiy Storchaka.
2012-05-03 12:29:04 +02:00
Victor Stinner
3a7f7977f1
Remove buggy assertion in PyUnicode_Substring()
Also use unicode_empty directly, instead of PyUnicode_New(0,0).
2012-05-03 03:36:40 +02:00
Victor Stinner
684d5fd420
Fix PyUnicode_Substring() for start >= length and start > end
Remove the fast-path for 1-character string: unicode_fromascii() and
_PyUnicode_FromUCS*() now have their own fast-path for 1-character strings.
2012-05-03 02:32:34 +02:00
Victor Stinner
b6cd014d75
Unicode: optimize creation of 1-character strings
2012-05-03 02:17:04 +02:00
Victor Stinner
bff7c96834
Issue #14687 : Optimize str%tuple for the "%(name)s" syntax
Avoid a useless and expensive call to PyUnicode_READ().
2012-05-03 01:44:59 +02:00
Victor Stinner
e6abb488c9
unicodeobject.c: Add MAX_MAXCHAR() macro to (micro-)optimize the computation
of the second argument of PyUnicode_New().
* Also create the align_maxchar() function
* Optimize fix_decimal_and_space_to_ascii(): don't compute the maximum
character when ch <= 127 (it is ASCII)
2012-05-02 01:15:40 +02:00
Victor Stinner
438106b66e
Issue #14687 : Cleanup PyUnicode_Format()
2012-05-02 00:41:57 +02:00
Victor Stinner
b5c3ea3af3
Issue #14687 : Optimize str%args
* formatfloat() uses unicode_fromascii() instead of PyUnicode_DecodeASCII()
to avoid checking characters: we know that the result is really ASCII
* Use PyUnicode_FromOrdinal() instead of _PyUnicode_FromUCS4() to format
a character: it avoids a call to ucs4lib_find_max_char() to compute
the maximum character (which we already know: it is just the character
itself)
2012-05-02 00:29:36 +02:00
Victor Stinner
b80e46eca4
Issue #14687 : Avoid a useless duplicated string in PyUnicode_Format()
2012-04-30 05:21:52 +02:00
Victor Stinner
aff3cc659b
Issue #14687 : Cleanup PyUnicode_Format()
2012-04-30 05:19:21 +02:00
Victor Stinner
b11d91d969
Fix my previous commit: bool is a long, restore the special case for bool
2012-04-28 00:25:34 +02:00
Victor Stinner
d0880d57b0
Simplify and optimize formatlong()
* Remove _PyBytes_FormatLong(): inline it into formatlong()
* the input type is always a long, so remove the code for bool
* don't duplicate the string if the length does not change
* Use PyUnicode_DATA() instead of _PyUnicode_AsString()
2012-04-27 23:40:13 +02:00
Victor Stinner
94d558b063
Optimize _PyUnicode_FindMaxChar() for pure ASCII strings
2012-04-27 22:26:58 +02:00
Victor Stinner
8f825060f1
Check the consistency of newly created strings using _PyUnicode_CheckConsistency(str, 1)
* In debug mode, fill the string data with invalid characters
* Also simplify reference counting in PyCodec_BackslashReplaceErrors()
and PyCodec_XMLCharRefReplaceErrors()
2012-04-27 13:55:39 +02:00