cpython

Commit Graph

Author	SHA1	Message	Date
Serhiy Storchaka	570c5b2354	Issue #16980 : Fix processing of escaped non-ascii bytes in the unicode-escape-decode decoder.	2013-01-25 23:53:29 +02:00
Serhiy Storchaka	73e38809e0	Issue #16980 : Fix processing of escaped non-ascii bytes in the unicode-escape-decode decoder.	2013-01-25 23:52:21 +02:00
Serhiy Storchaka	6481bfb2b5	Issue #16335 : Fix integer overflow in unicode-escape decoder.	2013-01-21 11:44:40 +02:00
Serhiy Storchaka	c35f3a9f61	Issue #16335 : Fix integer overflow in unicode-escape decoder.	2013-01-21 11:42:57 +02:00
Serhiy Storchaka	4f5f0e54e0	Issue #16335 : Fix integer overflow in unicode-escape decoder.	2013-01-21 11:38:00 +02:00
Serhiy Storchaka	441d30fac7	Issue #15989 : Fix several occurrences of integer overflow when result of PyLong_AsLong() narrowed to int without checks. This is a backport of changesets 13e2e44db99d and 525407d89277.	2013-01-19 12:26:26 +02:00
Serhiy Storchaka	9101e23ff6	Issue #15989 : Fix several occurrences of integer overflow when result of PyLong_AsLong() narrowed to int without checks. This is a backport of changesets 13e2e44db99d and 525407d89277.	2013-01-19 12:41:45 +02:00
Serhiy Storchaka	55e2cb497b	Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping" in any mapping, not only in an unicode string.	2013-01-15 15:30:04 +02:00
Serhiy Storchaka	45d16d9924	Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping" in any mapping, not only in an unicode string.	2013-01-15 15:01:20 +02:00
Serhiy Storchaka	4fb8caee87	Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping" in any mapping, not only in an unicode string.	2013-01-15 14:43:21 +02:00
Serhiy Storchaka	7898043868	Issue #15989 : Fix several occurrences of integer overflow when result of PyLong_AsLong() narrowed to int without checks.	2013-01-15 01:12:17 +02:00
Benjamin Peterson	0b32a480bd	merge 3.3 (#16906 )	2013-01-09 09:52:22 -06:00
Benjamin Peterson	0c270a8bb7	correct static string clearing loop (closes #16906 )	2013-01-09 09:52:01 -06:00
Serhiy Storchaka	24a3ef6999	Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP characters.	2013-01-08 23:41:55 +02:00
Serhiy Storchaka	ae3b32ad6b	Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP characters.	2013-01-08 23:40:52 +02:00
Serhiy Storchaka	48e188e573	Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP characters.	2013-01-08 23:14:24 +02:00
Serhiy Storchaka	dec798eb46	Fix out of bound read in UTF-32 decoder on "narrow Unicode" builds.	2013-01-08 22:45:42 +02:00
Serhiy Storchaka	4e02538bf3	Issue #16856 : Fix a segmentation fault from calling repr() on a dict with a key whose repr raise an exception.	2013-01-04 12:40:35 +02:00
Serhiy Storchaka	6c83e739d7	Issue #16856 : Fix a segmentation fault from calling repr() on a dict with a key whose repr raise an exception.	2013-01-04 12:39:34 +02:00
Victor Stinner	18aa4477d3	Close #16281 : handle tailmatch() failure and remove useless comment "honor direction and do a forward or backwards search": the runtime speed may be different, but I consider that it doesn't really matter in practice. The direction was never honored before: Python 2.7 uses memcmp() for the str type for example.	2013-01-03 03:18:09 +01:00
Victor Stinner	7ae320d667	(Merge 3.2) Issue #16455 : On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announces an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.	2013-01-03 01:21:07 +01:00
Victor Stinner	20b654acb5	Issue #16455 : On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announces an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.	2013-01-03 01:08:58 +01:00
Andrew Svetlov	2606a6f197	Issue #16719 : Get rid of WindowsError. Use OSError instead Patch by Serhiy Storchaka.	2012-12-19 14:33:35 +02:00
Gregory P. Smith	27dc02e8c5	Fix the internals of our hash functions to used unsigned values during hash computation as the overflow behavior of signed integers is undefined. NOTE: This change is smaller compared to 3.2 as much of this cleanup had already been done. I added the comment that my change in 3.2 added so that the code would match up. Otherwise this just adds or synchronizes appropriate UL designations on some constants to be pedantic. In practice we require compiling everything with -fwrapv which forces overflow to be defined as twos compliment but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. We could work to get rid of the -fwrapv requirement in 3.4 but that requires more planning. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only - no functionality or hash values change.	2012-12-10 19:51:29 -08:00
Gregory P. Smith	c2176e46d7	Fix the internals of our hash functions to used unsigned values during hash computation as the overflow behavior of signed integers is undefined. NOTE: This change is smaller compared to 3.2 as much of this cleanup had already been done. I added the comment that my change in 3.2 added so that the code would match up. Otherwise this just adds or synchronizes appropriate UL designations on some constants to be pedantic. In practice we require compiling everything with -fwrapv which forces overflow to be defined as twos compliment but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only - no functionality or hash values change.	2012-12-10 18:32:53 -08:00
Gregory P. Smith	27cbcd6241	Fix the internals of our hash functions to used unsigned values during hash computation as the overflow behavior of signed integers is undefined. In practice we require compiling everything with -fwrapv which forces overflow to be defined as twos compliment but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only - no functionality or hash values change.	2012-12-10 18:15:46 -08:00
Victor Stinner	8dbd421b4d	Cleanup unicodeobject.c * Remove micro-optization: (errors == "surrogateescape" || strcmp(errors, "surrogateescape") == 0). Only use strcmp() * Initialize 'arg' members in unicode_format_arg() to help the compiler to diagnose real bugs and also make the code simpler to read	2012-12-04 09:30:24 +01:00
Victor Stinner	d45c7f8d74	Issue #16455 : On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announces an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.	2012-12-04 01:34:47 +01:00
Victor Stinner	2660e427d1	(Merge 3.2) Issue #16416 : On Mac OS X, operating system data are now always encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding (which may be ASCII if no locale environment variable is set), to avoid inconsistencies with os.fsencode() and os.fsdecode() functions which are already using UTF-8/surrogateescape.	2012-12-03 12:48:53 +01:00
Victor Stinner	27b1ca29cc	Issue #16416 : On Mac OS X, operating system data are now always encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding (which may be ASCII if no locale environment variable is set), to avoid inconsistencies with os.fsencode() and os.fsdecode() functions which are already using UTF-8/surrogateescape.	2012-12-03 12:47:59 +01:00
Antoine Pitrou	5439458a2a	Issue #16215 : Fix potential double memory free in str.replace(). Patch by Serhiy Storchaka.	2012-11-17 23:29:28 +01:00
Antoine Pitrou	6d5ad227a5	Issue #16215 : Fix potential double memory free in str.replace(). Patch by Serhiy Storchaka.	2012-11-17 23:28:17 +01:00
Victor Stinner	0d92c4f667	Issue #16416 : Fix error handling in _Py_wchar2char() _Py_char2wchar() functions	2012-11-12 23:32:21 +01:00
Victor Stinner	fc009eff9e	Close #16311 : Use the _PyUnicodeWriter API in text decoders * Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare() * Remove unicode_putchar(): replaced with PyUnicodeWriter_Prepare() + PyUnicode_WRITER() * When handling an decoding error, only overallocate the buffer by +25% instead of +100%	2012-11-07 00:36:38 +01:00
Ezio Melotti	cfa9636404	#8271 : merge with 3.3.	2012-11-04 23:23:09 +02:00
Ezio Melotti	f7ed5d111b	#8271 : the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.	2012-11-04 23:21:38 +02:00
Benjamin Peterson	7ff2094bc7	merge 3.3 (#16369 )	2012-10-30 23:31:12 -04:00
Benjamin Peterson	e8ea97fffb	merge 3.2 (#16369 )	2012-10-30 23:27:52 -04:00
Benjamin Peterson	c43112823b	initialize more global type objects (closes #16369 )	2012-10-30 23:21:10 -04:00
Victor Stinner	e64322e034	Close #14625 : Rewrite the UTF-32 decoder. It is now 3x to 4x faster Patch written by Serhiy Storchaka.	2012-10-30 23:12:47 +01:00
Victor Stinner	76df43de30	Issue #16330 : Use surrogate-related macros Patch written by Serhiy Storchaka.	2012-10-30 01:42:39 +01:00
Mark Dickinson	fb90c0934c	Issue #14700 : Fix buggy overflow checks for large precision and width in new-style and old-style formatting.	2012-10-28 10:18:03 +00:00
Victor Stinner	c6cf1ba29e	Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy()	2012-10-23 02:54:47 +02:00
Victor Stinner	fe75fb4b3e	Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains()	2012-10-23 02:52:18 +02:00
Victor Stinner	6fa627578a	Inline raise_translate_exception(): it is only used once	2012-10-23 02:51:50 +02:00
Victor Stinner	e5567ad236	Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp()	2012-10-23 02:48:49 +02:00
Christian Heimes	743e0cd6b5	Issue #16166 : Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified endianess detection and handling.	2012-10-17 23:52:17 +02:00
Chris Jerdonek	4a7df9aba9	Issue #14783 : Merge changes from 3.3.	2012-10-07 15:02:16 -07:00
Chris Jerdonek	042fa653ab	Issue #14783 : Merge changes from 3.2.	2012-10-07 14:56:27 -07:00
Chris Jerdonek	83fe2e1c22	Issue #14783 : Improve int() docstring and also str(), range(), and slice(). This commit rewrites the docstring for int() to incorporate the documentation changes made in issue #16036. It also switches the docstrings for int(), str(), range(), and slice() to use multi-line signatures.	2012-10-07 14:48:36 -07:00
Victor Stinner	4c63a972d1	Cleanup PyUnicode_FromFormatV() for zero padding Skip the "0" instead of parsing it twice: detect zero padding and then parsed as a digit of the width.	2012-10-06 23:55:33 +02:00
Victor Stinner	15a1136547	Issue #16147 : PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer on the heap to format numbers.	2012-10-06 23:48:20 +02:00
Victor Stinner	ff5a848db5	Issue #16147 : PyUnicode_FromFormatV() now raises an error if the argument of '%c' is not in the range(0x110000).	2012-10-06 23:05:45 +02:00
Victor Stinner	3921e90c5a	Issue #16147 : PyUnicode_FromFormatV() now detects integer overflow when parsing width and precision	2012-10-06 23:05:00 +02:00
Victor Stinner	e215d960be	Issue #16147 : Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API * Simplify the code: replace 4 steps with one unique step using the _PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids to store intermediate results which require to allocate an array of pointers on the heap. * Use the _PyUnicodeWriter API for speed (and its convinient API): overallocate the buffer to reduce the number of "realloc()" * Implement "width" and "precision" in Python, don't rely on sprintf(). It avoids to need of a temporary buffer allocated on the heap: only use a small buffer allocated in the stack. * Add _PyUnicodeWriter_WriteCstr() function * Split PyUnicode_FromFormatV() into two functions: add unicode_fromformat_arg(). * Inline parse_format_flags(): the format of an argument is now only parsed once, it's no more needed to have a subfunction. * Optimize PyUnicode_FromFormatV() for characters between two "%" arguments: search the next "%" and copy the substring in one chunk, instead of copying character per character.	2012-10-06 23:03:36 +02:00
Mark Dickinson	ff9c54aca2	Issue #16096 : Merge fixes from 3.3.	2012-10-06 18:05:14 +01:00
Mark Dickinson	c04ddff290	Issue #16096 : Fix several occurrences of potential signed integer overflow. Thanks Serhiy Storchaka.	2012-10-06 18:04:49 +01:00
Victor Stinner	8c6db45d3e	In debug mode, unicode_write_cstr() now checks that non-ASCII characters are not written into an ASCII string	2012-10-06 00:40:45 +02:00
Ezio Melotti	080a2c087e	#16127 : merge with 3.3.	2012-10-05 03:34:02 +03:00
Ezio Melotti	e7f90375b1	#16127 : remove outdated references to narrow builds. Patch by Serhiy Storchaka.	2012-10-05 03:33:31 +03:00
Victor Stinner	1929407406	Fix PyUnicode_Format(): return NULL if PyUnicode_READY(uformat) failed This error cannot occur in practice: PyUnicode_FromObject() always return a "ready" string.	2012-10-05 00:09:33 +02:00
Victor Stinner	770e19e0cc	Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings	2012-10-04 22:59:45 +02:00
Victor Stinner	90db9c47dc	Enable also ptr==ptr optimization in PyUnicode_Compare() It was already implemented in PyUnicode_RichCompare()	2012-10-04 21:53:50 +02:00
Victor Stinner	aa7712711d	unicode_result_wchar(): move the assert() to the "#ifdef Py_DEBUG" block	2012-10-04 02:32:58 +02:00
Victor Stinner	a4708231e6	Split the huge PyUnicode_Format() function (+540 lines) into subfunctions	2012-10-04 02:19:54 +02:00
Victor Stinner	a049443fab	PyUnicode_Format(): disable overallocation when we are writing the last part of the output string	2012-10-03 23:03:46 +02:00
Victor Stinner	afffce489b	Unicode: resize_compact() and resize_inplace() fills also the Unicode strings with invalid bytes in debug mode, as done by PyUnicode_New()	2012-10-03 23:03:17 +02:00
Victor Stinner	c89d28fdfc	Issue #15609 : Fix refleak introduced by my last optimization	2012-10-02 12:54:07 +02:00
Victor Stinner	621ef3d84f	Issue #15609 : Optimize str%args for integer argument - Use _PyLong_FormatWriter() instead of formatlong() when possible, to avoid a temporary buffer - Enable the fast path when width is smaller or equals to the length, and when the precision is bigger or equals to the length - Add unit tests! - formatlong() uses PyUnicode_Resize() instead of _PyUnicode_FromASCII() to resize the output string	2012-10-02 00:33:47 +02:00
Antoine Pitrou	a1f7655fa7	Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). Patch by Serhiy Storchaka.	2012-09-23 20:00:04 +02:00
Antoine Pitrou	6f80f5d444	Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). Patch by Serhiy Storchaka.	2012-09-23 19:55:21 +02:00
Antoine Pitrou	ca8aa4acf6	Issue #15144 : Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t. Patch by Serhiy Storchaka.	2012-09-20 20:56:47 +02:00
Christian Heimes	5f520f4fed	Issue #15900 : Fixed reference leak in PyUnicode_TranslateCharmap()	2012-09-11 14:03:25 +02:00
Christian Heimes	f4f9939a96	Fixed memory leak in error branch of formatfloat(). CID 719687	2012-09-10 11:48:41 +02:00
Antoine Pitrou	057119b0b7	Fix C++-style comment (xlc compilation failure)	2012-09-02 17:56:33 +02:00
Benjamin Peterson	59043f96ea	merge 3.2 (#15801 )	2012-08-28 18:01:45 -04:00
Benjamin Peterson	28a6cfaefc	use the stricter PyMapping_Check (closes #15801 )	2012-08-28 17:55:35 -04:00
Stefan Krah	8528c3145e	Issue #15728 : Fix leak in PyUnicode_AsWideCharString(). Found by Coverity.	2012-08-19 21:52:43 +02:00
Nick Coghlan	0e41628d35	Merge str docstring fix from 3.2	2012-08-16 14:14:30 +10:00
Nick Coghlan	573b1fd779	Fix str docstring	2012-08-16 14:13:07 +10:00
Antoine Pitrou	b4bbee25b1	Issue #14579 : Fix CVE-2012-2135: vulnerability in the utf-16 decoder after error handling. Patch by Serhiy Storchaka.	2012-07-21 00:45:14 +02:00
Mark Dickinson	01ac8b6ab1	Use correct types for ASCII_CHAR_MASK integer constants.	2012-07-07 14:08:48 +02:00
Antoine Pitrou	aaefac76dd	Issue #14874 : Restore charmap decoding speed to pre-PEP 393 levels. Patch by Serhiy Storchaka.	2012-06-16 22:48:21 +02:00
Victor Stinner	f185226244	_copy_characters(): move debug code at the top to avoid noisy #ifdef And don't use assert() anymore if check_maxchar is set: return -1 on error instead.	2012-06-16 16:38:26 +02:00
Victor Stinner	07621338fb	Fix PyUnicode_GetSize(): Don't replace _PyUnicode_Ready() exception	2012-06-16 04:53:46 +02:00
Victor Stinner	8a8b3eaabe	Fix a compiler warning in _copy_characters() and remove debug code	2012-06-16 04:53:25 +02:00
Victor Stinner	24e403bbee	Oops, fix my previous change on _copy_characters()	2012-06-16 04:53:00 +02:00
Victor Stinner	ca439eecea	Fix unicode_adjust_maxchar(): catch PyUnicode_New() failure	2012-06-16 03:17:34 +02:00
Victor Stinner	184252ad3f	Fix "%f" format of str%args if the result is not an ASCII or latin1 string	2012-06-16 02:57:41 +02:00
Victor Stinner	9a77770add	Remove debug code	2012-06-16 02:44:43 +02:00
Victor Stinner	c9d369f1bf	Optimize _PyUnicode_FastCopyCharacters() when maxchar(from) > maxchar(to)	2012-06-16 02:22:37 +02:00
Victor Stinner	f05e17ece9	unicodeobject.c: Remove debug code	2012-06-16 01:53:04 +02:00
Antoine Pitrou	27f6a3b0bf	Issue #15026 : utf-16 encoding is now significantly faster (up to 10x). Patch by Serhiy Storchaka.	2012-06-15 22:15:23 +02:00
Kristján Valur Jónsson	55e5dc8371	Rearrange code to beat an optimizer bug affecting Release x64 on windows with VS2010sp1	2012-06-06 21:58:08 +00:00
Victor Stinner	d7b7c7472b	Issue #14993 : Use standard "unsigned char" instead of a unsigned char bitfield	2012-06-04 22:52:12 +02:00
Kristjan Valur Jonsson	85634d7a2e	Issue #14909 : A number of places were using PyMem_Realloc() apis and PyObject_GC_Resize() with incorrect error handling. In case of errors, the original object would be leaked. This checkin fixes those cases.	2012-05-31 09:37:31 +00:00
Victor Stinner	3a7d096f2f	Issue #14744 : Fix compilation on Windows (part 2)	2012-05-29 18:53:56 +02:00
Victor Stinner	d3f0882dfb	Issue #14744 : Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args) * Formatting string, int, float and complex use the _PyUnicodeWriter API. It avoids a temporary buffer in most cases. * Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just keep a reference to the string if the output is only composed of one string * Disable overallocation when formatting the last argument of str%args and str.format(args) * Overallocation allocates at least 100 characters: add min_length attribute to the _PyUnicodeWriter structure * Add new private functions: _PyUnicode_FastCopyCharacters(), _PyUnicode_FastFill() and _PyUnicode_FromASCII() The speed up is around 20% in average.	2012-05-29 12:57:52 +02:00
Antoine Pitrou	63065d761e	Issue #14624 : UTF-16 decoding is now 3x to 4x faster on various inputs. Patch by Serhiy Storchaka.	2012-05-15 23:48:04 +02:00
Martin v. Löwis	b05c0738d8	Silence VS 2010 signed/unsigned warnings.	2012-05-15 13:45:49 +02:00

1071 Commits