merge in bad ways, so I'll have to merge that extra-carefully (probably manually.)
Merged revisions 46495-46605 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r46495 | tim.peters | 2006-05-28 03:52:38 +0200 (Sun, 28 May 2006) | 2 lines
Added missing svn:eol-style property to text files.
........
r46497 | tim.peters | 2006-05-28 12:41:29 +0200 (Sun, 28 May 2006) | 3 lines
PyErr_Display(), PyErr_WriteUnraisable(): Coverity found a cut-and-paste
bug in both: `className` was referenced before being checked for NULL.
........
r46499 | fredrik.lundh | 2006-05-28 14:06:46 +0200 (Sun, 28 May 2006) | 5 lines
needforspeed: added Py_MEMCPY macro (currently tuned for Visual C only),
and use it for string copy operations. this gives a 20% speedup on some
string benchmarks.
........
r46501 | michael.hudson | 2006-05-28 17:51:40 +0200 (Sun, 28 May 2006) | 26 lines
Quality control, meet exceptions.c.
Fix a number of problems with the need for speed code:
One is doing this sort of thing:
Py_DECREF(self->field);
self->field = newval;
Py_INCREF(self->field);
without being very sure that self->field doesn't start with a
value that has a __del__, because that almost certainly can lead
to segfaults.
As self->args is constrained to be an exact tuple we may as well
exploit this fact consistently. This leads to quite a lot of
simplification (and, hey, probably better performance).
Add some error checking in places lacking it.
Fix some rather strange indentation in the Unicode code.
Delete some trailing whitespace.
More to come, I haven't fixed all the reference leaks yet...
........
r46502 | george.yoshida | 2006-05-28 18:39:09 +0200 (Sun, 28 May 2006) | 3 lines
Patch #1080727: add "encoding" parameter to doctest.DocFileSuite
Contributed by Bjorn Tillenius.
........
r46503 | martin.v.loewis | 2006-05-28 18:57:38 +0200 (Sun, 28 May 2006) | 4 lines
Rest of patch #1490384: Commit icon source, remove
claim that Erik von Blokland is the author of the
installer picture.
........
r46504 | michael.hudson | 2006-05-28 19:40:29 +0200 (Sun, 28 May 2006) | 16 lines
Quality control, meet exceptions.c, round two.
Make some functions that should have been static static.
Fix a bunch of refleaks by fixing the definition of
MiddlingExtendsException.
Remove all the __new__ implementations apart from
BaseException_new. Rewrite most code that needs it to cope with
NULL fields (such code could get excercised anyway, the
__new__-removal just makes it more likely). This involved
editing the code for WindowsError, which I can't test.
This fixes all the refleaks in at least the start of a regrtest
-R :: run.
........
r46505 | marc-andre.lemburg | 2006-05-28 19:46:58 +0200 (Sun, 28 May 2006) | 10 lines
Initial version of systimes - a module to provide platform dependent
performance measurements.
The module is currently just a proof-of-concept implementation, but
will integrated into pybench once it is stable enough.
License: pybench license.
Author: Marc-Andre Lemburg.
........
r46507 | armin.rigo | 2006-05-28 21:13:17 +0200 (Sun, 28 May 2006) | 15 lines
("Forward-port" of r46506)
Remove various dependencies on dictionary order in the standard library
tests, and one (clearly an oversight, potentially critical) in the
standard library itself - base64.py.
Remaining open issues:
* test_extcall is an output test, messy to make robust
* tarfile.py has a potential bug here, but I'm not familiar
enough with this code. Filed in as SF bug #1496501.
* urllib2.HTTPPasswordMgr() returns a random result if there is more
than one matching root path. I'm asking python-dev for
clarification...
........
r46508 | georg.brandl | 2006-05-28 22:11:45 +0200 (Sun, 28 May 2006) | 4 lines
The empty string is a valid import path.
(fixes#1496539)
........
r46509 | georg.brandl | 2006-05-28 22:23:12 +0200 (Sun, 28 May 2006) | 3 lines
Patch #1496206: urllib2 PasswordMgr ./. default ports
........
r46510 | georg.brandl | 2006-05-28 22:57:09 +0200 (Sun, 28 May 2006) | 3 lines
Fix refleaks in UnicodeError get and set methods.
........
r46511 | michael.hudson | 2006-05-28 23:19:03 +0200 (Sun, 28 May 2006) | 3 lines
use the UnicodeError traversal and clearing functions in UnicodeError
subclasses.
........
r46512 | thomas.wouters | 2006-05-28 23:32:12 +0200 (Sun, 28 May 2006) | 4 lines
Make last patch valid C89 so Windows compilers can deal with it.
........
r46513 | georg.brandl | 2006-05-28 23:42:54 +0200 (Sun, 28 May 2006) | 3 lines
Fix ref-antileak in _struct.c which eventually lead to deallocating None.
........
r46514 | georg.brandl | 2006-05-28 23:57:35 +0200 (Sun, 28 May 2006) | 4 lines
Correct None refcount issue in Mac modules. (Are they
still used?)
........
r46515 | armin.rigo | 2006-05-29 00:07:08 +0200 (Mon, 29 May 2006) | 3 lines
A clearer error message when passing -R to regrtest.py with
release builds of Python.
........
r46516 | georg.brandl | 2006-05-29 00:14:04 +0200 (Mon, 29 May 2006) | 3 lines
Fix C function calling conventions in _sre module.
........
r46517 | georg.brandl | 2006-05-29 00:34:51 +0200 (Mon, 29 May 2006) | 3 lines
Convert audioop over to METH_VARARGS.
........
r46518 | georg.brandl | 2006-05-29 00:38:57 +0200 (Mon, 29 May 2006) | 3 lines
METH_NOARGS functions do get called with two args.
........
r46519 | georg.brandl | 2006-05-29 11:46:51 +0200 (Mon, 29 May 2006) | 4 lines
Fix refleak in socketmodule. Replace bogus Py_BuildValue calls.
Fix refleak in exceptions.
........
r46520 | nick.coghlan | 2006-05-29 14:43:05 +0200 (Mon, 29 May 2006) | 7 lines
Apply modified version of Collin Winter's patch #1478788
Renames functional extension module to _functools and adds a Python
functools module so that utility functions like update_wrapper can be
added easily.
........
r46522 | georg.brandl | 2006-05-29 15:53:16 +0200 (Mon, 29 May 2006) | 3 lines
Convert fmmodule to METH_VARARGS.
........
r46523 | georg.brandl | 2006-05-29 16:13:21 +0200 (Mon, 29 May 2006) | 3 lines
Fix#1494605.
........
r46524 | georg.brandl | 2006-05-29 16:28:05 +0200 (Mon, 29 May 2006) | 3 lines
Handle PyMem_Malloc failure in pystrtod.c. Closes#1494671.
........
r46525 | georg.brandl | 2006-05-29 16:33:55 +0200 (Mon, 29 May 2006) | 3 lines
Fix compiler warning.
........
r46526 | georg.brandl | 2006-05-29 16:39:00 +0200 (Mon, 29 May 2006) | 3 lines
Fix#1494787 (pyclbr counts whitespace as superclass name)
........
r46527 | bob.ippolito | 2006-05-29 17:47:29 +0200 (Mon, 29 May 2006) | 1 line
simplify the struct code a bit (no functional changes)
........
r46528 | armin.rigo | 2006-05-29 19:59:47 +0200 (Mon, 29 May 2006) | 2 lines
Silence a warning.
........
r46529 | georg.brandl | 2006-05-29 21:39:45 +0200 (Mon, 29 May 2006) | 3 lines
Correct some value converting strangenesses.
........
r46530 | nick.coghlan | 2006-05-29 22:27:44 +0200 (Mon, 29 May 2006) | 1 line
When adding a module like functools, it helps to let SVN know about the file.
........
r46531 | georg.brandl | 2006-05-29 22:52:54 +0200 (Mon, 29 May 2006) | 4 lines
Patches #1497027 and #972322: try HTTP digest auth first,
and watch out for handler name collisions.
........
r46532 | georg.brandl | 2006-05-29 22:57:01 +0200 (Mon, 29 May 2006) | 3 lines
Add News entry for last commit.
........
r46533 | georg.brandl | 2006-05-29 23:04:52 +0200 (Mon, 29 May 2006) | 4 lines
Make use of METH_O and METH_NOARGS where possible.
Use Py_UnpackTuple instead of PyArg_ParseTuple where possible.
........
r46534 | georg.brandl | 2006-05-29 23:58:42 +0200 (Mon, 29 May 2006) | 3 lines
Convert more modules to METH_VARARGS.
........
r46535 | georg.brandl | 2006-05-30 00:00:30 +0200 (Tue, 30 May 2006) | 3 lines
Whoops.
........
r46536 | fredrik.lundh | 2006-05-30 00:42:07 +0200 (Tue, 30 May 2006) | 4 lines
fixed "abc".count("", 100) == -96 error (hopefully, nobody's relying on
the current behaviour ;-)
........
r46537 | bob.ippolito | 2006-05-30 00:55:48 +0200 (Tue, 30 May 2006) | 1 line
struct: modulo math plus warning on all endian-explicit formats for compatibility with older struct usage (ugly)
........
r46539 | bob.ippolito | 2006-05-30 02:26:01 +0200 (Tue, 30 May 2006) | 1 line
Add a length check to aifc to ensure it doesn't write a bogus file
........
r46540 | tim.peters | 2006-05-30 04:25:25 +0200 (Tue, 30 May 2006) | 10 lines
deprecated_err(): Stop bizarre warning messages when the tests
are run in the order:
test_genexps (or any other doctest-based test)
test_struct
test_doctest
The `warnings` module needs an advertised way to save/restore
its internal filter list.
........
r46541 | tim.peters | 2006-05-30 04:26:46 +0200 (Tue, 30 May 2006) | 2 lines
Whitespace normalization.
........
r46542 | tim.peters | 2006-05-30 04:30:30 +0200 (Tue, 30 May 2006) | 2 lines
Set a binary svn:mime-type property on this UTF-8 encoded file.
........
r46543 | neal.norwitz | 2006-05-30 05:18:50 +0200 (Tue, 30 May 2006) | 1 line
Simplify further by using AddStringConstant
........
r46544 | tim.peters | 2006-05-30 06:16:25 +0200 (Tue, 30 May 2006) | 6 lines
Convert relevant dict internals to Py_ssize_t.
I don't have a box with nearly enough RAM, or an OS,
that could get close to tickling this, though (requires
a dict w/ at least 2**31 entries).
........
r46545 | neal.norwitz | 2006-05-30 06:19:21 +0200 (Tue, 30 May 2006) | 1 line
Remove stray | in comment
........
r46546 | neal.norwitz | 2006-05-30 06:25:05 +0200 (Tue, 30 May 2006) | 1 line
Use Py_SAFE_DOWNCAST for safety. Fix format strings. Remove 2 more stray | in comment
........
r46547 | neal.norwitz | 2006-05-30 06:43:23 +0200 (Tue, 30 May 2006) | 1 line
No DOWNCAST is required since sizeof(Py_ssize_t) >= sizeof(int) and Py_ReprEntr returns an int
........
r46548 | tim.peters | 2006-05-30 07:04:59 +0200 (Tue, 30 May 2006) | 3 lines
dict_print(): Explicitly narrow the return value
from a (possibly) wider variable.
........
r46549 | tim.peters | 2006-05-30 07:23:59 +0200 (Tue, 30 May 2006) | 5 lines
dict_print(): So that Neal & I don't spend the rest of
our lives taking turns rewriting code that works ;-),
get rid of casting illusions by declaring a new variable
with the obvious type.
........
r46550 | georg.brandl | 2006-05-30 09:04:55 +0200 (Tue, 30 May 2006) | 3 lines
Restore exception pickle support. #1497319.
........
r46551 | georg.brandl | 2006-05-30 09:13:29 +0200 (Tue, 30 May 2006) | 3 lines
Add a test case for exception pickling. args is never NULL.
........
r46552 | neal.norwitz | 2006-05-30 09:21:10 +0200 (Tue, 30 May 2006) | 1 line
Don't fail if the (sub)pkgname already exist.
........
r46553 | georg.brandl | 2006-05-30 09:34:45 +0200 (Tue, 30 May 2006) | 3 lines
Disallow keyword args for exceptions.
........
r46554 | neal.norwitz | 2006-05-30 09:36:54 +0200 (Tue, 30 May 2006) | 5 lines
I'm impatient. I think this will fix a few more problems with the buildbots.
I'm not sure this is the best approach, but I can't think of anything better.
If this creates problems, feel free to revert, but I think it's safe and
should make things a little better.
........
r46555 | georg.brandl | 2006-05-30 10:17:00 +0200 (Tue, 30 May 2006) | 4 lines
Do the check for no keyword arguments in __init__ so that
subclasses of Exception can be supplied keyword args
........
r46556 | georg.brandl | 2006-05-30 10:47:19 +0200 (Tue, 30 May 2006) | 3 lines
Convert test_exceptions to unittest.
........
r46557 | andrew.kuchling | 2006-05-30 14:52:01 +0200 (Tue, 30 May 2006) | 1 line
Add SoC name, and reorganize this section a bit
........
r46559 | tim.peters | 2006-05-30 17:53:34 +0200 (Tue, 30 May 2006) | 11 lines
PyLong_FromString(): Continued fraction analysis (explained in
a new comment) suggests there are almost certainly large input
integers in all non-binary input bases for which one Python digit
too few is initally allocated to hold the final result. Instead
of assert-failing when that happens, allocate more space. Alas,
I estimate it would take a few days to find a specific such case,
so this isn't backed up by a new test (not to mention that such
a case may take hours to run, since conversion time is quadratic
in the number of digits, and preliminary attempts suggested that
the smallest such inputs contain at least a million digits).
........
r46560 | fredrik.lundh | 2006-05-30 19:11:48 +0200 (Tue, 30 May 2006) | 3 lines
changed find/rfind to return -1 for matches outside the source string
........
r46561 | bob.ippolito | 2006-05-30 19:37:54 +0200 (Tue, 30 May 2006) | 1 line
Change wrapping terminology to overflow masking
........
r46562 | fredrik.lundh | 2006-05-30 19:39:58 +0200 (Tue, 30 May 2006) | 3 lines
changed count to return 0 for slices outside the source string
........
r46568 | tim.peters | 2006-05-31 01:28:02 +0200 (Wed, 31 May 2006) | 2 lines
Whitespace normalization.
........
r46569 | brett.cannon | 2006-05-31 04:19:54 +0200 (Wed, 31 May 2006) | 5 lines
Clarify wording on default values for strptime(); defaults are used when better
values cannot be inferred.
Closes bug #1496315.
........
r46572 | neal.norwitz | 2006-05-31 09:43:27 +0200 (Wed, 31 May 2006) | 1 line
Calculate smallest properly (it was off by one) and use proper ssize_t types for Win64
........
r46573 | neal.norwitz | 2006-05-31 10:01:08 +0200 (Wed, 31 May 2006) | 1 line
Revert last checkin, it is better to do make distclean
........
r46574 | neal.norwitz | 2006-05-31 11:02:44 +0200 (Wed, 31 May 2006) | 3 lines
On 64-bit platforms running test_struct after test_tarfile would fail
since the deprecation warning wouldn't be raised.
........
r46575 | thomas.heller | 2006-05-31 13:37:58 +0200 (Wed, 31 May 2006) | 3 lines
PyTuple_Pack is not available in Python 2.3, but ctypes must stay
compatible with that.
........
r46576 | andrew.kuchling | 2006-05-31 15:18:56 +0200 (Wed, 31 May 2006) | 1 line
'functional' module was renamed to 'functools'
........
r46577 | kristjan.jonsson | 2006-05-31 15:35:41 +0200 (Wed, 31 May 2006) | 1 line
Fixup the PCBuild8 project directory. exceptions.c have moved to Objects, and the functionalmodule.c has been replaced with _functoolsmodule.c. Other minor changes to .vcproj files and .sln to fix compilation
........
r46578 | andrew.kuchling | 2006-05-31 16:08:48 +0200 (Wed, 31 May 2006) | 15 lines
[Bug #1473048]
SimpleXMLRPCServer and DocXMLRPCServer don't look at
the path of the HTTP request at all; you can POST or
GET from / or /RPC2 or /blahblahblah with the same results.
Security scanners that look for /cgi-bin/phf will therefore report
lots of vulnerabilities.
Fix: add a .rpc_paths attribute to the SimpleXMLRPCServer class,
and report a 404 error if the path isn't on the allowed list.
Possibly-controversial aspect of this change: the default makes only
'/' and '/RPC2' legal. Maybe this will break people's applications
(though I doubt it). We could just set the default to an empty tuple,
which would exactly match the current behaviour.
........
r46579 | andrew.kuchling | 2006-05-31 16:12:47 +0200 (Wed, 31 May 2006) | 1 line
Mention SimpleXMLRPCServer change
........
r46580 | tim.peters | 2006-05-31 16:28:07 +0200 (Wed, 31 May 2006) | 2 lines
Trimmed trailing whitespace.
........
r46581 | tim.peters | 2006-05-31 17:33:22 +0200 (Wed, 31 May 2006) | 4 lines
_range_error(): Speed and simplify (there's no real need for
loops here). Assert that size_t is actually big enough, and
that f->size is at least one. Wrap a long line.
........
r46582 | tim.peters | 2006-05-31 17:34:37 +0200 (Wed, 31 May 2006) | 2 lines
Repaired error in new comment.
........
r46584 | neal.norwitz | 2006-06-01 07:32:49 +0200 (Thu, 01 Jun 2006) | 4 lines
Remove ; at end of macro. There was a compiler recently that warned
about extra semi-colons. It may have been the HP C compiler.
This file will trigger a bunch of those warnings now.
........
r46585 | georg.brandl | 2006-06-01 08:39:19 +0200 (Thu, 01 Jun 2006) | 3 lines
Correctly unpickle 2.4 exceptions via __setstate__ (patch #1498571)
........
r46586 | georg.brandl | 2006-06-01 10:27:32 +0200 (Thu, 01 Jun 2006) | 3 lines
Correctly allocate complex types with tp_alloc. (bug #1498638)
........
r46587 | georg.brandl | 2006-06-01 14:30:46 +0200 (Thu, 01 Jun 2006) | 2 lines
Correctly dispatch Faults in loads (patch #1498627)
........
r46588 | georg.brandl | 2006-06-01 15:00:49 +0200 (Thu, 01 Jun 2006) | 3 lines
Some code style tweaks, and remove apply.
........
r46589 | armin.rigo | 2006-06-01 15:19:12 +0200 (Thu, 01 Jun 2006) | 5 lines
[ 1497053 ] Let dicts propagate the exceptions in user __eq__().
[ 1456209 ] dictresize() vulnerability ( <- backport candidate ).
........
r46590 | tim.peters | 2006-06-01 15:41:46 +0200 (Thu, 01 Jun 2006) | 2 lines
Whitespace normalization.
........
r46591 | tim.peters | 2006-06-01 15:49:23 +0200 (Thu, 01 Jun 2006) | 2 lines
Record bugs 1275608 and 1456209 as being fixed.
........
r46592 | tim.peters | 2006-06-01 15:56:26 +0200 (Thu, 01 Jun 2006) | 5 lines
Re-enable a new empty-string test added during the NFS sprint,
but disabled then because str and unicode strings gave different
results. The implementations were repaired later during the
sprint, but the new test remained disabled.
........
r46594 | tim.peters | 2006-06-01 17:50:44 +0200 (Thu, 01 Jun 2006) | 7 lines
Armin committed his patch while I was reviewing it (I'm sure
he didn't know this), so merged in some changes I made during
review. Nothing material apart from changing a new `mask` local
from int to Py_ssize_t. Mostly this is repairing comments that
were made incorrect, and adding new comments. Also a few
minor code rewrites for clarity or helpful succinctness.
........
r46599 | neal.norwitz | 2006-06-02 06:45:53 +0200 (Fri, 02 Jun 2006) | 1 line
Convert docstrings to comments so regrtest -v prints method names
........
r46600 | neal.norwitz | 2006-06-02 06:50:49 +0200 (Fri, 02 Jun 2006) | 2 lines
Fix memory leak found by valgrind.
........
r46601 | neal.norwitz | 2006-06-02 06:54:52 +0200 (Fri, 02 Jun 2006) | 1 line
More memory leaks from valgrind
........
r46602 | neal.norwitz | 2006-06-02 08:23:00 +0200 (Fri, 02 Jun 2006) | 11 lines
Patch #1357836:
Prevent an invalid memory read from test_coding in case the done flag is set.
In that case, the loop isn't entered. I wonder if rather than setting
the done flag in the cases before the loop, if they should just exit early.
This code looks like it should be refactored.
Backport candidate (also the early break above if decoding_fgets fails)
........
r46603 | martin.blais | 2006-06-02 15:03:43 +0200 (Fri, 02 Jun 2006) | 1 line
Fixed struct test to not use unittest.
........
r46605 | tim.peters | 2006-06-03 01:22:51 +0200 (Sat, 03 Jun 2006) | 10 lines
pprint functions used to sort a dict (by key) if and only if
the output required more than one line. "Small" dicts got
displayed in seemingly random order (the hash-induced order
produced by dict.__repr__). None of this was documented.
Now pprint functions always sort dicts by key, and the docs
promise it.
This was proposed and agreed to during the PyCon 2006 core
sprint -- I just didn't have time for it before now.
........
number of tests, all because of the codecs/_multibytecodecs issue described
here (it's not a Py3K issue, just something Py3K discovers):
http://mail.python.org/pipermail/python-dev/2006-April/064051.html
Hye-Shik Chang promised to look for a fix, so no need to fix it here. The
tests that are expected to break are:
test_codecencodings_cn
test_codecencodings_hk
test_codecencodings_jp
test_codecencodings_kr
test_codecencodings_tw
test_codecs
test_multibytecodec
This merge fixes an actual test failure (test_weakref) in this branch,
though, so I believe merging is the right thing to do anyway.
- The copy module now "copies" function objects (as atomic objects).
- dict.__getitem__ now looks for a __missing__ hook before raising
KeyError.
- Added a new type, defaultdict, to the collections module.
This uses the new __missing__ hook behavior added to dict (see above).
This gives another 30% speedup for operations such as
map(func, d.iteritems()) or list(d.iteritems()) which can both take
advantage of length information when provided.
* Split into three separate types that share everything except the
code for iternext. Saves run time decision making and allows
each iternext function to be specialized.
* Inlined PyDict_Next(). In addition to saving a function call, this
allows a redundant test to be eliminated and further specialization
of the code for the unique needs of each iterator type.
* Created a reusable result tuple for iteritems(). Saves the malloc
time for tuples when the previous result was not kept by client code
(this is the typical use case for iteritems). If the client code
does keep the reference, then a new tuple is created.
Results in a 20% to 30% speedup depending on the size and sparsity
of the dictionary.
* Factored constant structure references out of the inner loops for
PyDict_Next(), dict_keys(), dict_values(), and dict_items().
Gave measurable speedups to each (the improvement varies depending
on the sparseness of the dictionary being measured).
* Added a freelist scheme styled after that for tuples. Saves around
80% of the calls to malloc and free. About 10% of the time, the
previous dictionary was completely empty; in those cases, the
dictionary initialization with memset() can be skipped.
(Championed by Bob Ippolito.)
The update() method for mappings now accepts all the same argument forms
as the dict() constructor. This includes item lists and/or keyword
arguments.
* Increase dictionary growth rate resulting in more sparse dictionaries,
fewer lookup collisions, increased memory use, and better cache
performance. For dicts with over 50k entries, keep the current
growth rate in case an application is suffering from tight memory
constraints.
* Set the most common case (no resize) to fall-through the test.
the optional proto 2 slot state.
pickle.py, load_build(): CAUTION: Noted that cPickle's
load_build and pickle's load_build really don't do the same
things with the state, and didn't before this patch either.
cPickle never tries to do .update(), and has no backoff if
instance.__dict__ can't be retrieved. There are no tests
that can tell the difference, and part of what cPickle's
load_build() did looked accidental to me, so I don't know
what the true intent is here.
pickletester.py, test_pickle.py: Got rid of the hack for
exempting cPickle from running some of the proto 2 tests.
dictobject.c, PyDict_Next(): documented intended use.
Obtain cleaner coding and a system wide
performance boost by using the fast, pre-parsed
PyArg_Unpack function instead of PyArg_ParseTuple
function which is driven by a format string.
Most of these patches are from Thomas Heller, with long lines folded
by Tim. The change to test_descr.py is from Guido. See the bug report.
Not a bugfix candidate -- METH_CLASS is new in 2.3.
Just van Rossum showed a weird, but clever way for pure python code to
trigger the BadInternalCall. The C code had assumed that calling a class
constructor would return an instance of that class; however, classes that
abuse __new__ can invalidate that assumption.
interning. I modified Oren's patch significantly, but the basic idea
and most of the implementation is unchanged. Interned strings created
with PyString_InternInPlace() are now mortal, and you must keep a
reference to the resulting string around; use the new function
PyString_InternImmortal() to create immortal interned strings.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the
static keyword when it was used with a forward declaration of a static
initialized structure. Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers. (In
fact, we expect that the compilers are all fixed eight years later.)
I'm leaving staticforward and statichere defined in object.h as
static. This is only for backwards compatibility with C extensions
that might still use it.
XXX I haven't updated the documentation.
di_dict field when the end of the list is reached. Also make the
error ("dictionary changed size during iteration") a sticky state.
Also remove the next() method -- one is supplied automatically by
PyType_Ready() because the tp_iternext slot is set. That's a good
thing, because the implementation given here was buggy (it never
raised StopIteration).
PEP 285. Everything described in the PEP is here, and there is even
some documentation. I had to fix 12 unit tests; all but one of these
were printing Boolean outcomes that changed from 0/1 to False/True.
(The exception is test_unicode.py, which did a type(x) == type(y)
style comparison. I could've fixed that with a single line using
issubtype(x, type(y)), but instead chose to be explicit about those
places where a bool is expected.
Still to do: perhaps more documentation; change standard library
modules to return False/True from predicates.
The fix makes it possible to call PyObject_GC_UnTrack() more than once
on the same object, and then move the PyObject_GC_UnTrack() call to
*before* the trashcan code is invoked.
BUGFIX CANDIDATE!
PyDict_UpdateFromSeq2(): removed it.
PyDict_MergeFromSeq2(): made it public and documented it.
PyDict_Merge() docs: updated to reveal <wink> that the second
argument can be any mapping object.
Rather than tweaking the inheritance of type object slots (which turns
out to be too messy to try), this fix adds a __hash__ to the list and
dict types (the only mutable types I'm aware of) that explicitly
raises an error. This has the advantage that list.__hash__([]) also
raises an error (previously, this would invoke object.__hash__([]),
returning the argument's address); ditto for dict.__hash__.
The disadvantage for this fix is that 3rd party mutable types aren't
automatically fixed. This should be added to the rules for creating
subclassable extension types: if you don't want your object to be
hashable, add a tp_hash function that raises an exception.
Also, it's possible that I've forgotten about other mutable types for
which this should be done.
outer level, the iterator protocol is used for memory-efficiency (the
outer sequence may be very large if fully materialized); at the inner
level, PySequence_Fast() is used for time-efficiency (these should
always be sequences of length 2).
dictobject.c, new functions PyDict_{Merge,Update}FromSeq2. These are
wholly analogous to PyDict_{Merge,Update}, but process a sequence-of-2-
sequences argument instead of a mapping object. For now, I left these
functions file static, so no corresponding doc changes. It's tempting
to change dict.update() to allow a sequence-of-2-seqs argument too.
Also changed the name of dictionary's keyword argument from "mapping"
to "x". Got a better name? "mapping_or_sequence_of_pairs" isn't
attractive, although more so than "mosop" <wink>.
abstract.h, abstract.tex: Added new PySequence_Fast_GET_SIZE function,
much faster than going thru the all-purpose PySequence_Size.
libfuncs.tex:
- Document dictionary().
- Fiddle tuple() and list() to admit that their argument is optional.
- The long-winded repetitions of "a sequence, a container that supports
iteration, or an iterator object" is getting to be a PITA. Many
months ago I suggested factoring this out into "iterable object",
where the definition of that could include being explicit about
generators too (as is, I'm not sure a reader outside of PythonLabs
could guess that "an iterator object" includes a generator call).
- Please check my curly braces -- I'm going blind <0.9 wink>.
abstract.c, PySequence_Tuple(): When PyObject_GetIter() fails, leave
its error msg alone now (the msg it produces has improved since
PySequence_Tuple was generalized to accept iterable objects, and
PySequence_Tuple was also stomping on the msg in cases it shouldn't
have even before PyObject_GetIter grew a better msg).
many types were subclassable but had a xxx_dealloc function that
called PyObject_DEL(self) directly instead of deferring to
self->ob_type->tp_free(self). It is permissible to set tp_free in the
type object directly to _PyObject_Del, for non-GC types, or to
_PyObject_GC_Del, for GC types. Still, PyObject_DEL was a tad faster,
so I'm fearing that our pystone rating is going down again. I'm not
sure if doing something like
void xxx_dealloc(PyObject *self)
{
if (PyXxxCheckExact(self))
PyObject_DEL(self);
else
self->ob_type->tp_free(self);
}
is any faster than always calling the else branch, so I haven't
attempted that -- however those types whose own dealloc is fancier
(int, float, unicode) do use this pattern.
keys are true strings -- no subclasses need apply. This may be debatable.
The problem is that a str subclass may very well want to override __eq__
and/or __hash__ (see the new example of case-insensitive strings in
test_descr), but go-fast shortcuts for strings are ubiquitous in our dicts
(and subclass overrides aren't even looked for then). Another go-fast
reason for the change is that PyCheck_StringExact() is a quicker test
than PyCheck_String(), and we make such a test on virtually every access
to every dict.
OTOH, a str subclass may also be perfectly happy using the base str eq
and hash, and this change slows them a lot. But those cases are still
hypothetical, while Python's own reliance on true-string dicts is not.
mapping object", in the same sense dict.update(x) requires of x (that x
has a keys() method and a getitem).
Questionable: The other type constructors accept a keyword argument, so I
did that here too (e.g., dictionary(mapping={1:2}) works). But type_call
doesn't pass the keyword args to the tp_new slot (it passes NULL), it only
passes them to the tp_init slot, so getting at them required adding a
tp_init slot to dicts. Looks like that makes the normal case (i.e., no
args at all) a little slower (the time it takes to call dict.tp_init and
have it figure out there's nothing to do).
"mapping" object, specifically one that supports PyMapping_Keys() and
PyObject_GetItem(). This allows you to say e.g. {}.update(UserDict())
We keep the special case for concrete dict objects, although that
seems moderately questionable. OTOH, the code exists and works, so
why change that?
.update()'s docstring already claims that D.update(E) implies calling
E.keys() so it's appropriate not to transform AttributeErrors in
PyMapping_Keys() to TypeErrors.
Patch eyeballed by Tim.
Gave Python linear-time repr() implementations for dicts, lists, strings.
This means, e.g., that repr(range(50000)) is no longer 50x slower than
pprint.pprint() in 2.2 <wink>.
I don't consider this a bugfix candidate, as it's a performance boost.
Added _PyString_Join() to the internal string API. If we want that in the
public API, fine, but then it requires runtime error checks instead of
asserts.
frequently used, and in particular this allows to drop the last
remaining obvious time-waster in the crucial lookdict() and
lookdict_string() functions. Other changes consist mostly of changing
"i < ma_size" to "i <= ma_mask" everywhere.
be possible to provoke unbounded recursion now, but leaving that to someone
else to provoke and repair.
Bugfix candidate -- although this is getting harder to backstitch, and the
cases it's protecting against are mondo contrived.
code, less memory. Tests have uncovered no drawbacks. Christian and
Vladimir are the other two people who have burned many brain cells on the
dict code in recent years, and they like the approach too, so I'm checking
it in without further ado.
instead of multiplication to generate the probe sequence. The idea is
recorded in Python-Dev for Dec 2000, but that version is prone to rare
infinite loops.
The value is in getting *all* the bits of the hash code to participate;
and, e.g., this speeds up querying every key in a dict with keys
[i << 16 for i in range(20000)] by a factor of 500. Should be equally
valuable in any bad case where the high-order hash bits were getting
ignored.
Also wrote up some of the motivations behind Python's ever-more-subtle
hash table strategy.
dictresize() was too aggressive about never ever resizing small dicts.
If a small dict is entirely full, it needs to rebuild it despite that
it won't actually resize it, in order to purge old dummy entries thus
creating at least one virgin slot (lookdict assumes at least one such
exists).
Also took the opportunity to add some high-level comments to dictresize.
The idea is Marc-Andre Lemburg's, the implementation is Tim's.
Add a new ma_smalltable member to dictobjects, an embedded vector of
MINSIZE (8) dictentry structs. Short course is that this lets us avoid
additional malloc(s) for dicts with no more than 5 entries.
The changes are widespread but mostly small.
Long course: WRT speed, all scalar operations (getitem, setitem, delitem)
on non-empty dicts benefit from no longer needing NULL-pointer checks
(ma_table is never NULL anymore). Bulk operations (copy, update, resize,
clearing slots during dealloc) benefit in some cases from now looping
on the ma_fill count rather than on ma_size, but that was an unexpected
benefit: the original reason to loop on ma_fill was to let bulk
operations on empty dicts end quickly (since the NULL-pointer checks
went away, empty dicts aren't special-cased any more).
Special considerations:
For dicts that remain empty, this change is a lose on two counts:
the dict object contains 8 new dictentry slots now that weren't
needed before, and dict object creation also spends time memset'ing
these doomed-to-be-unsused slots to NULLs.
For dicts with one or two entries that never get larger than 2, it's
a mix: a malloc()/free() pair is no longer needed, and the 2-entry case
gets to use 8 slots (instead of 4) thus decreasing the chance of
collision. Against that, dict object creation spends time memset'ing
4 slots that aren't strictly needed in this case.
For dicts with 3 through 5 entries that never get larger than 5, it's a
pure win: the dict is created with all the space they need, and they
never need to resize. Before they suffered two malloc()/free() calls,
plus 1 dict resize, to get enough space. In addition, the 8-slot
table they ended with consumed more memory overall, because of the
hidden overhead due to the additional malloc.
For dicts with 6 or more entries, the ma_smalltable member is wasted
space, but then these are large(r) dicts so 8 slots more or less doesn't
make much difference. They still benefit all the time from removing
ubiquitous dynamic null-pointer checks, and get a small benefit (but
relatively smaller the larger the dict) from not having to do two
mallocs, two frees, and a resize on the way *to* getting their sixth
entry.
All in all it appears a small but definite general win, with larger
benefits in specific cases. It's especially nice that it allowed to
get rid of several branches, gotos and labels, and overall made the
code smaller.
Two exceedingly unlikely errors in dictresize():
1. The loop for finding the new size had an off-by-one error at the
end (could over-index the polys[] vector).
2. The polys[] vector ended with a 0, apparently intended as a sentinel
value but never used as such; i.e., it was never checked, so 0 could
have been used *as* a polynomial.
Neither bug could trigger unless a dict grew to 2**30 slots; since that
would consume at least 12GB of memory just to hold the dict pointers,
I'm betting it's not the cause of the bug Fred's tracking down <wink>.
in the comments for using two passes was bogus, as the only object that
can get decref'ed due to the copy is the dummy key, and decref'ing dummy
can't have side effects (for one thing, dummy is immortal! for another,
it's a string object, not a potentially dangerous user-defined object).
to reason that me_key is much more likely to match the key we're looking
for than to match dummy, and if the key is absent me_key is much more
likely to be NULL than dummy: most dicts don't even have a dummy entry.
Running instrumented dict code over the test suite and some apps confirmed
that matching dummy was 200-300x less frequent than matching key in
practice. So this reorders the tests to try the common case first.
It can lose if a large dict with many collisions is mostly deleted, not
resized, and then frequently searched, but that's hardly a case we
should be favoring.
The comment following used to say:
/* We use ~hash instead of hash, as degenerate hash functions, such
as for ints <sigh>, can have lots of leading zeros. It's not
really a performance risk, but better safe than sorry.
12-Dec-00 tim: so ~hash produces lots of leading ones instead --
what's the gain? */
That is, there was never a good reason for doing it. And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collison) the same "too often" across distinct hashes.
Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== # number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.
Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items(). A number
of std tests suffered bogus failures as a result. For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,
>>> d = {}
>>> for i in range(10):
... d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>
Unfortunately. people may latch on to that in small examples and draw a
bogus conclusion.
test_support.py
Moved test_extcall's sortdict() into test_support, made it stronger,
and imported sortdict into other std tests that needed it.
test_unicode.py
Excluced cp875 from the "roundtrip over range(128)" test, because
cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
See Python-Dev for excruciating details.
Cookie.py
Chaged various output functions to sort dicts before building
strings from them.
test_extcall
Fiddled the expected-result file. This remains sensitive to native
dict ordering, because, e.g., if there are multiple errors in a
keyword-arg dict (and test_extcall sets up many cases like that), the
specific error Python complains about first depends on native dict
ordering.
doesn't know how to do LE, LT, GE, GT. dict_richcompare can't do the
latter any faster than dict_compare can. More importantly, for
cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and
GT one at a time, even if the tp_compare slot is defined, and
dict_richcompare called dict_compare for the latter two because
it couldn't do them itself. The result was a lot of wasted calls to
dict_compare. Now dict_richcompare gives up at once the times Python
calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare
is called only once (when Python gets around to trying the tp_compare
slot).
Continued mystery: despite that this cut the number of calls to
dict_compare approximately in half in test_mutants.py, the latter still
runs amazingly slowly. Running under the debugger doesn't show excessive
activity in the dict comparison code anymore, so I'm guessing the culprit
is somewhere else -- but where? Perhaps in the element (key/value)
comparison code? We clearly spend a lot of time figuring out how to
compare things.
Fixed a half dozen ways in which general dict comparison could crash
Python (even cause Win98SE to reboot) in the presence of kay and/or
value comparison routines that mutate the dict during dict comparison.
Bugfix candidate.
d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2
don't support comparisons other than ==, and testing dicts for equality
is faster now (especially when inequality obtains).
dictionary size was comparing ma_size, the hash table size, which is
always a power of two, rather than ma_used, wich changes on each
insertion or deletion. Fixed this.
sees it (test_iter.py is unchanged).
- Added a tp_iternext slot, which calls the iterator's next() method;
this is much faster for built-in iterators over built-in types
such as lists and dicts, speeding up pybench's ForLoop with about
25% compared to Python 2.1. (Now there's a good argument for
iterators. ;-)
- Renamed the built-in sequence iterator SeqIter, affecting the C API
functions for it. (This frees up the PyIter prefix for generic
iterator operations.)
- Added PyIter_Check(obj), which checks that obj's type has a
tp_iternext slot and that the proper feature flag is set.
- Added PyIter_Next(obj) which calls the tp_iternext slot. It has a
somewhat complex return condition due to the need for speed: when it
returns NULL, it may not have set an exception condition, meaning
the iterator is exhausted; when the exception StopIteration is set
(or a derived exception class), it means the same thing; any other
exception means some other error occurred.
new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines
TODO:
documentation
test suite
decide whether to use a different way to spell iter(function, sentinal)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)