[The HTML version of this Summary is available at http://www.python.org/dev/summary/2005-09-16_2005-09-30.html]
We have two quotes this week, one each from the two biggest threads of this fortnight: concurrency and conditional expressions. The first quote, from Donovan Baarda, puts Python's approach to threading into perspective:
The reality is threads were invented as a low overhead way of easily implementing concurrent applications... ON A SINGLE PROCESSOR. Taking into account threading's limitations and objectives, Python's GIL is the best way to support threads. When hardware (seriously) moves to multiple processors, other concurrency models will start to shine.
Our second QOTF, by yours truly (hey, who could refuse a nomination from Guido?), is a not-so-subtle reminder to leave syntax decisions to Guido:
Please no more syntax proposals! ... We need to leave the syntax to Guido. We've already proved that ... we can't as a community agree on a syntax. That's what we have a BDFL for. =)
Martin v. Löwis discovered that a little more than a MiB could be saved in the size of the Python installer by compressing the CAB file with LZX:21 instead of the standard MSZIP. After several testers confirmed that the new format worked, the change was made (for Python 2.4.2 and beyond).
Raymond Hettinger proposed that the and and or operators be modified in Python 3.0 to produce only booleans instead of producing objects, motivating this proposal in part by the common (mis-)use of <cond> and <true-expr> or <false-expr> to emulate a conditional expression. In response, Guido suggested that the conditional expression discussion of PEP 308 be reopened. This time around, people seemed almost unanimously in support of adding a conditional expression, though as before they disagreed on syntax. Fortunately, this time Guido cut the discussion short and pronounced a new syntax: <true-expr> if <cond> else <false-expr>. Although it has not been implemented yet, the plan is for it to appear in Python 2.5.
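The problem with the and/or idiom is easy to show: it silently picks the false branch whenever the true branch is itself falsy. A small sketch in modern Python (the helper names are hypothetical, chosen only for illustration):

```python
def pick_and_or(cond, if_true, if_false):
    # The old idiom: <cond> and <true-expr> or <false-expr>
    # If if_true is falsy (0, "", None, []), the "or" falls through to if_false.
    return cond and if_true or if_false


def pick_conditional(cond, if_true, if_false):
    # The pronounced syntax: <true-expr> if <cond> else <false-expr>
    return if_true if cond else if_false


print(pick_and_or(True, 0, "fallback"))       # the idiom gives the wrong answer
print(pick_conditional(True, 0, "fallback"))  # the conditional expression gives 0
```

Note that wrapping the expressions in functions evaluates both branches eagerly, so this only demonstrates which value each form produces, not the short-circuit behaviour of the real expression.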
Once again, the subject of removing the global interpreter lock (GIL) came up. Sokolov Yura suggested that the GIL be replaced with a system of thread-local GILs that cooperate to share writing; Martin v. Löwis suggested that he try implementing his ideas, predicting that he would find it a lot of work, that it would require changes to all extension modules (likely introducing new bugs, particularly race conditions), and that it might well decrease performance. This kicked off several long threads about multi-processor coding.
A long time ago (circa Python 1.4), Greg Stein experimented with free threading, which did yield around a 1.6 times speedup on a dual-processor machine. To avoid the overhead of multi-processor locking on a uniprocessor machine, a separate binary could be distributed. Some of the code apparently did make it into Python 1.5, but the issue died off because no-one provided working code, or a strategy for what to do with existing extension modules.
Guido pointed out that it is not clear at this time how multiple processors will be used as they become the norm. With the threaded programming model (e.g. in Java) there are problems with concurrent modification errors (without locking) or deadlocks and livelocks (with locking). Guido's hunch (and mine, FWIW) is that instead of writing massively parallel applications, we will continue to write single-threaded applications that are tied together at the process level rather than at the thread level. He also pointed out that it's likely that most problems get little benefit out of multiple processors.
Guido threw down the gauntlet: rather than the endless discussion about this topic, someone should come up with a GIL-free Python (not necessarily CPython) and demonstrate its worth. Phillip J. Eby reminded everyone that Jython, IronPython, and PyPy exist, and that someone could, for example, create a multiprocessor-friendly backend for PyPy.
Guido also pointed out that fast threading benefits from fast context switches, which in turn benefit from small register sets, and that the current trend in chips is towards larger register sets. In addition, multiple processors with shared memory don't scale all that well; multiple processors with explicit interprocess communication (IPC) channels scale much better. These factors all favour multi-processing over multi-threading. Donovan Baarda went so far as to say (in the QOTF above) that Python's GIL is the best way to support threads, which are meant for single-processor use, and that when multiple-processor platforms have matured, other concurrency models will likewise mature. OTOH, Bob Ippolito pointed out that in many operating systems there isn't much difference between threads and processes, and that threads can typically still use IPC. Bob argued that the biggest argument for threading is that lots of existing C/C++ code uses threads.
Simon Percivall argued that the problem is that Python offers ("out of the box") some support for multi-threaded programming, but little for multi-process programming beyond the basics (e.g. data sharing, communication, control over running processes, dealing out tasks to be handled). Simon suggested that the best way to stop people complaining about the GIL is to provide solid, standardized support for multi-process programming. The idea of a "multiprocess" module gained a reasonable amount of support.
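For a sense of what "the basics" look like, here is a minimal sketch that deals tasks out to separate single-threaded Python processes using only subprocess pipes. The worker program and `run_tasks` are hypothetical; the multiprocessing module, added to the standard library later (in Python 2.6), is the eventual answer to Simon's request.

```python
import subprocess
import sys

# Hypothetical worker: a separate single-threaded Python process that reads
# one integer task from stdin and writes its result to stdout.
WORKER = "import sys; n = int(sys.stdin.read()); print(sum(i * i for i in range(n)))"


def run_tasks(tasks):
    """Deal each task out to its own child process and collect results in order."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        for _ in tasks
    ]
    # Hand out every task before reading any result, so the children can run
    # at the same time on different processors; each has its own GIL.
    for proc, n in zip(procs, tasks):
        proc.stdin.write(str(n))
        proc.stdin.close()
    results = []
    for proc in procs:
        results.append(int(proc.stdout.read()))
        proc.stdout.close()
        proc.wait()
    return results


print(run_tasks([10, 100]))
```

Everything beyond this (shared data, task queues, supervising workers) has to be built by hand, which is the gap Simon was pointing at.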
Phillip J. Eby outlined an idea he is considering PEPifying, in which one could switch all context variables (such as the Decimal context and the sys.* variables) simultaneously and instantaneously when changing execution contexts (like switching between coroutines). He has a prototype implementation of the basic idea, which is less than 200 lines of Python and very fast. However, he pointed out that it's not completely PEP-ready at this point, and he needs to continue considering various parts of the concept.
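Much later, this kind of facility arrived in the standard library as the contextvars module (PEP 567, Python 3.7), and the decimal module has stored its context in a context variable since then. The sketch below uses that later API, not Phillip's prototype, to show the idea: entering a different Context object swaps every context variable at once. `current_user` is a hypothetical variable standing in for state such as the sys.* settings.

```python
import contextvars
import decimal

# Hypothetical per-task variable, standing in for state such as sys.* settings.
current_user = contextvars.ContextVar("current_user", default="nobody")


def configure(user, precision):
    """Set per-task state inside whichever context is currently running."""
    current_user.set(user)
    decimal.setcontext(decimal.Context(prec=precision))


def snapshot():
    """Report the state visible in the running context."""
    return current_user.get(), str(decimal.Decimal(1) / decimal.Decimal(7))


# Two independent execution contexts, e.g. one per coroutine.
ctx_a = contextvars.copy_context()
ctx_b = contextvars.copy_context()
ctx_a.run(configure, "alice", 3)
ctx_b.run(configure, "bob", 6)

# Switching contexts switches all of the variables together.
print(ctx_a.run(snapshot))
print(ctx_b.run(snapshot))
print(current_user.get())  # the outer context is untouched
```

asyncio gives each task its own copy of the context in the same way, which is the coroutine-switching case Phillip had in mind.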
Bruce Eckel joined the thread and suggested that low-level threads programmers are only now catching up to objects; when it comes to concurrency, they still think in terms of threads, so they naturally apply thread concepts to objects. He believes that pthread-style thinking is two steps backwards: you effectively throw open the innards of the object that you just spent time decoupling from the rest of your system, and the resulting coupling is not predictable.
Bruce and Guido had discussed "active objects" off-list: defining a class as "active" would install a worker thread and concurrent queue in each object of that class, automatically turn method calls into tasks and enqueue them, and prevent any interaction other than enqueued messages. Guido felt that this might be a solution if multiple active objects could co-exist in the same process but be prevented (by the language implementation) from sharing data except via channels, and if dynamic reallocation of active objects across multiple CPUs were possible. He pointed out that an implementation would really be needed to prove this.
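A minimal threads-only sketch of the active-object pattern follows, assuming a hypothetical `ActiveObject` base class. Each instance owns a worker thread and a task queue, and public methods only enqueue messages. For brevity each call blocks until the worker replies, and nothing stops outside code from reading `_count` directly; enforcing that isolation is the part the language would have to supply.

```python
import queue
import threading


class ActiveObject:
    """Hypothetical base class: each instance owns a worker thread and a task queue."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Execute queued messages one at a time on this object's own thread.
        while True:
            item = self._tasks.get()
            if item is None:                  # sentinel: shut the worker down
                return
            reply, func, args = item
            try:
                reply.put((True, func(*args)))
            except Exception as exc:          # hand errors back to the caller
                reply.put((False, exc))

    def _call(self, func, *args):
        """Enqueue func(*args) for the worker thread and wait for the reply."""
        reply = queue.Queue(maxsize=1)        # one-shot reply channel
        self._tasks.put((reply, func, args))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def stop(self):
        self._tasks.put(None)
        self._worker.join()


class Counter(ActiveObject):
    def __init__(self):
        self._count = 0                       # by convention, touched only by the worker
        super().__init__()

    def increment(self, by=1):
        return self._call(self._do_increment, by)

    def _do_increment(self, by):
        self._count += by
        return self._count
```

Because every increment runs on the counter's own thread, many client threads can call `increment()` without any locks around `_count`.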
Phillip and Martin pointed out that preventing any interaction other than enqueued messages is the difficult part; each active object would, for example, have to have its own sys.modules. Phillip felt that such a solution (which Bruce posed as "a" solution, not "the" solution) wouldn't help with GIL removal, but would help with effective use of multiprocessor machines on platforms where fork() is available, if the API works across processes as well as threads.
Bruce then restarted the discussion, putting forth eight criteria that he felt would be necessary for the "pythonic" solution to concurrency. Items on the list were discussed further, with some disagreement about what was possible. The concurrency discussion continues next month...
Brett Cannon proposed removing support for nested function parameters so that instead of being able to write:
def f((x, y)): print x, y
you'd have to write something like:
def f(arg):
    x, y = arg
    print x, y
Brett (with help from Guido) motivated this removal (for Python 3.0) by a few factors:
In general, people were undecided on this proposal. While a number of people said they used the feature and would miss it, many of them also said that their code wouldn't suffer that much if the feature was removed. No decision had been made at the time of the summary.
In Python 2.4 some builtin iterators gained __len__ methods when the number of remaining items could be made available. This broke some of Guido's code that tested iterators for their boolean value (to distinguish them from None). Raymond Hettinger (who supplied the original patch) argued that testing for None using boolean tests was in general a bad idea, and that knowing the length of an iterator, when possible, had a number of use cases and allowed for some performance gains. However, Guido felt strongly that iterators should not supply __len__ methods, as this would lead to some people writing code expecting this method, which would then break when it received an iterator which could not determine its own length. The feature will be rolled back in Python 2.5, and Raymond will likely move the __len__ methods to private methods in order to maintain the performance gains.
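The breakage is easy to reproduce: defining `__len__` makes an exhausted iterator false, so a truth test meant to distinguish an iterator from None gives the wrong answer. The sketch below uses hypothetical classes; the private spelling that was eventually adopted is `__length_hint__`, which does not affect truth value and which Python 3.4 exposes through `operator.length_hint()`.

```python
import operator


class Countdown:
    """Hypothetical iterator yielding n-1, ..., 0."""

    def __init__(self, n):
        self.remaining = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining


class CountdownWithLen(Countdown):
    # Roughly the 2.4 situation: remaining items exposed through __len__.
    def __len__(self):
        return self.remaining


class CountdownWithHint(Countdown):
    # The "private" alternative: a length hint that leaves truth value alone.
    def __length_hint__(self):
        return self.remaining


def is_present(it):
    """Guido-style check: use the truth value to tell an iterator from None."""
    return bool(it)


print(is_present(CountdownWithLen(0)))            # False, although it is an iterator
print(is_present(CountdownWithHint(0)))           # True
print(operator.length_hint(CountdownWithHint(3)))
```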
Jim Fulton proposed adding a new builtin for a property-like descriptor that would only call the getter method once, so that something like:
class Spam(object):
    @readproperty
    def eggs(self):
        ... expensive computation of eggs
        self.eggs = result
        return result
would only do the eggs computation once. Currently, you can't do this with a property() because the self.eggs = result statement tries to call the property's fset method instead of replacing the property with the result of the eggs() call. A few other people commented that they'd needed similar functionality at times, and Guido seemed moderately interested in the idea, but there was no final resolution.
Raymond Hettinger suggested a "small, but interesting, C project" to determine whether the setobject.c implementation would be improved by recoding the set_lookkey() function to optimize key insertion order using Brent's variation of Algorithm D (cf. Knuth vol. III, section 6.4, p. 525). It has the potential to boost performance for uniquification applications, with duplicate keys being identified more quickly, and possibly also to allow more frequent retirement of dummy entries during insertion operations.
Andrew Durdin pointed out that Brent's variation depends on the next probe position for a key being derived from the key and its current position, which is incompatible with the current perturbation system; Raymond replaced perturbation with a secondary hash with linear probing. Antoine Pitrou experimented with this, finding results ranging from a 5% slowdown to a 2% speedup across various benchmarks.
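To make the idea concrete, here is a toy pure-Python sketch of Brent's variation on an open-addressing table. It assumes double hashing (the probe step comes from a secondary hash of the key, so each key's next position is derived from the key and its current position), a prime table size, and no deletions. `BrentHashSet` is a hypothetical name, and this is not how setobject.c is written.

```python
class BrentHashSet:
    """Toy open-addressing set: double hashing plus Brent's variation."""

    def __init__(self, size=101):
        # A prime size makes every step in 1..size-1 visit all slots.
        self.size = size
        self.slots = [None] * size

    def _step(self, key):
        # Secondary hash: the probe increment depends only on the key.
        return 1 + hash(key) % (self.size - 1)

    def _probe(self, key):
        pos, step = hash(key) % self.size, self._step(key)
        for _ in range(self.size):
            yield pos
            pos = (pos + step) % self.size

    def __contains__(self, key):
        for pos in self._probe(key):
            occupant = self.slots[pos]
            if occupant is None:
                return False
            if occupant == key:
                return True
        return False

    def add(self, key):
        path, empty = [], None  # occupied slots probed, then first empty slot
        for pos in self._probe(key):
            occupant = self.slots[pos]
            if occupant is None:
                empty = pos
                break
            if occupant == key:
                return
            path.append(pos)
        if empty is None:
            raise RuntimeError("table is full")
        # Placing key at `empty` costs len(path) + 1 probes to find it later.
        # Alternative: put key at path[i] (cost i + 1) and push that slot's
        # occupant j steps further along its *own* probe sequence (extra cost j).
        # Brent's rule: take the cheapest alternative with i + j < len(path).
        best = None
        for i, pos in enumerate(path):
            occupant = self.slots[pos]
            dest, step = pos, self._step(occupant)
            for j in range(1, len(path) - i):
                dest = (dest + step) % self.size
                if self.slots[dest] is None:
                    if best is None or i + j < best[0]:
                        best = (i + j, i, dest)
                    break
        if best is None:
            self.slots[empty] = key
        else:
            _, i, dest = best
            self.slots[dest] = self.slots[path[i]]
            self.slots[path[i]] = key
```

Relocated keys only move forward along their own probe sequences past occupied slots, and no slot is ever emptied, so every earlier lookup path stays intact.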
Raymond has also been experimenting with a simpler approach: whenever there are more than three probes, always swap the new key into the first position and then unconditionally re-insert the swapped-out key. He reported that most of the time this gives an improvement, and it doesn't require changing the perturbation logic. This simpler approach is cheap to implement, but its benefits are also smaller, since it improves only the worst collisions.
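A minimal sketch of this simpler heuristic, assuming a toy table with plain linear probing and counting "probes" as occupied slots passed before an empty one. `SwapOnCollisionSet` and `MAX_PROBES` are hypothetical names; setobject.c's real probe sequence differs.

```python
class SwapOnCollisionSet:
    """Toy linear-probing set illustrating the swap-on-long-probe heuristic."""

    MAX_PROBES = 3

    def __init__(self, size=16):
        self.size = size
        self.slots = [None] * size

    def _probe(self, key):
        start = hash(key) % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def __contains__(self, key):
        for pos in self._probe(key):
            occupant = self.slots[pos]
            if occupant is None:
                return False
            if occupant == key:
                return True
        return False

    def add(self, key, allow_swap=True):
        first, collisions = None, 0
        for pos in self._probe(key):
            occupant = self.slots[pos]
            if first is None:
                first = pos
            if occupant is None:
                if allow_swap and collisions > self.MAX_PROBES:
                    # Put the new key in its first-choice slot, then re-insert
                    # the key it displaced, this time without swapping.
                    displaced = self.slots[first]
                    self.slots[first] = key
                    self.add(displaced, allow_swap=False)
                else:
                    self.slots[pos] = key
                return
            if occupant == key:
                return
            collisions += 1
        raise RuntimeError("table is full")


s = SwapOnCollisionSet()
for k in (0, 16, 32, 48, 64):   # all hash to slot 0 in a 16-slot table
    s.add(k)
print(s.slots[:5])              # 64 took slot 0; the displaced 0 moved to slot 4
```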
Nathan Bullock suggested a ''relpath(path_a, path_b)'' addition to os.path that returns a relative path from path_a to path_b. Trent Mick pointed out that there are a couple of recipes for this, as well as Jason Orendorff's Path module. Several people supported this idea, and hopefully either Nathan or one of the recipe authors will submit a patch with this functionality.
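This functionality did eventually land as os.path.relpath() in Python 2.6, but with the argument order `relpath(path, start)`, i.e. the target comes first and the starting directory second. The wrapper below is hypothetical and only restores the argument order proposed in the thread; posixpath is used so the example behaves the same on every platform.

```python
import posixpath


def relpath_from(path_a, path_b):
    """Hypothetical wrapper: relative path leading from directory path_a to path_b."""
    return posixpath.relpath(path_b, start=path_a)


print(relpath_from("/usr/share", "/usr/lib/python"))
print(posixpath.relpath("/a/b/c", "/a/b"))
```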
Rich Burridge followed up a comp.lang.python thread about a "vendor-packages" directory for Python by submitting a patch and asking for comments about the proposal on python-dev. General consensus was that the proposal needed a better rationale, explaining why this improved on simply adding a .pth file to the site-packages directory.
Rich explained that the rationale is that Python files supplied by the vendor (Sun, Apple, RedHat, Microsoft) with their operating system software should go in a separate base directory to differentiate them from Python files installed specifically at the site. However, Bob Ippolito pointed out that, as of OS X 10.4 ("Tiger") Apple already does this via a .pth file ("Extras.pth"), which points to ''/System/Library/Frameworks/Python.framework/Versions/2.3/Extras/lib/python'' and includes wxPython by default.
Bob also pointed out that such a "vendor-packages.pth" should look like ''import site; site.addsitedir('/usr/lib/python2.4/vendor-packages')'' so that packages like Numeric, PIL, and PyObjC, which take advantage of .pth files themselves, work when installed to the vendor-packages location.
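The difference Bob describes can be seen directly: a plain path line in a .pth file only appends that directory to sys.path, while site.addsitedir() also processes the .pth files found inside it. The sketch builds a hypothetical vendor directory in a temporary location, containing its own .pth file as Numeric did, and calls addsitedir() the way the suggested vendor-packages.pth line would.

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: a vendor directory that itself contains a .pth file.
root = tempfile.mkdtemp()
vendor = os.path.join(root, "vendor-packages")
numeric = os.path.join(vendor, "Numeric")
os.makedirs(numeric)
with open(os.path.join(vendor, "Numeric.pth"), "w") as f:
    f.write("Numeric\n")   # path entry, relative to the directory holding the .pth

# What "import site; site.addsitedir(...)" in vendor-packages.pth would do:
# add the directory itself and then read the .pth files it contains.
site.addsitedir(vendor)

print(os.path.abspath(vendor) in sys.path, os.path.abspath(numeric) in sys.path)
```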
Phillip J. Eby pointed out that it would be good to have a document for "Python Distributors" that explained these kinds of things, and suggested that perhaps a volunteer or two could be found within the distutils-SIG to do this.
Guido asked if platform.system_alias() could be improved on OS X by mapping uname()'s ''Darwin x.y'' to ''OS X 10.(x-4).y''. Bob Ippolito and others pointed out that this was not a good idea, because uname() only reports the kernel version number and not the Cocoa API version, which is really what OS X 10.x.y refers to. Bob noted that the correct way to do this using a public API is to use Gestalt, which is what platform.mac_ver() does.
On further inspection, it was discovered that parsing the /System/Library/CoreServices/SystemVersion.plist property list is also a supported API, and would not rely on access to the Carbon API set. Bob and Wilfredo Sánchez Vega provided sample code that would parse this plist; Marc-Andre Lemburg suggested that a patch be written for system_alias() that would use this method (if possible) for Mac OS.
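A minimal sketch of the plist approach using the standard plistlib module (plistlib.loads() is the Python 3.4+ spelling). The helper names are hypothetical, and the sample bytes are an abridged, illustrative stand-in for the real file, which also carries other keys such as ProductBuildVersion. On a Mac, platform.mac_ver() remains the ready-made standard-library route.

```python
import plistlib

SYSTEM_VERSION_PLIST = "/System/Library/CoreServices/SystemVersion.plist"


def mac_version_from_plist(data):
    """Hypothetical helper: extract (ProductName, ProductVersion) from plist bytes."""
    info = plistlib.loads(data)
    return info["ProductName"], info["ProductVersion"]


def read_mac_version(path=SYSTEM_VERSION_PLIST):
    """Hypothetical helper: read the real file (only exists on Mac OS X)."""
    with open(path, "rb") as f:
        return mac_version_from_plist(f.read())


# Abridged, illustrative sample of the file's format.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>ProductName</key>
    <string>Mac OS X</string>
    <key>ProductVersion</key>
    <string>10.4.2</string>
</dict>
</plist>
"""

print(mac_version_from_plist(SAMPLE))
```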
This is a summary of traffic on the python-dev mailing list from September 16, 2005 through September 30, 2005. It is intended to inform the wider Python community of on-going developments on the list on a semi-monthly basis. An archive of previous summaries is available online.
An RSS feed of the titles of the summaries is available. You can also watch comp.lang.python or comp.lang.python.announce for new summaries (or through their email gateways of python-list or python-announce, respectively, as found at http://mail.python.org).
This is the 4th summary written by the python-dev summary duo of Steve Bethard and Tony Meyer (I feel like the White Rabbit in Wonderland...).
To contact us, please send email:
Do not post to comp.lang.python if you wish to reach us.
The Python Software Foundation is the non-profit organization that holds the intellectual property for Python. It also tries to advance the development and use of Python. If you find the python-dev Summary helpful, please consider making a donation. You can make a donation at http://python.org/psf/donations.html . Every penny helps, so even a small donation with a credit card, check, or by PayPal helps.
The in-development version of the documentation for Python can be found at http://www.python.org/dev/doc/devel/ and should be used when looking up any documentation for new code; otherwise use the current documentation as found at http://docs.python.org/ . PEPs (Python Enhancement Proposals) are located at http://www.python.org/peps/ . To view files in the Python CVS online, go to http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/ . Reported bugs and suggested patches can be found at the SourceForge project page.
Please note that this summary is written using reStructuredText. Any unfamiliar punctuation is probably markup for reST (otherwise it is probably regular expression syntax or a typo =); you can safely ignore it. We do suggest learning reST, though; it's simple and is accepted for PEP markup and can be turned into many different formats like HTML and LaTeX. Unfortunately, even though reST is standardized, the wonders of programs that like to reformat text do not allow us to guarantee you will be able to run the text version of this summary through Docutils as-is unless it is from the original text file.