python-dev Summary for 2005-10-01 through 2005-10-15

[The HTML version of this Summary is available at http://www.python.org/dev/summary/2005-10-01_2005-10-15.html]

Announcements

QOTF: Quote of the Fortnight

From Phillip J. Eby:

So, if threads are "easy" in Python compared to other languages, it's because of the GIL, not in spite of it.

Contributing thread:

Pythonic concurrency

[SJB]

GCC/G++ Issues on Linux: Patch available

Christoph Ludwig provided the previously promised patch to address some of the issues in compiling Python with GCC/G++ on Linux. The patch keeps ELF systems like x86 / Linux from having any dependencies on the C++ runtime, and allows systems that require main() to be a C++ function to be configured appropriately.

Contributing thread:

[C++-sig] GCC version compatibility

[SJB]

Summaries

Concurrency in Python

Michael Sparks spent a bit of time describing the current state and future goals of the Kamaelia project. Mainly, Kamaelia aims to make concurrency as simple and easy to use as possible. A scheduler manages a set of generators that communicate with each other through Queues. The long-term goals include being able to farm the various generators off into threads or processes as needed, so that whether your concurrency model is cooperative, threaded or process-based, your code can basically look the same.
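
The scheduler-plus-generators idea can be illustrated with a minimal sketch. This is not Kamaelia's API: the names producer, doubler and run are hypothetical, and the round-robin scheduling is only one assumed way of stepping the generators.

```python
from collections import deque
from queue import Queue, Empty


def producer(outbox, items):
    """Put each item on the outbox, yielding control after every send."""
    for item in items:
        outbox.put(item)
        yield
    outbox.put(None)  # sentinel: no more data will follow


def doubler(inbox, results):
    """Consume items until the sentinel arrives, storing doubled values."""
    while True:
        try:
            item = inbox.get_nowait()
        except Empty:
            yield  # nothing to do yet; let other components run
            continue
        if item is None:
            return
        results.append(item * 2)
        yield


def run(components):
    """Hypothetical round-robin scheduler: step each generator in turn."""
    ready = deque(components)
    while ready:
        component = ready.popleft()
        try:
            next(component)
        except StopIteration:
            continue  # finished; do not reschedule
        ready.append(component)


channel = Queue()
results = []
run([producer(channel, [1, 2, 3]), doubler(channel, results)])
print(results)
```

Because the components only talk through the queue, the same generator bodies could in principle be driven by a thread per component instead of the cooperative loop, which is the kind of portability the paragraph describes.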

There was also continued discussion about how "easy" threads are. Shane Hathaway made the point that it's actually locking that's "insanely difficult", and approaches that simplify how much you need to think about locking can keep threading relatively easy -- this was one of the strong points of ZODB. A fairly large camp also got behind the claim that threads are easy if you're limited to only message passing. There were also a few comments about how Python makes threading easier, e.g. through the GIL (see QOTF: Quote of the Fortnight) and through threading.threads's encapsulation of thread-local resources as instance attributes.
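
As a rough illustration of the "message passing only" style, the sketch below has a worker thread that shares no state with its caller except two queues. The names worker, inbox and outbox are made up for the example, and the None sentinel is just one assumed shutdown convention.

```python
import queue
import threading


def worker(inbox, outbox):
    """Handle messages one at a time; no locks are needed in user code."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: shut down
            break
        outbox.put(message.upper())


inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

for word in ["spam", "eggs"]:
    inbox.put(word)
inbox.put(None)
thread.join()

replies = [outbox.get() for _ in range(2)]
print(replies)
```

All locking happens inside queue.Queue, so the application code never acquires a lock directly.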

Contributing threads:

[SJB]

Organization of modules for threading

A few people took issue with the current organization of the threading modules into Queue, thread and threading. Guido views Queue as an application of threading, so putting it in the threading module is inappropriate (though with a deeper package structure, it should definitely be a sibling). Nick Coghlan suggested that Queue should be in a threadtools module (in parallel with itertools), while Skip proposed a hierarchy of modules with thread and lock being in the lowest level one, and Thread and Queue being in the highest level. Aahz suggested (and Guido approved) deprecating the thread module and renaming it to _thread at least in Python 3.0. It seems the deprecation may happen sooner though.

Contributing threads:

[SJB]

Speed of Unicode decoding

Tony Nelson found that decoding with a codec like mac-roman or iso8859-1 can take around ten times as long as decoding with utf-8. Walter Dörwald provided a patch that implements the mapping using a unicode string of length 256 where undefined characters are mapped to u"\ufffd". This dropped the decode time for mac-roman to nearly the speed of the utf-8 decoding. Hye-Shik Chang showed off a fastmap decoder with comparable performance. In the end, Walter's patch was accepted.
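
The idea of a 256-character decoding table can be sketched in pure Python. This is only an illustration of the lookup, not Walter's C patch: the toy table below defines ASCII plus a single high byte, and the names DECODING_TABLE and charmap_decode are invented for the example.

```python
UNDEFINED = "\ufffd"  # marker for bytes the charset does not define

# Build a table whose index is the byte value and whose entry is the
# decoded character. Only ASCII and 0xE9 are defined in this toy charset.
entries = [UNDEFINED] * 256
for byte_value in range(128):
    entries[byte_value] = chr(byte_value)
entries[0xE9] = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
DECODING_TABLE = "".join(entries)


def charmap_decode(data):
    """Decode bytes by indexing straight into the 256-character table."""
    return "".join(DECODING_TABLE[byte] for byte in data)


print(charmap_decode(b"caf\xe9"))
```

Indexing a string replaces a dictionary lookup per byte, which is where the speedup described above would come from in a C implementation.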

Contributing thread:

Unicode charmap decoders slow

[SJB]

Updates to PEP 343

Jason Orendorff proposed replacing the __enter__() and __exit__() methods on context managers with a simple __with__() method instead. While Guido was unconvinced that __enter__() and __exit__() should be dropped, he was convinced that context managers should have a __with__() method in parallel with the __iter__() method for iterators. There was some talk of special-casing the @contextmanager decorator on the __with__() method, but no conclusion.
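
For readers unfamiliar with the protocol under discussion, here is a minimal sketch of a context manager built on __enter__() and __exit__(), plus the generator-based form. It does not attempt to model the proposed __with__() method; the class Tracked and the function tracked are hypothetical names.

```python
from contextlib import contextmanager


class Tracked:
    """Record when the with-block is entered and left."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")
        return False  # do not suppress exceptions


@contextmanager
def tracked(log):
    """The same behaviour written as a decorated generator."""
    log.append("enter")
    try:
        yield
    finally:
        log.append("exit")


class_events, generator_events = [], []
with Tracked(class_events):
    class_events.append("body")
with tracked(generator_events):
    generator_events.append("body")
print(class_events, generator_events)
```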

Contributing threads:

[SJB]

str and unicode issues

Martin Blais wanted to completely disable the implicit conversions between unicode and str, so that you would always be forced to call either .encode() or .decode() to convert between one and the other. This is already available through adding sys.setdefaultencoding('undefined') to your sitecustomize.py file, but the suggestion started another long discussion over unicode issues. Antoine Pitrou suggested that a good rule of thumb is to convert to unicode everything that is semantically textual, and to only use str for what is to be semantically treated as a string of bytes. Fredrik Lundh argued against this for efficiency reasons -- pure ASCII text would consume more space as a unicode object.

There were suggestions that in Python 3.0, opening files in text mode will require an encoding and produce string objects, while opening files in binary mode will produce bytes objects. The bytes() type will be a mutable array of bytes, which can be converted to a string object by specifying an encoding.
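
A short sketch of explicit conversions, written for Python 3 as eventually released. Note one difference from the suggestion above: in released Python 3 the bytes type is immutable and the mutable array of bytes is bytearray, so the example uses bytearray for the in-place edit.

```python
text = "caf\u00e9"

# str -> bytes always names an encoding explicitly.
data = text.encode("utf-8")
round_trip = data.decode("utf-8")

# bytearray is mutable and can be decoded once an encoding is given.
buffer = bytearray(data)
buffer[0:1] = b"C"
decoded = buffer.decode("utf-8")

# Mixing the two types implicitly is rejected rather than converted.
try:
    text + data
except TypeError:
    mixing_rejected = True
else:
    mixing_rejected = False

print(decoded, mixing_rejected)
```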

Contributing threads:

[SJB]

Allowing *args syntax in tuple unpacking and before keyword arguments

Gustavo Niemeyer proposed the oft-seen request for allowing the *args syntax in tuple unpacking, e.g.:

for first, second, *rest in iterator:

Guido requested a PEP, saying that he wasn't convinced that there was much of a gain over the already valid:

for item in iterator:
    (first, second), rest = item[:2], item[2:]

Greg Ewing and others didn't like Guido's suggestion as it violates DRY (Don't Repeat Yourself). Others also chimed in with some examples in support of the proposal, but no one has yet put together a PEP.
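
For comparison, the two spellings can be run side by side on Python 3, where starred unpacking was later accepted (PEP 3132). The sample data in rows is invented for the example.

```python
rows = [(1, 2, 3, 4), (5, 6), (7, 8, 9)]

# Starred form: the leftover items are collected into a list.
starred = []
for first, second, *rest in rows:
    starred.append((first, second, rest))

# Slicing form: the item has to be named and sliced twice.
sliced = []
for item in rows:
    (first, second), rest = item[:2], list(item[2:])
    sliced.append((first, second, rest))

print(starred)
```

The slicing form only works on sequences, while the starred form accepts any iterable, which is part of the DRY argument.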

In a related matter, Guido indicated that he wants to be able to write keyword-only arguments after a *args, so that you could, for example, write:

f(a, b, *args, foo=1, bar=2, **kwds)

People seemed almost unanimously in support of this proposal, but, to quote Nick Coghlan, it has still "never bugged anyone enough for them to actually get around to fixing it".
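
A sketch of how such a signature behaves on Python 3, where keyword-only parameters after *args are accepted. The function body is made up purely to show which argument lands where.

```python
def f(a, b, *args, foo=1, bar=2, **kwds):
    """Return every parameter so the binding is visible."""
    return a, b, args, foo, bar, kwds


# 3 and 4 are swallowed by *args, so bar can only be given by keyword.
result = f(1, 2, 3, 4, bar=20, baz=30)
print(result)
```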

Contributing thread:

Extending tuple unpacking

[SJB]

AST Branch

Guido gave the AST branch a three week ultimatum: either the branch should be merged into MAIN within the next three weeks, or the branch should be abandoned entirely. This jump-started work on the branch, and the team was hoping to merge the changes the weekend of October 15th.

Contributing threads:

[SJB]

Allowing "return obj" in generators

Piet Delport suggested having return obj in generators be translated into raise StopIteration(obj). The return value of a generator function would thus be available as the first arg in the StopIteration exception. Guido asked for some examples to give the idea a better motivation, and felt uncomfortable with the return value being silently ignored in for-loops. The idea was postponed until at least one release after a PEP 342 implementation enters Python, so that people can have some more experience with coroutines.
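
The behaviour being proposed can be tried on Python 3.3 and later, where return with a value inside a generator was eventually allowed. The generator countdown is a made-up example; it also shows the silent dropping of the value that made Guido uncomfortable.

```python
def countdown(n):
    """Yield n, n-1, ..., 1 and then return a final value."""
    while n > 0:
        yield n
        n -= 1
    return "liftoff"


gen = countdown(2)
seen = []
try:
    while True:
        seen.append(next(gen))
except StopIteration as stop:
    final = stop.value  # the same object as stop.args[0]

# A for-loop (or list()) consumes StopIteration and discards the value.
looped = list(countdown(2))
print(seen, final, looped)
```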

Contributing threads:

[SJB]

API for the line-number table

Greg Ewing suggested trying to simplify the line-number table (lnotab) by simply matching each byte-code index with a file and line number. Phillip J. Eby pointed out that this would make the stdlib take up an extra megabyte, suggesting two tables instead, one matching bytecodes to line numbers, and one matching the first line-number of a chunk with its file. Michael Hudson suggested that what we really want is an API for accessing the lnotab, so that the implementation that is chosen is less important. The conversation trailed off without a resolution.
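
To give a feel for what an access API hides, the sketch below uses dis.findlinestarts from the standard library, which yields (bytecode offset, line number) pairs without exposing the table's encoding. This is an existing helper, not the API proposed in the thread, and the function sample is invented for the example.

```python
import dis


def sample(x):
    y = x + 1
    return y * 2


# Each pair says: the instruction at this offset starts this source line.
line_starts = list(dis.findlinestarts(sample.__code__))
for offset, line in line_starts:
    print(offset, line)
```

The exact offsets depend on the interpreter version, which is one reason callers are better off with an accessor than with the raw table.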

Contributing thread:

Simplify lnotab? (AST branch update)

[SJB]

Current directory and sys.path

A question about the status of the CurrentVersion registry entry led to a discussion about the different behaviors of sys.path across platforms. Apparently, on Windows, sys.path includes the current directory and the directory of the script being executed, while on Linux, it only includes the directory of the script.

Contributing thread:

PythonCore\CurrentVersion

[SJB]

Changing the class of builtins

As of Python 2.3, you can no longer change the __class__ of any builtin. Phillip J. Eby suggested that these rules might be overly strict; modules and other mutable objects could probably reasonably have their __class__s changed. No one seemed really opposed to the idea, but no one offered up a patch to make the change either.
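
For illustration, CPython 3.5 and later do allow the module case: a module's __class__ may be set to a subclass of types.ModuleType, while instances of other builtins are still rejected. The classes LoudModule and MyList are hypothetical.

```python
import types


class LoudModule(types.ModuleType):
    """A module subclass that adds one method."""

    def shout(self):
        return self.__name__.upper()


mod = types.ModuleType("demo")
mod.__class__ = LoudModule  # accepted for ModuleType subclasses
shouted = mod.shout()


class MyList(list):
    pass


plain = []
try:
    plain.__class__ = MyList  # still refused for a builtin list instance
except TypeError:
    builtin_rejected = True
else:
    builtin_rejected = False

print(shouted, builtin_rejected)
```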

Contributing thread:

Assignment to __class__ of module? (Autoloading? (Making Queue.Queue easier to use))

[SJB]

exec function specification for Python 3.0

In Python 3.0, exec is slated to become a function (instead of a statement). Currently, the presence of an exec statement in a function can cause some subtle changes since Python has to worry about exec modifying function locals. Guido suggested that the exec() function could require a namespace, basically dumping the exec-in-local-namespace altogether. People seemed generally in favor of the proposal, though no official specification was established.
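
A minimal sketch of the explicit-namespace style, using the exec() function as it exists in Python 3. Released Python 3 still allows omitting the namespace, so this shows the suggested usage rather than an enforced rule; the variable names are invented.

```python
# The executed code reads from and writes to this dict only.
namespace = {"base": 10}
exec("total = base + 5", namespace)

computed = namespace["total"]
print(computed)
```

Because results are fetched from the dictionary, the compiler never has to guess whether the caller's local variables were modified.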

Contributing thread:

PEP 3000 and exec

[SJB]

Adding opcodes to speed up self.attr

Phillip J. Eby experimented with adding LOAD_SELF and SELF_ATTR opcodes to improve the speed of object-oriented programming. This gained about a 5% improvement in pystone, which isn't organized in a very OO manner. People seemed uncertain as to whether paying the cost of adding two opcodes to gain a 5% speedup was worth it. No decision had been made at the time of this summary.
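
To see the instruction pair such opcodes would fuse, the sketch below disassembles a small method with the dis module. It shows the stock bytecode only: LOAD_SELF and SELF_ATTR do not appear, exact opcode names vary between CPython versions, and the class Point is made up for the example.

```python
import dis


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def total(self):
        return self.x + self.y


# Each self.attr access is a local-variable load followed by an
# attribute load.
opnames = [ins.opname for ins in dis.get_instructions(Point.total)]
print(opnames)
```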

Contributing thread:

LOAD_SELF and SELF_ATTR opcodes

[SJB]

Dropping support for --disable-unicode

Reinhold Birkenfeld tried unsuccessfully to make the test-suite pass with --disable-unicode set. M.-A. Lemburg suggested that the feature should be ripped out entirely, to simplify the code. Martin v. Löwis suggested deprecating it to give people a chance to object. The plan is now to add a note to the configure switch that the feature will be removed in Python 2.6.

Contributing threads:

[SJB]

Bug in getitem inheritance at C level

Travis Oliphant discovered that the addition of the mp_item and sq_item descriptors and the resolution of any competition for __getitem__ calls is done before the inheritance of any slots takes place. This means that if you create a type in C that supports the sequence protocol and tries to inherit the mapping protocol from a parent C type that does not support the sequence protocol, __getitem__ will point to the parent type's __getitem__ instead of the child type's __getitem__. This seemed like more of a bug than a feature, so the behavior may be changed in future Pythons.

Contributing thread:

Why does __getitem__ slot of builtin call sequence methods first?

[SJB]

Deferred Threads

Skipped Threads

Epilogue

This is a summary of traffic on the python-dev mailing list from October 01, 2005 through October 15, 2005. It is intended to inform the wider Python community of on-going developments on the list on a semi-monthly basis. An archive of previous summaries is available online.

An RSS feed of the titles of the summaries is available. You can also watch comp.lang.python or comp.lang.python.announce for new summaries (or through their email gateways of python-list or python-announce, respectively, as found at http://mail.python.org).

This is the 5th summary written by the python-dev summary taskforce of Steve Bethard and Tony Meyer (thanks Steve!).

To contact us, please send email:

Steve Bethard (steven.bethard at gmail.com)
Tony Meyer (tony.meyer at gmail.com)

Do not post to comp.lang.python if you wish to reach us.

The Python Software Foundation is the non-profit organization that holds the intellectual property for Python. It also tries to advance the development and use of Python. If you find the python-dev Summary helpful please consider making a donation. You can make a donation at http://python.org/psf/donations.html . Every penny helps so even a small donation with a credit card, check, or by PayPal helps.

Commenting on Topics

To comment on anything mentioned here, just post to comp.lang.python (or email python-list at python dot org which is a gateway to the newsgroup) with a subject line mentioning what you are discussing. All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on something. And if all of this really interests you then get involved and join python-dev!

How to Read the Summaries

The in-development version of the documentation for Python can be found at http://www.python.org/dev/doc/devel/ and should be used when looking up any documentation for new code; otherwise use the current documentation as found at http://docs.python.org/ . PEPs (Python Enhancement Proposals) are located at http://www.python.org/peps/ . To view files in the Python CVS online, go to http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/ . Reported bugs and suggested patches can be found at the SourceForge project page.

Please note that this summary is written using reStructuredText. Any unfamiliar punctuation is probably markup for reST (otherwise it is probably regular expression syntax or a typo =); you can safely ignore it. We do suggest learning reST, though; it's simple and is accepted for PEP markup and can be turned into many different formats like HTML and LaTeX. Unfortunately, even though reST is standardized, the wonders of programs that like to reformat text do not allow us to guarantee you will be able to run the text version of this summary through Docutils as-is unless it is from the original text file.