This is a summary of traffic on the python-dev mailing list between September 01, 2002 and September 15, 2002 (exclusive). It is intended to inform the wider Python community of ongoing developments on the list. To comment on anything mentioned here, just post to email@example.com or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration). All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP (or anything else for that matter) if you have an opinion.
This is the second summary written by Brett Cannon (hopefully my sophomoric performance will be better than most sophomore music albums).
Please note that this summary is written using reStructuredText which can be found at http://docutils.sourceforge.net/rst.html . Any unfamiliar punctuation is probably markup for reST; you can safely ignore it (although I suggest learning reST; it's nice and is accepted for PEP markup). Also, because of the wonders of reformatting thanks to whatever you are using to read this, I cannot guarantee you will be able to run this text through DocUtils as-is. If you want to do that, get the original text from the archive.
I am considering keeping a list of the names that people are often referred to by in emails. This would serve a dual purpose: it would give people who read emails from the list a reference for figuring out who is who, and it would make the summaries easier for me to write because I could then refer to people by the names I know them by. =) Any comments on this idea are appreciated.
Walter Dorwald asked if there were "any objections against committing the patch" for implementing PEP 293 (Codec Error Handling Callbacks). Guido asked what Martin V. Lowis and M.A. Lemburg had to say about it. MAL responded that he was +1 on the patch. Martin was "concerned about the massive amounts of C code, most of which could be expressed way more compact in Python code", but "Walter convinced [MvL] that this does have a real performance impact for real data" so he would live with it. In the end he gave it his vote.
Walter said he would check it in (and he has). The PEP has now been moved to the finished PEP list.
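For readers who have not seen what an error handling callback looks like, here is a minimal sketch using the codecs.register_error() function as it exists in present-day Python 3. The handler name "demo-bracket" and the function bracket_codepoint are made up for this illustration and are not part of Walter's patch.

```python
import codecs


def bracket_codepoint(error):
    """Replace each unencodable character with a [U+XXXX] marker."""
    if not isinstance(error, UnicodeEncodeError):
        # This sketch only handles encoding problems; re-raise anything else.
        raise error
    bad = error.object[error.start:error.end]
    replacement = "".join(f"[U+{ord(ch):04X}]" for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, error.end


# "demo-bracket" is a hypothetical handler name chosen for this example.
codecs.register_error("demo-bracket", bracket_codepoint)

encoded = "caf\u00e9 \u20ac5".encode("ascii", errors="demo-bracket")
print(encoded)
```

The callback receives the exception object and decides what to substitute, so new error strategies can be added without touching the codec itself.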
Raymond Hettinger suggested adding mixin classes that automatically implement magic methods when certain basic magic methods were already implemented (e.g., "given an __eq__ method in a subclass, adds a __ne__ method"). David Abrahams said that he thought "these are a great idea, in the context of an understanding of what we want interfaces to be, say, and do." Guido brought up some points about the initial suggestions Raymond made. He then said that he thought that there wasn't "enough here to warrant putting this into the standard library"; the issue will be revisited when a standard type or interface hierarchy is added to Python (not in 2.3).
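To make the idea concrete, here is a rough sketch of what such a mixin could look like. ComparisonMixin and Version are hypothetical names invented for this example; they are not Raymond's code. Note that present-day Python 3 already derives __ne__ from __eq__ by default and ships functools.total_ordering for the ordering methods.

```python
class ComparisonMixin:
    """Hypothetical mixin: derives other comparisons from __eq__ and __lt__."""

    def __ne__(self, other):
        return not self == other

    def __le__(self, other):
        return self < other or self == other

    def __gt__(self, other):
        return not (self < other or self == other)

    def __ge__(self, other):
        return not self < other


class Version(ComparisonMixin):
    """Only the two basic magic methods are written by hand."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return self.number < other.number


old, new = Version(1), Version(2)
print(old != new, old <= new, new > old, new >= old)
```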
Jeremy Hylton wrote some threaded code to fetch some web pages that hung when performing a slow DNS operation. Apparently, in Python 2.1 "it produces a steady stream of output -- urls and the time it took to load them". In Python 2.2 and 2.3, though, "it produces little bursts of output, then pauses for a long time, then repeats". Jeremy guessed that it might have something to do with Linux's getaddrinfo() being thread-safe by allowing only a single lookup at a time. Aahz said that "gethostbyname() IIRC has frequently been non-reentrant".
Originally, when an exception was raised and you passed in an optional object to act as a description of why the exception was raised (such as KeyError("there is no spoon"), where "there is no spoon" is the optional argument stored in <exception>.args), str() just returned that argument as it was passed in; str(<exception>) == <exception>.args[0]. Now it calls repr() on the argument; str(<exception>) == repr(<exception>.args[0]). Much better. =)
Thanks to great work done by Tim Peters and several other contributors, Barry Warsaw started an SF project to host the spambayes code. It can be found at http://sf.net/projects/spambayes . There are two mailing lists: http://mail.python.org/mailman-21/listinfo/spambayes and http://mail.python.org/mailman-21/listinfo/spambaye-checkins (yes, that is Mailman 2.1, and yes, you will "help be a guinea pig for Mailman 2.1").
Martin V. Lowis wanted to introduce subsecond timestamps on platforms that supported them. He suggested adding another field to stat, creating a new type, or making st_mtime a floating point number. The first option is easy, the second has the usual problems of defining a new type, and the third does not guarantee enough accuracy.
Paul Svensson and Guido said that the last option (turning st_mtime into a float) was the most Pythonic. MvL agreed, but worried about breaking code that expected an int. Guido then suggested that maybe the new field is the way to go; define something like st_mtimef that will contain the float if available or contain an int otherwise. Tim Peters also weighed in with his IEEE 754 voodoo about how a float can hold enough info to be accurate up to 100 nanoseconds if you only span 33 years. That causes an issue starting in 2003 since that is 33 years past the epoch (1970).
But then MvL discovered that st_mtime was already a float on the Mac; had that caused issues? Jack Jansen of course chimed in on this by saying that it caused him a headache about once a year in the form of a failing test (other issues caused by timestamps are the Classic Macs having the epoch at 1904 and not using UTC time). He said he would prefer to see the timestamp as a cookie that was passed into a function that spit out "something guaranteed to be of your liking".
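Tim's precision argument can be checked with a little arithmetic. The sketch below assumes Python 3.9 or later (for math.ulp) and that timestamps are IEEE 754 doubles; resolution_ns is a helper name made up for this example. The exact boundary it finds is 2**30 seconds, which is about 34 years, so it only roughly matches the figures quoted above.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def resolution_ns(timestamp):
    """Gap between `timestamp` and the next representable double, in ns."""
    return math.ulp(float(timestamp)) * 1e9


# A double has a 53-bit significand, so the gap between neighbouring
# values doubles every time the timestamp crosses a power of two.
boundary = 2.0 ** 30  # seconds since the epoch
years_to_boundary = boundary / SECONDS_PER_YEAR

before = resolution_ns(boundary - 1)  # about 119 ns
after = resolution_ns(boundary)       # about 238 ns

print(f"boundary after {years_to_boundary:.1f} years")
print(f"resolution before: {before:.0f} ns, after: {after:.0f} ns")
```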
To address the other issues that Jack mentioned, Guido suggested that all timestamps be converted to UTC time with the epoch at 1970.
MvL put up SF patch 606592, which has already been closed; it makes all the relevant changes to have timestamps returned as floats.
Bob Ledwith posted a simple patch for Include/object.h that changed the order of certain parts of the PyObject_HEAD macros, affecting PyObject and PyVarObject. This was for a 64-bit platform performance boost (40% for large data sets according to Bob). The reordering eliminated some padding in the struct and allowed more Python objects to fit in the L2 cache, or at least that is what Bob thinks is going on.
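For comparison, this is how float timestamps look from a present-day Python 3 interpreter (an assumption about the reader's setup, not a description of the 2002 patch): st_mtime is a float, and the st_mtime_ns field, added in Python 3.3, carries the same instant as an integer count of nanoseconds.

```python
import os
import tempfile

# Create a throwaway file so there is something to stat.
with tempfile.NamedTemporaryFile() as handle:
    info = os.stat(handle.name)

mtime_seconds = info.st_mtime         # float, seconds since the epoch
mtime_nanoseconds = info.st_mtime_ns  # int, nanoseconds since the epoch

print(type(mtime_seconds).__name__, mtime_seconds)
print(type(mtime_nanoseconds).__name__, mtime_nanoseconds)
```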
Guido pointed out that this would save 8 bytes per object; he thought all of this was "Interesting!". But alas, using this patch would break binary compatibility. Guido was not sure, though, whether it had been broken yet between Python 2.2 and 2.3 and thus he might be "being too conservative here" in terms of saying that it should be held back for now.
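The padding effect can be reproduced with ctypes. The two layouts below are an assumption about the before and after field order (an int reference count, a type pointer, and an int size); they were not copied from Bob's patch, and the class names are made up for this sketch.

```python
import ctypes


class OriginalHead(ctypes.Structure):
    # Assumed original order: int, pointer, int.
    _fields_ = [
        ("ob_refcnt", ctypes.c_int),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_int),
    ]


class ReorderedHead(ctypes.Structure):
    # Assumed reordering: pointer first, then the two ints side by side.
    _fields_ = [
        ("ob_type", ctypes.c_void_p),
        ("ob_refcnt", ctypes.c_int),
        ("ob_size", ctypes.c_int),
    ]


original_size = ctypes.sizeof(OriginalHead)
reordered_size = ctypes.sizeof(ReorderedHead)
pointer_size = ctypes.sizeof(ctypes.c_void_p)

print(f"pointer: {pointer_size} bytes")
print(f"original: {original_size} bytes, reordered: {reordered_size} bytes")
```

On a typical 64-bit build the pointer must be 8-byte aligned, so the first layout needs 4 bytes of padding after each int, while the second packs the two ints together.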
A problem Guido pointed out for 64-bit systems is that theoretically the reference count for an object could go negative with enough references as things stand now. Guido then suggested that perhaps refcnt (struct item that holds the reference count) should be a long. And while dealing with that, Guido suggested that anything that stores a length should store that number in a long.
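As a small illustration of the wraparound Guido was worried about, ctypes can stand in for a 32-bit signed counter. This is only a sketch: ctypes silently truncates on assignment, which mimics what happens in practice on common hardware, although signed overflow is formally undefined in C.

```python
import ctypes

INT_MAX = 2**31 - 1

# A 32-bit signed counter holding the largest value it can represent.
refcnt = ctypes.c_int32(INT_MAX)

# One more reference: ctypes does no overflow checking, so the value wraps.
refcnt.value += 1
wrapped = refcnt.value

print(wrapped)
```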
Chime in Tim Peters. He pointed out that it was agreed upon years ago to move refcnt to long but no one had bothered to do it. Heck, even Guido thought for a long time that it was a long when it wasn't; it required Tim to "beat that out of [Guido] <wink>" to stop him from saying that it was a long. He then pointed out that Win64 was still only 4 bytes for a long; what was really desired was for it to be Py_intptr_t which is the Python way for spelling the C99 type that we wanted. Apparently C99 has a way to specify that things be a specific byte length (now if everyone just had a C99 compiler we wouldn't need these macros; oh, to dream...).
Tim also pointed out that what we wanted was for the type that held a length to be size_t, since that is what strlen() and malloc() are restricted by. He said that he writes all of his "string-slinging code as using size_t vars now".
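To see how these C types compare on whatever machine you are reading this on, ctypes will report their sizes. This is just a quick sketch; the numbers printed depend on your platform, and on 64-bit Windows you would expect long to stay at 4 bytes while the pointer-sized types are 8.

```python
import ctypes

# Sizes, in bytes, of the C types discussed in the thread.
sizes = {
    "long": ctypes.sizeof(ctypes.c_long),
    "void *": ctypes.sizeof(ctypes.c_void_p),
    "size_t": ctypes.sizeof(ctypes.c_size_t),
    "ssize_t": ctypes.sizeof(ctypes.c_ssize_t),  # signed counterpart of size_t
}

for name, size in sizes.items():
    print(f"{name:8s} {size} bytes")
```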
Tim pointed out that the issue then became "Whether it's worth the pain to change this stuff" which "depends on whether we think 64-bit boxes are just another passing fad like the Internet <wink>". =)
Martin V. Lowis agreed with the changing of refcnt to a long but had reservations about using size_t for the length field (ob_size). He pointed out that some objects put negative values into that field.
Frederik suggested that the proposed changes be the default on 64-bit systems, since people on those systems are more likely to be willing to recompile than people on 32-bit systems. He also suggested making it a compiler option. Guido thought it was a good idea. But then Mats Wichmann discovered that the switch to long killed the performance boost. So Guido re-iterated that he thinks it should be a compiler option only on 64-bit systems; have "compat", "optimal", and "right" compiler options.
As of yet nothing has been done about this.
Jack Jansen noticed that there are demos for some of the SGI-specific modules that use severely outdated systems and hardware (stuff discontinued 8 to 12 years ago). Guido gave the go-ahead to yank them from CVS.
This has yet to be done.
(This thread actually started in August) There was a bug in Python 2.2 that raised a UnicodeError when trying to decode a lone surrogate (explanation of surrogates to follow this summary). This caused issues in importing .pyc files that contained a lone surrogate because marshal (which is what is used to create .pyc files) encodes Unicode literals in UTF-8. This has all been fixed in Python 2.3, but Guido was wondering how to backport this for Python 2.2.2.
The option of bumping the magic number for .pyc files was raised and instantly thrown out by Guido; "Bumping MAGIC is a no-no between dot releases". So M.A. Lemburg suggested to either fix the Unicode encoder or change the Unicode decoder to handle the malformed Unicode. MAL wasn't sure, though, if some security issue would be raised by the latter option.
Guido said go for the latter and didn't see any possible security issue since "If someone you don't trust can write your .pyc files, they can cause your interpreter to crash by inserting bogus bytecode".
Explanation of lone surrogates:
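As a brief sketch in present-day terms: surrogates are the code points U+D800 through U+DFFF, which UTF-16 uses in high/low pairs to encode characters above U+FFFF; a lone surrogate is one that shows up without its partner. The snippet below assumes a current Python 3 interpreter (not the 2.2 code discussed in the thread), where strict UTF-8 refuses lone surrogates and the "surrogatepass" error handler lets them through.

```python
# A character above U+FFFF becomes a surrogate pair in UTF-16.
pair = "\U00010000".encode("utf-16-be")  # high surrogate D800 + low surrogate DC00

# A lone surrogate: the high half without its low partner.
lone = "\ud800"

try:
    lone.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# "surrogatepass" encodes the surrogate as if it were an ordinary code point.
raw = lone.encode("utf-8", "surrogatepass")
round_tripped = raw.decode("utf-8", "surrogatepass")

try:
    raw.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

print(pair, raw, strict_encode_failed, strict_decode_failed)
```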
Christopher Craig noticed that the docs for the re module for the \b metacharacter were incorrect; they said that "the end of a word is indicated by whitespace or a non-alphanumeric character". That would indicate that an underscore would be the end of a word, which turns out to be false. Frederik said that "\b is defined in terms of \w and \W" and thus allows underscore to be an alphanumeric character. The documentation has been fixed.
Francois Pinard discovered that for the codecs module "one should be careful about not [altered emphasis] naming a module after the encoding name, when closely following the documentation in the Library Reference manual". This is because the codecs module first searches the registry of codecs, then searches for a module with the same name and uses that module. The issue comes up when the module does not contain a function named getregentry(); "`encodings.lookup()` expects a `getregentry` function in that module, does not find it, and raises a CodecRegistryError, not leaving a chance to subsequent codec search functions to be used".
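The underscore behavior is easy to confirm with the re module as it works today; the pattern and sample strings here are made up for the example.

```python
import re

word = re.compile(r"\bfoo\b")

# A space and a hyphen are both \W, so a word boundary follows "foo".
space_match = word.search("foo bar") is not None
hyphen_match = word.search("foo-bar") is not None

# An underscore is \w, so there is no boundary between "foo" and "_bar".
underscore_match = word.search("foo_bar") is not None

print(space_match, hyphen_match, underscore_match)
```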
M.A. Lemburg said that this has been fixed in Python 2.3 and will be in 2.2.2 by having encodings.lookup() return None if getregentry() is not found and thus allowing the search to continue.
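The "return None so the search can continue" behavior can be seen with codecs.register() in present-day Python 3. In this sketch the encoding name "demoenc" and the function demo_search are hypothetical; the search function simply hands back the UTF-8 codec for that one name and declines everything else.

```python
import codecs

DEMO_NAME = "demoenc"  # hypothetical encoding name used only for this sketch


def demo_search(name):
    """Codec search function: answer for one name, decline all others."""
    if name == DEMO_NAME:
        return codecs.lookup("utf-8")
    # Returning None tells the codecs machinery to try the next search function.
    return None


codecs.register(demo_search)

# The built-in search does not know "demoenc", so the lookup falls through
# to demo_search instead of failing early.
encoded = "spam".encode(DEMO_NAME)

try:
    codecs.lookup("no_such_codec_for_demo")
    unknown_found = True
except LookupError:
    unknown_found = False

print(encoded, unknown_found)
```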
But the reason I am summarizing this is that the thread quickly changed to how to properly generate a patch. Patches should be generated using UNIX diff, with either the -c or -u option, with preference for -c (using cvs diff -c is even better; it puts the version of the file you are diffing against in the output); Mac folk can send MPW diffs, but UNIX diff is the definite preference. Always put the files in the order diff -c OLD_FILE NEW_FILE . And always post the patches to SourceForge! Getting random patches, no matter how small, on the list is annoying (at least to me) because the point of the list is to discuss the design and implementation of Python, not to patch Python. SF is used so that Python-dev does not need to be bothered with mundane problems like applying patches (and to annoy Aahz with SF's UI sucking in Lynx =). So please, for my sake and everyone else's on Python-dev, use SF!
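If you have never looked at a context diff, the standard library's difflib module can produce one in the same style as diff -c. This is only an illustration of the format and of the old-then-new argument order; the file contents are made up, and real patches should still come from diff or cvs diff.

```python
import difflib

# Hypothetical before/after contents of a file.
old_lines = ["def greet():\n", "    print 'hello'\n"]
new_lines = ["def greet():\n", "    print 'hello, world'\n"]

# Same ordering as on the command line: old file first, new file second.
patch = list(
    difflib.context_diff(
        old_lines, new_lines, fromfile="OLD_FILE", tofile="NEW_FILE"
    )
)

print("".join(patch), end="")
```

The header marks the old file with "***" and the new file with "---", which is why swapping the arguments produces a patch that appears to undo your change.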
For a funny email from Raymond Hettinger about developing for Python read http://mail.python.org/pipermail/python-dev/2002-September/028725.html .
Aahz asked "why wouldn't we simply use attributes to hold" interfaces that a class implemented (think of __slots__). David Abrahams then brought up the idea of just adding interfaces to the __class__ attribute.
Guido then chimed in on the attributes idea. He pointed out that this is how Zope does it, using the __inherits__ attribute. The limitation is that "it isn't automatically merged properly on multiple inheritance, and adding one new interface to it means you have to copy or reference the base class __inherits__ attribute". And as for David's idea of just adding to __class__, that doesn't work because there is no way to limit the interface; you need "Something like private inheritance" for when an interface is broken by some inherited class. David subsequently added the issue of being able to disinherit when an interface is not valid but is inherited by default as another problem for using inheritance for interfaces.
David then brought up the issue of Python being so dynamic that, through black magic code, you could inject an interface if you used __class__ like he suggested. If the injected interface didn't work because of the inheritance chain, then you have a problem.
Barry Warsaw brought in his objections. He tried playing Devil's Advocate by saying that Guido had said that inheritance would not be the only way to handle interfaces, but that it would be the predominant way. But this duality would complicate any conformsto()-like function since it would have to handle two different ways for a class to get an interface. Barry then brought up the objection that he didn't like the idea of using straight inheritance because he wanted a syntactic way to separate out interfaces.
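The merging limitation Guido describes follows from ordinary attribute lookup, as this sketch shows. The attribute name declared_interfaces and the interface names are hypothetical stand-ins chosen for the example; they are not Zope's actual spelling.

```python
class Base:
    declared_interfaces = ("IReadable",)


class Other:
    declared_interfaces = ("IWritable",)


class Combined(Base, Other):
    # Nothing merges the two tuples: lookup just finds Base's attribute first.
    pass


class Extended(Base):
    # Adding one interface means explicitly referencing the base class value.
    declared_interfaces = Base.declared_interfaces + ("ISeekable",)


print(Combined.declared_interfaces)
print(Extended.declared_interfaces)
```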
As a side note, Guido pointed out that __slots__ is provisional; nicer syntax will eventually surface when Guido gets over his "fear of adding new keywords".
Christian Tismer has come up with a replacement for the etype which is "a hidden structure that extends types when they are allocated on the heap" (you can find it in Objects/typeobject.c in the CVS). There is a limitation with the etype where it could not be extended by metatypes. Well, Chris worked his magic and came up with a new flextype that allows overriding of methods. So with Christian's code you would be able to override methods in a type without having to hack something together to handle the overriding correctly; it would be handled automatically.
Through some clarification from Christian and Guido, it was pointed out to me (as of this moment I am the only one to make any noise on this thread, and it was for this summary) that this simplifies an esoteric issue; note the use of the words "metatype" above. This is type/metatype black magic hacking. Spiffy, but something most of us "normal" folk will not have to worry about.