python-dev Summary, 2002-11-16 through 2002-11-30

This is a summary of traffic on the python-dev mailing list between November 16, 2002 and November 30, 2002 (inclusive). It is intended to inform the wider Python community of ongoing developments on the list. To comment on anything mentioned here, just post to python-list@python.org or comp.lang.python in the usual way; give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration). All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on something. And if all of this really interests you then get involved and join python-dev!

This is the sixth summary written by Brett Cannon (back in my groove).

All summaries are now archived at http://www.python.org/dev/summary/ .

Please note that this summary is written using reStructuredText which can be found at http://docutils.sf.net/rst.html . Any unfamiliar punctuation is probably markup for reST; you can safely ignore it (although I suggest learning reST; it's simple and is accepted for PEP markup). Also, because of the wonders of programs that like to reformat, I cannot guarantee you will be able to run the text version of this summary through Docutils as-is. If you want to do that, get an original copy of the text file.

Summary Announcements

Nothing to report to speak of. Uh, go to PyCon. =)

bsddb3 imported

Martin v. Loewis merged bsddb3 3.4.0 into CVS under the name bsddb. The old bsddb module is now no longer compiled by default; if it does get compiled, though, it ends up with the name bsddb185. Barry Warsaw also requested that the extensive testing suite be incorporated and "run it only with a regrtest -u option".

Martin wasn't sure how Barry wanted them incorporated, though, since there are multiple files to the test and most testing suites in the stdlib are a single file. Barry suggested that the testing files be put in a directory with the package and that test_bsddb.py just call the tests in that directory, much like how the email package does it. They were integrated and some errors and warnings were found that are being dealt with.

It was also agreed upon that development will be moved over to Python so as to keep the module in Python sync'ed up properly and to keep poor Martin from having to import the files into Python's CVS constantly.

Licensing question

David Abrahams asked about a licensing issue with Boost.Python (it is a free library that "enables seamless interoperability between C++" and Python) and the modified Python.h file that it uses. Originally there was no license at the top of that file, but that does not work for some corporations using Boost. So David stuck his own license at the top and asked if this is the right thing to do.

Guido asked him to provide the PSF license at the top of the file and to mention what changes he made. The copyright had been added to the file for Python 2.2.2.

Re: PyNumber_Check()

M.A. Lemburg noticed that the conditions under which PyNumber_Check() returns true had changed. He asked if it should check whether one of "nb_int, nb_long, nb_float is available (in addition to the tp_as_number slot)". Guido responded that he would like to see it deprecated. We got a history lesson of how PyNumber_Check() was written "when the presence or absence of the as_number "extension" to the type object was thought to be useful". Regardless, Guido said that testing like this does not prove something is a "number" and if you wanted to test this way you could do it yourself.

In response, MAL said that perhaps PyNumber_Check() should be changed so that it returned true for something that is "usable as input to float(), int() or long()". Guido said that would be fine "as long as we all agree that that's exactly what they check for, and as long as we agree that there may be overlapping areas" for the various Py*_Check() functions. Guido later said testing for nb_int, nb_long, and nb_float was fine.
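As a rough Python-level analogue of the check that MAL and Guido settled on, the sketch below treats an object as number-like when its type defines one of the conversion hooks. The helper name is_number_like is hypothetical, and __int__ and __float__ merely stand in for the C-level nb_int and nb_float slots; this is not how PyNumber_Check() itself is written.

```python
def is_number_like(obj):
    """Return True if type(obj) defines __int__ or __float__.

    Hypothetical helper: a Python-level stand-in for testing the
    nb_int / nb_float slots discussed in the thread.
    """
    cls = type(obj)
    return hasattr(cls, "__int__") or hasattr(cls, "__float__")


print(is_number_like(3))
print(is_number_like(2.5))
print(is_number_like("3"))
```

A string such as "3" is accepted by int() yet fails this test, which is one illustration of Guido's point that a slot check cannot prove something is a "number" and that the various checks may overlap.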

Plea: can modulefinder.py move to the library?

Just van Rossum wanted to move Freeze's modulefinder.py into the stdlib so that it can be distributed with binary releases. In case you don't know what modulefinder.py does, it attempts to find all pure Python module dependencies for a pure Python module. In other words, it checks what the module imports and if it is a Python file, and if it is, records that; it repeats this for all modules it finds, creating a listing of modules needed for the module to run.
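For readers who want to see the dependency search in action, here is a minimal sketch that assumes the modulefinder module as it ships in the standard library today, with its ModuleFinder class and run_script() method. The helper name find_dependencies and the throwaway script are made up for illustration.

```python
import os
import tempfile
from modulefinder import ModuleFinder


def find_dependencies(source):
    """Write ``source`` to a temporary script and return the modules it pulls in."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as handle:
            handle.write(source)
        finder = ModuleFinder()
        finder.run_script(path)  # follows imports recursively
        # The script itself is recorded as "__main__"; leave it out.
        return sorted(name for name in finder.modules if name != "__main__")


deps = find_dependencies("import colorsys\n")
print(deps)
```

Only modules that can be located as Python source are followed further, which matches the description above of recording pure Python dependencies.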

Guido said that the module needed some work before it could be considered; it had print statements that were unneeded outside of Freeze and it had no documentation. Just agreed that the documentation needed to be done. As for the print statements, though, they only come out when debug is set to true; by default it is false. Guido said that was fine and agreed with the removal of the Windows-specific print statements.

Thomas Heller later said in another thread that patch #643711 was opened primarily for him and Just to do work in but that everyone was invited to help out.

Dictionary Foolishness?

Raymond Hettinger suggested having "dictionaries support the repetition" to allow one to create a dictionary with enough space as specified by the repetition:

>>> [0] * n   # allocate an n-length list
>>> {} * n    # allocate an n-element dictionary

Aahz recalled that dictionaries are resized upon adding to a dictionary and they could theoretically grow smaller. That would seem to possibly limit the usefulness of this idea. Guido then voted -1 (practically a death wish for an idea unless people clamor for it) saying that it relied too much on "arbitrary magic by side effect". He said if people really wanted this a method could be proposed.

dict() enhancement idea?

Just van Rossum suggested overloading the dictionary constructor so that arguments that went to **kwargs would be used to create the dictionary (this can be seen in the "Python Cookbook" as recipe 1.2 or online at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52313). This is desired because that means cleaner code for creating dicts:

>>> dict(pigs='!fly', birds='fly')

Barry commented that he liked it and had something similar in his code for Mailman. Thomas Heller voted +1 for it and also said that he used the idiom. Raymond Hettinger and I also voted +1 for it.
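The keyword form is available in current Python, so the equivalence with a dictionary literal can be checked directly. This is only a small illustration; note that the keys are limited to names that are valid identifiers.

```python
# Build the same mapping two ways: keyword arguments versus a literal.
by_keywords = dict(pigs='!fly', birds='fly')
by_literal = {'pigs': '!fly', 'birds': 'fly'}

print(by_keywords == by_literal)
```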

Yet another string formatting proposal

Oren Tirosh proposed something (read the title to figure out what). He proposed the following syntax:

>>> "\(a) + \(b) = \(a+b)\n" 
>>> r"\(a) + \(b) = \(a+b)\n".cook()

The advantages, according to Oren, are that it would not require introducing the use of a new symbol like $, nor a new string prefix nor a new method. The .cook() method would be used to evaluate raw strings at a later point; it would draw arguments from the local and global namespace. The biggest drawback was unfamiliarity for programmers.
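To make the proposal concrete, here is a rough emulation of what .cook() might do, written as a plain function. Everything in it is an assumption for illustration: the name cook, the explicit namespace argument (the proposal would draw from the local and global namespaces instead), and the restriction to expressions without nested parentheses. It is not Oren's implementation.

```python
import re

# A backslash, an opening parenthesis, an expression with no nested
# parentheses, and a closing parenthesis.
_FIELD = re.compile(r"\\\(([^()]*)\)")


def cook(template, namespace):
    """Replace each \\(expr) in ``template`` with str() of the evaluated expr."""
    def replace(match):
        # eval() is used only to keep the sketch short; it is unsafe on
        # untrusted templates.
        return str(eval(match.group(1), {}, namespace))
    return _FIELD.sub(replace, template)


print(cook(r"\(a) + \(b) = \(a+b)", {"a": 1, "b": 2}))
```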

Fredrik Lundh pointed out that \( is "commonly used to escape parentheses in regular expression strings" (Effbot wrote re, so he should know). Oren then said that curly braces could (and pretty much will) be used instead.

Michael Chermside liked this design idea, but thought the name .cook() was not that great. Oren was going for a name that tied into "raw". Michael suggested the name .sub() to build off of the two PEPs already in existence covering string formatting (PEP 215 and PEP 292).

Expect in python

Eric Raymond proposed adding pexpect to the stdlib when it reaches version 1 (it is currently at 0.94). His thought was that having functionality like Expect would be a boon for Python's use in system administration. Eric said he had been using the module and had no problems with it (Prabhu Ramachandran also said it had worked for him).

David Ascher said that he would like to see a more abstract API to allow it to work for things other than character streams. He also would like to see something that works better on Windows. Eric said that he would not want to hold this up in hopes of getting something better since it already works well for what it does.

But it appears that the creator of pexpect is more than willing to help maintain the module if it makes it into the stdlib.

Zach Weinberg said that he would be willing to put some work into making the pty module more portable since pexpect does its thing using pty.

PEP 288: Generator Attributes

Raymond Hettinger has revised PEP 288 with a new proposal on how to pass things into a generator that has already started. He has asked for comments on the changes, so let him know what you think.

PyMem_MALLOC (was [Python-Dev] Snake farm)

Continuation of http://mail.python.org/pipermail/python-dev/2002-November/029853.html

There was a possible issue with PyMem_MALLOC() that Marc Recht had discovered on FreeBSD. It eventually was tracked down to FreeBSD-CURRENT's implementation of malloc(): malloc(0) always returns 0x800. M.A. Lemburg suggested changing a test in the configure script to try to catch when a platform returned an address for malloc(0) and treat it just like when it would return NULL (NULL can't be blindly returned since that would signal a memory error; returning NULL in a C extension signals an error). Marc came back with news that C99 says that this is legitimate behavior for malloc() so this could possibly affect other platforms.

Marc suggested that PyMem_MALLOC() just be redefined to n ? malloc(n) : NULL. Problem is that the NULL issue mentioned above comes into play with this solution. Tim Peters suggested either malloc(n || 1) or malloc(n ? n : 1) (the former being a Python idiom that doesn't cut it in C). He does not want to mess with the configure scripts since they have "proven itself too brittle too many times". Tim wanted a way to prevent ever calling the function with 0, but Guido couldn't see any way of doing that without an extra jump.

The committed solution is malloc((n) ? (n) : 1). It is easier to just waste one byte than to have to deal with the special casing of passing 0. The extra test was not really a worry since Tim reported no measurable performance difference. Besides, Tim pointed out "this is ideal for a conditional-move instruction, and more architectures are growing that".

Half-baked proposal: * (and **?) in assignments

Gareth McCaughan suggested cutting down one of the separations between parameter passing and assignment by allowing assignment to use arbitrary argument lists:

>>> a,b,*c = 1,2,3,4,5  # c == (3, 4, 5)
>>> year, month, day, *dummy = time.localtime()

I argued that I didn't like the slightly cluttered look on the left-hand side (LHS) of the assignment. Martin v. Loewis and I basically ended up saying we wanted to keep assignments clear and concise and that this would not help to keep that. Steve Holden basically ended up agreeing.

Brian Quinlan, Patrick O'Brien, Nathan Clegg and Timothy Delaney liked the idea. The biggest argument in support was that it would allow for a more functional programming style (and that obviously can be good or bad depending on your P.O.V.; I say bad =):

>>> car,*cdr = [head, t1, t2, t3]  # car == head, cdr == (t1, t2, t3)

In case you don't have functional programming (especially Lisp/Scheme) experience, the basic data structure in Lisp-like languages is a list and the most common way to manipulate those lists is with the functions car and cdr. car returns the "head", or front, of the list; cdr returns the "tail", or everything but the head, of the list. This allows for simple recursion since you just pass the cdr of a list on the recursive call after having dealt with the head of the list.

There was also the suggestion of allowing the arbitrary assignment variable to be anywhere in the list of assignment variables:

>>> a,*b,c = 1,2,3,4,5  # a == 1, b == (2, 3, 4), c == 5
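For comparison, the same split can already be written with ordinary indexing and slicing on a sequence, with no new syntax. This is only a sketch of the equivalence; unlike the proposal it needs a real sequence rather than an arbitrary iterable.

```python
values = (1, 2, 3, 4, 5)

# Equivalent of the proposed ``a, b, *c = values``.
a, b, c = values[0], values[1], values[2:]
print(a, b, c)

# Equivalent of the proposed ``a, *b, c = values``.
first, middle, last = values[0], values[1:-1], values[-1]
print(first, middle, last)
```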

To prove that this was not really needed I wrote a function that took in an iterable and the number of variables to assign to; it returned that many items from the iterator, followed by the iterator itself as the last thing returned. Alex Martelli of course improved upon it (and also continued to correct my slightly incorrect statements):

def peel(iterable, arg_cnt=1):
        """Return ``arg_cnt`` values from iterator of ``iterable`` and then the iterator itself."""
        iterator = iter(iterable)
        for num in xrange(arg_cnt):
                yield iterator.next()
        yield iterator
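The function above is written for the Python of the time (xrange and iterator.next()). A minimal adaptation for current Python, with a usage example, might look like the following; the behaviour is intended to be the same and only the spelling changes.

```python
def peel(iterable, arg_cnt=1):
    """Yield ``arg_cnt`` values from ``iterable``, then the iterator itself."""
    iterator = iter(iterable)
    for _ in range(arg_cnt):
        yield next(iterator)
    yield iterator


# Take the first two items and keep the rest as an iterator.
a, b, rest = peel([1, 2, 3, 4, 5], 2)
remaining = list(rest)
print(a, b, remaining)
```

One caveat: if the iterable has fewer than arg_cnt items, the StopIteration raised inside the generator is reported as a RuntimeError on recent Python versions.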

The idea of a module for the stdlib containing iterator helper functions was suggested by Alex. One is in progress by Raymond Hettinger.

Armin Rigo suggested having iterators become a type. That was quickly shot down, although having the suggested iterator helper module contain a class that could be subclassed by iterators was received with positive comments.

The thread ended very quickly after Guido said that he didn't think "that there's a sufficient need to add new syntax".

from tuples to immutable dicts

Armin Rigo said that he would like to have an immutable type that acted like a dictionary; basically like a struct from C. Martin v. Loewis agreed on the need, but opposed the idea of adding another built-in type or syntax for such a type; that left something for the stdlib. Martin suggested something like:

>>> struct_seq(name, doc, n_in_sequence, (fields))

where field is a bunch of (name, doc) tuples. What would be returned would be a "thing [that] would be similar to os.stat_result: you [can] call it with the mandatory fields in sequence, and can call it with the optional fields by keyword argument".
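Since Martin's description leans on os.stat_result, a quick look at how that existing type behaves may help. The sketch below only exercises the standard os and stat modules; it does not implement the proposed struct_seq().

```python
import os
import stat

info = os.stat(os.curdir)

# The same field is reachable by position and by name.
size_by_index = info[stat.ST_SIZE]
size_by_name = info.st_size
print(size_by_index == size_by_name)

# Like a tuple, the result cannot be modified in place.
try:
    info[stat.ST_SIZE] = 0
except TypeError:
    immutable = True
else:
    immutable = False
print(immutable)
```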

Armin didn't like it since it went against his initial proposal "which was to have a lightweight and declaration-less way to build structures". He basically ended up suggesting something along the lines of tuples with keyword arguments. Martin didn't like it since he didn't see a great use for it.

In the end Armin said to just drop the idea.

urllib performance issue on FreeBSD 4.x

Andrew MacIntyre brought a thread on python-list to python-dev's attention about urllib performance compared to wget (wget is used to download web sites and files). Apparently the socket that urllib uses is unbuffered instead of using the system default (which was shown to be almost as fast as wget). The question became why this was done.

The answer (thanks to Martin v. Loewis) was to prevent deadlock. Apparently under HTTP 1.1 a server can keep a connection open while waiting for the next command. If the connection were buffered, a read would block until enough data arrived to fill the buffer, which might never happen.

Fredrik Lundh suggested that a subclass or option be made available to allow choosing between buffered and unbuffered. Andrew said he would put it on his todo list.

test failures on Debian unstable

Failures on the build of Debian's unstable version of Python led to a discussion about how modules are skipped in the testing suite. Lib/test/regrtest.py keeps a list of tests that are expected to be skipped on various platforms. Martin v. Loewis doesn't like it because tests such as the one for the bz2 module are attempted regardless of whether the bz2 library is even installed and yet they are expected to succeed on Linux. Martin summarized that "For many of the tests that are sometimes skipped, knowing the system does not tell you whether the test will should rightfully be skipped, on that system" since tests are skipped often because a module was not there that needed to be imported for the test.

Tim Peters, on the other hand, likes it. Since he maintains the Windows distribution from PythonLabs, it lets him know when new things have been added to Python and might need to be excluded from the Windows distro. Neil (who pointed out the Debian problems) was able to recognize that the tests that failed were meant to pass under Linux. Tim admitted he only cared about keeping the mechanism for Windows; he couldn't care less if it is removed for Linux.

Patrick O'Brien chimed in (with Aahz supporting) that the feature is handy since you can easily find out libraries you are missing that you could potentially install.

Guido stepped in and suggested setting up a mechanism that would allow an external table in a file to be used when present instead of the default list of tests to skip. I don't think anyone has stepped up to implement this.

Currently baking idea for dict.sequpdate(iterable, value=True)

Raymond Hettinger presented "a write-up for a proposed dictionary update method". It took an iterable and added each item it produced as a key, with a single passed-in value used for all of the new keys. The rationale was to have a fast way to do membership testing using dict's __contains__ or to remove duplicates by creating a dict and then outputting the keys using the aptly named .keys().
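The behaviour described above can be sketched as a plain function. The function form and its argument order are assumptions made for illustration (the proposal was for a dict method), and the loop is a straightforward reading of the write-up rather than Raymond's code.

```python
def sequpdate(mapping, iterable, value=True):
    """Add every item of ``iterable`` to ``mapping`` as a key mapped to ``value``."""
    for key in iterable:
        mapping[key] = value


seen = {}
sequpdate(seen, ["spam", "eggs", "spam", "ham"])

# Fast membership testing through the dict.
print("eggs" in seen)

# Duplicate removal by reading the keys back out.
unique = sorted(seen.keys())
print(unique)
```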

Previous objections to something like this were about the dict constructor and the sets module. The ones about the constructor are dealt with by making this a method. The latter was argued against by saying that the sets module is slow. Fredrik Lundh brought up that we really don't need multiple ways of doing the same thing. Just van Rossum agreed and said this killed the idea for him. Guido chimed in and said that the sets module was to help solidify the sets API so that at some point it could be coded in C.

To address the speed complaint Guido suggested limiting the sets module initially to make it faster so that the type won't be held back or unutilized because of its speed. Tim Peters spoke up, though, and said that the spambayes project used sets and he didn't have any complaints. But when major membership testing was needed a dict was used. And Tim pointed out that in order for any C sets code to be fast it would have to directly use dict's C __contains__ code.

What this method should return was brought up by Just. Some thought None since .update() returns that. Others said True. Guido said None since True should only be used when something is explicitly true.

Making it a class method was also suggested by Just as an easy way to make it like a constructor. Raymond agreed and changed his proposal accordingly, renaming the method to .fromseq(). But then Walter Dorwald said .fromkeyseq() should be used since there "is another constructor that creates the dict from a sequence of items". Guido voted +1 on that idea.

Re: release22-maint branch broken

Tim Rice discovered that trying to build Python from a directory other than where the source was did not work for the Python 2.2.* CVS. It was all eventually solved and fixed in the CVS branch. I am mentioning it here in case someone reading this had a similar issue.

Dictionary evaluation order

Gustavo Niemeyer asked about how to handle code like {f1():f2(), f3(): f4()} and its execution order as pointed out by bug #448679. As it stood it evaluated in the order of f2, f1, f4, f3. Apparently Guido once upon a time considered this a bug.

But Guido mentioned that left-to-right evaluation is not always wanted since a = {}; a[f1()] = f2() would want f2 to evaluate first. He asked what Jython did.

Finn Bock said that Jython went f1, f2, f3, f4. In that case Guido didn't see any reason to block the fix. But Tim Peters brought up the point that the bug was more about the lack of specifics on this in the documentation. Gustavo said he would make the code fix along with patches to the docs.
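A small experiment shows how to observe the order on whatever interpreter you have at hand. The helper record() and the stand-in names are made up for the example; the printed order is whatever the running implementation does, which is exactly what the thread was about.

```python
calls = []


def record(name, value):
    """Note that ``name`` was evaluated, then hand back ``value``."""
    calls.append(name)
    return value


# Stand-ins for {f1(): f2(), f3(): f4()}.
result = {
    record("f1", "key1"): record("f2", "value1"),
    record("f3", "key2"): record("f4", "value2"),
}

print(calls)
```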

int/long FutureWarning

Mark Hammond asked how the upcoming change in Python 2.4 of hex/oct constants will affect his C extension code and something like PyArg_ParseTuple() (this function takes the arguments passed to something and breaks them up into their individual parts since all arguments are passed as tuples in C code). In case you don't know about the warnings, Python 2.3 warns you that code like SOMETHING = 0x80000000 could have a different meaning in Python 2.4; most likely it will be treated as a positive long. You can currently get rid of the warnings by changing the constant into a long by tacking on an L to the end of the number.

Martin v. Loewis said that if Mark appended the L to his constants it would not work for an i argument for PyArg_ParseTuple(). But Guido stepped up and said that there will be no issue since Python will be changed so that Mark's code will accept the constant as a positive long. This caused Guido to wonder if the warning could be changed to some other warning that is not normally printed out.

Guido then mentioned that he has "long promised a set of new format codes for PyArg_ParseTuple() to specify taking the lower N bits (for N in 8, 16, 32, 64) and throwing the rest away, without range checks". "If someone else can get to this first, that would be great". So someone be nice to Guido and do this for him. =)
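The "lower N bits" idea can be shown with plain integer arithmetic. The helper names low_bits and as_signed are hypothetical; they only illustrate the masking such a format code would perform and are not part of any PyArg_ParseTuple() API.

```python
def low_bits(value, n):
    """Keep only the lowest ``n`` bits of ``value``, with no range check."""
    return value & ((1 << n) - 1)


def as_signed(value, n):
    """Reinterpret the lowest ``n`` bits of ``value`` as a two's-complement number."""
    value = low_bits(value, n)
    sign_bit = 1 << (n - 1)
    return value - (1 << n) if value & sign_bit else value


print(low_bits(0x180000000, 32) == 0x80000000)
print(as_signed(0x80000000, 32))
```

The second call shows why the constant is ambiguous: the same 32 bits read as a positive value when unsigned and as a negative one when treated as a signed C int.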

Either way no specific resolution has been reached. As of right now you can just live with the warnings, suppress the warnings, or change your constants to longs and hope you are not passing into a C extension function that wants an int.

assigning to new-style-class.__name__

Michael Hudson has been working on patch #635933 to allow for assignment to __name__ and __bases__ for new-style classes (this was all so that __name__ would handle nested classes properly to allow for proper pickling; that thread was called metaclass insanity). He ran into a slight issue with assigning to __name__. To get it working, Michael wanted to treat heap and non-heap types differently. For non-heap types Michael wanted "everything in tp_name up to the first dot is __module__, the rest is __name__". For heap types, he wanted to have __module__ as "always __dict__['__module__'], __name__ is always tp_name (or rather ((etype*)type)->name)". And as for the issue of someone being crazy enough to delete the dict key of __module__, Michael said Python wouldn't crash but you probably would not like the outcome of running code. =)

Guido responded saying that Michael's proposal was acceptable.
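Seen from Python code, the heap versus non-heap distinction shows up roughly as follows in current CPython. This is an illustration of the visible behaviour only, not of the patch itself; the class name Original is made up.

```python
class Original:
    pass


# A class created with a ``class`` statement is a heap type, so renaming works.
Original.__name__ = "Renamed"
print(Original.__name__)

# A statically allocated built-in type rejects the assignment.
try:
    int.__name__ = "integer"
except TypeError:
    builtin_rejected = True
else:
    builtin_rejected = False
print(builtin_rejected)
```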

But then there was an issue with .mro() after the bases had been rearranged. Michael worried about what to do when there was a conflict down the inheritance tree. He thought reverting back to the way things were if there was an issue was best. This would require keeping around copies of the previous states until the changes propagated all the way through.

Samuele Pedroni stepped in to try to answer this question (Samuele rewrote the MRO code recently and is directly mentioned in C3 implementation). He came up with a case where there could be an order disagreement if two of the bases of a class had the same bases but one had the order swapped compared to the other (so C has bases of (A, B) and D has bases of (A, B) as well and E had bases (C, D); if C's bases became (B, A), E now has an order disagreement). He suggested that "the mros of the subclasses should be computed lazily when needed (e.g. on the first - after the changes - dispatch), although this may produce inconsistences and errors at odd times".
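Samuele's example can be reproduced directly with the class names he used. The sketch assumes the behaviour of current CPython, where the conflict is detected eagerly at assignment time rather than lazily.

```python
class A: pass
class B: pass
class C(A, B): pass
class D(A, B): pass
class E(C, D): pass

# Swapping C's bases is fine for C itself but leaves E with no consistent
# method resolution order, since C now wants B before A while D wants A before B.
try:
    C.__bases__ = (B, A)
except TypeError as exc:
    conflict = str(exc)
else:
    conflict = None

print(conflict)
```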

Michael showed that his solution would catch the problem. But he did not like the idea of lazily evaluating; he wanted a more restrictive solution since this is a new thing. Michael stated that what he wanted this for was "to swap out one class for another -- making instances of the old class instances of the new class, which was possible and making subclasses of the old subclasses of the new, which wasn't". It also turned out neither APL nor Dylan allow this kind of thing so Michael is breaking new ground.

Samuele asked about when the classes had solid bases (i.e., only a single superclass such as object). Michael said it would be handled with no problem.

Classmethod Help

Raymond Hettinger emailed the list because Guido said that the few people in the world who understand descriptors for C code are on the list. The main reason I am mentioning the thread here, though, is because Armin Rigo gave the answer that "There are METH_CLASS and METH_STATIC flags that you can set in the tp_methods table".

You also learn, thanks to Guido, that you should only use PyErr_BadInternalCall() when you know that a "bad argument must have been created by a broken piece of C code".