This is a summary of traffic on the python-dev mailing list between August 16, 2002 and September 1, 2002 (exclusive). It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration). All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion.

This is the first summary written by Brett Cannon. Summaries are archived nowhere at the moment. =) They will be, though, so stay tuned for the URL in future summaries.

Type Categories

This VERY long thread was sparked by Andrew Koenig asking whether a discussion of making type categories more explicit had ever occurred (Andrew meant for "category" to mean "the set of all types that implement a particular marker interface"). As Andrew later pointed out, he was asking about "a way of making notions such as 'file-like object' more formal and/or automatic". The discussion quickly settled on using the term "interface" to mean a way of specifying that an object implements certain methods (think of it in terms of Java's 'implements' mechanism).

Once definitions were out of the way, the discussion took off. Zope's implementation was pointed out very quickly ( http://cvs.zope.org/Zope3/lib/python/Interface/ ), and PEP 245 (Python Interface Syntax) was also brought to the attention of the list. The idea of using inheritance to handle interfaces came up. Guido said that he hasn't "given up the hope that inheritance and interfaces could use the same mechanisms. But Jim Fulton, based on years of experience in Zope, claims they really should be different" in terms of how interfaces should be handled in objects. Jeremy Hylton tried to channel Jim's opinion by pointing out that "We'd like to use interfaces to make fairly strong claims. If a class A implements an interface I, then we should be able to use an instance of A anywhere that an I is needed." But "the inheritance mechanism is too general", because if a class A implements interface I and a class B, which does not implement I, subclasses A, we end up with a class B that claims an interface it does not actually have. Guido understood the point, but still thought inheritance could be used if there were "a way to 'shut off' inheritance as far as isinstance() (or issubclass())" is concerned. Guido asked the simple question, "Why do you keep arguing for inheritance? (a) the need to deny inheritance from an interface, while essential, is relatively rare IMO, and in most cases the inheritance rules work just fine; (b) having two separate but similar mechanisms makes the language larger."
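A tiny illustrative sketch of Jeremy's point (this is not any of the proposed mechanisms, just plain classes standing in for an interface):

# FileLike is a plain class standing in for a marker interface.
class FileLike:
    pass

class A(FileLike):
    # A genuinely honors the "interface": it has a working write().
    def write(self, data):
        pass

class B(A):
    # B subclasses A for implementation reasons but deliberately breaks
    # the file-like contract.
    def write(self, data):
        raise TypeError("B objects are not writable")

# Inheritance still says B "is" FileLike, even though it is not:
print isinstance(B(), FileLike)    # prints 1 (i.e. true)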

Samuele Pedroni asked that any implementation "allow also for referring to anonymous super-interfaces of an interface in terms of the interface plus a subset of its signatures, also e.g. FileLike and just 'write'. [that means an interface can be thought to correspond to a set of (tag,signature) tuples, where tag identifies the interface, and one can also just consider subsets of it]".

The thread finally seems to have stopped (for now), with Guido saying he is mulling the whole thing over in the back of his head. This is a very sticky topic because of the number of design decisions required and how it might change the way people program in Python.

There was also a partial sub-thread in this whole discussion about multimethods: basically, a way to overload methods based on their parameter signatures. Most of the discussion was over syntax and how to handle resolution order. It then seemed to fall by the wayside when the main part of the thread took over again.
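To make the idea concrete, here is a hypothetical (and deliberately naive) sketch of multimethod dispatch via a registry keyed on argument types; it is not anyone's actual proposal, and it punts entirely on the resolution-order question the thread debated:

# Map (name, argument-type signature) -> implementation.
_registry = {}

def register(name, types, func):
    _registry[(name, types)] = func

def multicall(name, *args):
    # Dispatch on the exact types of the arguments.
    types = tuple([type(arg) for arg in args])
    func = _registry.get((name, types))
    if func is None:
        raise TypeError("no %s() method for signature %r" % (name, types))
    return func(*args)

def add_numbers(x, y):
    return x + y

def add_strings(x, y):
    return "%s %s" % (x, y)

register("add", (int, int), add_numbers)
register("add", (str, str), add_strings)

print multicall("add", 1, 2)            # 3
print multicall("add", "spam", "eggs")  # spam eggs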

type categories -- an example

This thread was started when Andrew Koenig said that the reason he brought up his type category question was that he wanted an easy way to identify members of a type. He had an example from a program he was writing where the type of an argument varied, and what needed to be done to the data changed accordingly. Jeremy Hylton suggested the isinstance(obj, type(re.compile(''))) idiom. Andrew asked if this was guaranteed to work; Jeremy said no. I asked why it was not guaranteed, and Fredrik Lundh said because re.compile() is a factory function, and a future version could return a different type of object depending on the pattern.
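For reference, the idiom in question looks like this (it works today, but as noted above nothing guarantees that re.compile() will always return objects of a single type):

import re

# Compile a trivial pattern once just to get at the pattern object's type.
PatternType = type(re.compile(''))

def describe(obj):
    if isinstance(obj, PatternType):
        return "compiled regex"
    elif isinstance(obj, str):
        return "plain string"
    return "something else"

print describe(re.compile('ab+c'))   # compiled regex
print describe('ab+c')               # plain string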

Python build trouble with the new gcc/binutils

Andrew Koenig said that he couldn't compile Python using the newest gcc (this was the day after the latest release hit the servers). With help from Zack Weinberg of CodeSourcery (who also recently rewrote the tempfile module), the problem was tracked down to binutils 2.13 being the culprit; it was not Python's fault.

Last call: mortal interned strings

The patch http://python.org/sf/576101 removes the default immortality of interned strings. I believe it was in early August (possibly spilling over from late July) when Oren Tirosh proposed the idea and wrote the above-mentioned patch. There was some discussion over whether any third-party code relied upon interned strings being immortal; none was found (MacPython relied upon it, but since it is under Python core control this was considered a moot point since it could be changed). The patch has been checked in. With it, the way to make a string immortal is to call PyString_InternImmortal(); no code in the core uses this function.

PEP 218 (sets); moving set.py to Lib

Thanks to Greg Wilson (for writing the PEP), Alex Martelli (for writing the module initially), Guido (for refactoring Alex's code), Raymond Hettinger (for writing the docs and playing with the code himself), and Tim Peters (for some speedup code), the stdlib has now gained a sets module. It has notions of both mutable and immutable sets (the latter used when you need a set of sets). There was discussion about how sets should print (sorted or not; unsorted was chosen) and which operators should be overloaded for working on sets (| and & were chosen). The module is a beautiful chunk of code and I highly recommend reading its source.
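A quick taste of the module as it currently sits in CVS (names as checked in; details could still shift before the module ships):

from sets import Set, ImmutableSet

engineers = Set(['Alice', 'Bob', 'Cleo'])
managers = Set(['Cleo', 'Dave'])

print engineers | managers   # union of the two sets
print engineers & managers   # intersection: Set(['Cleo'])

# ImmutableSet is hashable, so a set of sets works:
teams = Set([ImmutableSet(engineers), ImmutableSet(managers)])
print len(teams)             # 2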

A few lessons from the tempfile.py rewrite

Zack Weinberg, after rewriting the tempfile module, brought up three points: 1) the lack of dummy threads, 2) the lack of a pthread_once equivalent, and 3) the lack of a way to skip tests from unittest.py via some built-in method. Guido responded accordingly: 1) since some code uses the idiom of trying to import thread and catching the exception if it fails, Guido said he would be willing to accept a dummy_thread.py that would allow:

try:
    import thread as _thread
except ImportError:
    import dummy_thread as _thread

to work. No word on whether this is being written at the moment. 2) Guido said the method was, in his opinion, overkill. He said to "be Pythonic, live dangerously, accept the risk that a ^C can screw you. It can anyway. :-)". And as for 3), Guido referred Zack to the PyUnit list and Steve Purcell, since Python just tracks Steve's code (pyunit.sf.net). Guido's suggestion was to put tests that depend on some other code into a separate test suite that is only run when that code is available.
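Returning to point 1 for a moment: since nothing has been written yet, the following is purely a hypothetical sketch of what a minimal dummy_thread.py might provide (just enough of the thread module's API to keep single-threaded code running), not the module Guido agreed to accept:

# dummy_thread.py -- hypothetical sketch only; fakes the thread module's
# interface for builds without thread support.

error = RuntimeError            # stand-in for thread.error

def get_ident():
    # There is only one "thread", so any constant identifier will do.
    return -1

def start_new_thread(function, args, kwargs=None):
    # Run the function immediately, in the only thread there is.
    if kwargs is None:
        kwargs = {}
    function(*args, **kwargs)

class LockType:
    # Locks always "succeed"; there is nobody to contend with.
    def acquire(self, waitflag=1):
        return 1
    def release(self):
        pass

def allocate_lock():
    return LockType()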

Standard datetime objects?

Kevin Jacobs asked what stage the new datetime object was at. Guido said it is in python/nondist/sandbox/datetime/ in CVS which also has comments pointing to a wiki containing the current work on it. Fred L. Drake, Jr. is working on the C re-implementation and has now checked it into the sandbox in CVS.

PEP 269 versus 283

Jonathan Riehl noticed that PEP 283 said PEP 269 was dead; not good considering he was close to having a patch implementing PEP 269 (a pgen module to interface with the C version). Guido said he will revive the PEP. The patch has since been put on SF at http://python.org/sf/599331 .

What is a backport candidate?

Since Python 2.2 is going to be around for a long time, the question was brought up of what constitutes code that should be backported. Guido made the following three points:

  1. code trivial to backport should always be backported
  2. patches that apply only to new 2.3 code should obviously not be backported
  3. patches that need changes before they apply to the 2.2 code fall somewhere in between; there are gradations here

So please, when submitting patches, mention whether you think the patch should be backported to the 2.2 tree and any possible dependencies it might have in a backport.

python/nondist/sandbox/spambayes

In response to Paul Graham's spam filter written using Bayes' Rule (the Slashdot post on it is at http://developers.slashdot.org/article.pl?sid=02/08/16/1428238&tid=156), a thread spawned around this checkin of code that follows the paper's suggestions. The thread quickly jumped into discussions of data structures, Bayes' Rule, and a whole lot of talk about spam. Very interesting if spam filtering interests you. Tim Peters has been leading the drive on this chunk of code (and thanks to an illness that befell him in late August, which he has since gotten over, he had a few days of major hacking on it; Tim showed he is a performance stats whore <wink>).
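For the curious, the heart of Graham's approach is just combining per-token spam probabilities; here is a rough illustrative sketch (the formula is from Graham's essay, while everything else the sandbox code does, such as tokenizing and clamping probabilities, is left out):

def combine(probs):
    """Combine per-token spam probabilities a la Graham:
    P = (p1*...*pn) / (p1*...*pn + (1-p1)*...*(1-pn))."""
    prod = 1.0
    inv_prod = 1.0
    for p in probs:
        prod = prod * p
        inv_prod = inv_prod * (1.0 - p)
    return prod / (prod + inv_prod)

# Spammy-looking tokens push the combined score toward 1.0:
print combine([0.99, 0.95, 0.90])   # very close to 1.0
print combine([0.10, 0.20, 0.05])   # very close to 0.0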

A very cool quote came out of this thread from Eric S. Raymond when discussing the spam filter he has been working on: "This is actually the first new program I've coded in C (rather than Python) in a good four years or so".

Parsing vs. lexing.

In response to a question by Aahz about the differences between a lexer, a parser, and a tokenizer, Eric Raymond posted a good overview of the distinctions. Guido later commented in an email, mentioning SPARK and explaining how Python's own parser generator (pgen) works and why he wrote it; he also made some other comments on lexers. Jeremy Hylton pointed out a "neat new paper about an old algorithm for recursive descent parsers with backtracking and unlimited lookahead" by Bryan Ford at http://www.brynosaurus.com/pub.html . Alex Martelli said the discussion reminded him of "a long-ago interview with Borland's techies" in which they said they were able to make Borland Pascal fit on a floppy while MS Pascal took multiple floppies. Their trick: "we just did everything by the Dragon Book -- except that the parser is a hand-written recursive descent parser [Aho &c being adamant defenders of Yacc & the like], which buys us a lot". Someone named Noah also emailed a discussion of lexers and parsers, pulling in Finite State Machines, Push-Down Automata, and Turing Machines.
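A toy example of the division of labor being discussed, purely for illustration: the lexer turns a string of characters into tokens, and a hand-written recursive descent parser (over a tiny made-up grammar) consumes the token stream.

import re

# Lexing: characters -> tokens.
def tokenize(text):
    return re.findall(r'\d+|[-+*/()]', text)

# Parsing: tokens -> a value, via recursive descent over a tiny grammar:
#   expr := term (('+'|'-') term)*
#   term := NUMBER | '(' expr ')'
def parse_expr(tokens):
    value = parse_term(tokens)
    while tokens and tokens[0] in '+-':
        op = tokens.pop(0)
        right = parse_term(tokens)
        if op == '+':
            value = value + right
        else:
            value = value - right
    return value

def parse_term(tokens):
    tok = tokens.pop(0)
    if tok == '(':
        value = parse_expr(tokens)
        tokens.pop(0)   # discard the closing ')'
        return value
    return int(tok)

print parse_expr(tokenize("1 + (2 + 3) - 4"))   # 2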

Martin Sjögren said that Haskell's pattern matching and lazy evaluation make lexers (and even recursive-descent parsers) easy to write, but unfortunately Haskell does not play nicely with other languages. Haskell is where Python got its list comprehension idea.

Security hole in rexec?

It was brought to the attention of the list that deleting __builtins__ allowed rexec to be compromised. Guido pointed out that this is reported at http://python.org/sf/577530 . He also said not to trust rexec.

A patch was submitted and checked in to document the view that rexec is really not that safe.

A `cogen' module

Francois Pinard asked about computing Cartesian products using the new sets module. Guido didn't think people would need it in general. Francois quickly started this thread to discuss a cogen module for generating Cartesian products and other ways of operating on sets. The thread quickly died when Tim Peters posted "his elaborate state-of-the-art code", as Guido called it. But Francois said he would be back for more discussion on this.
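Tim's code is in the thread itself; as a much simpler (and much less clever) illustration of the kind of thing a cogen module might offer, here is a recursive generator for Cartesian products (the __future__ import is needed under 2.2):

from __future__ import generators

def cartesian(*pools):
    """Yield tuples drawing one element from each pool, leftmost pool
    varying slowest."""
    if not pools:
        yield ()
        return
    first, rest = pools[0], pools[1:]
    for item in first:
        for tail in cartesian(*rest):
            yield (item,) + tail

for pair in cartesian([1, 2], 'ab'):
    print pair
# (1, 'a')  (1, 'b')  (2, 'a')  (2, 'b')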

Mersenne Twister

Raymond Hettinger volunteered to implement the Mersenne Twister algorithm (a Python implementation exists at http://www.math.keio.ac.jp/~matumoto/emt.html ). While discussing whether to implement it in C or Python, Guido noticed that random.Random re-implements whrandom. Guido then came up with the idea of writing a base random class to be subclassed, with subclasses providing .random(); Tim Peters agreed and suggested more methods that subclasses could override.
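The rough shape of the idea (a sketch of the proposal, not actual random module code): the distribution methods live in a base class and need only self.random(), while a core generator subclass supplies random() itself.

class BaseRandom:
    """Distribution methods built purely on top of self.random()."""

    def uniform(self, a, b):
        # A random float N such that a <= N < b.
        return a + (b - a) * self.random()

    def randrange(self, n):
        # A random integer in range(n).
        return int(self.random() * n)

    def random(self):
        raise NotImplementedError("subclasses supply the core generator")

class LCGRandom(BaseRandom):
    """Toy Park-Miller linear congruential generator, standing in for a
    serious core generator such as the Mersenne Twister."""

    def __init__(self, seed=42):
        self.state = seed % 2147483647L or 1L

    def random(self):
        self.state = (self.state * 16807L) % 2147483647L
        return self.state / 2147483647.0

rng = LCGRandom()
print rng.random()
print rng.uniform(1.0, 6.0)
print rng.randrange(10)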

New PEP Format: reStructuredText

David Goodger and Barry Warsaw have now gotten reStructuredText (reST) accepted as a usable syntax for PEPs. Read the PEPs on the subject to learn more.

It has been suggested that the summaries try using reST; I am considering it.

tiny optimization in ceval mainloop

Jeremy Hylton noticed that in ceval there is a test of whether the ticker is 0 or whether things_to_do is set to true (an explanation of the ticker, checkinterval, and the GIL follows this paragraph). Jeremy wondered whether the ticker could just be dropped to 0 whenever things_to_do is true. Jack Jansen, though, pointed out that clearing it is not guaranteed to be safe, since an interrupt routine may run while "we fiddle things_to_do". Skip Montanaro then pointed out that since neither the ticker nor things_to_do is fiddled with unless the GIL is held, they could be made globals instead of per-thread state, saving each thread from executing the test; he did a patch that implements this ( http://python.org/sf/602191 ). Guido said that unless there was a decent speed improvement, no patch would be checked in; he changed his mind when it was pointed out that the change actually simplified the code. Skip tested anyway, and there is a speed improvement. This also brought up whether the default value of 10 for checkinterval was reasonable; it was agreed to bump it up to 100. Jack ran some code and said he noticed a definite improvement.

Python's version of threading is not like threading in C. There is something called the GIL (Global Interpreter Lock), which any thread wishing to execute Python code or manipulate Python objects must hold. This means that when you have Python threads running (using the thread or threading module), they are usually all waiting in line to acquire the GIL. For Python to decide when to release the GIL so another thread can grab it, it uses the ticker. This counter is decremented every time a bytecode instruction is executed and counts down to zero (it originally defaulted to 10 and now defaults to 100). The ticker's starting value after each release of the GIL is what sys.setcheckinterval() sets.
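The knob behind all of this is exposed at the Python level; for example:

import sys

# Number of bytecode instructions executed between checks (and hence
# potential GIL switches); the thread agreed to raise the default
# from 10 to 100.
sys.setcheckinterval(100)

# Larger values mean fewer switch opportunities: a bit more throughput
# for single-threaded code, slightly less responsive threads and signals.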

To get a better understanding of threading under Python I recommend reading Aahz's tutorials on threading.