This is a summary of traffic on the python-dev mailing list between October 1, 2002 and October 13, 2002 (inclusive). It is intended to inform the wider Python community of ongoing developments on the list that might be of interest. To comment on anything mentioned here, just post to python-list@python.org or comp.lang.python in the usual way; give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration). All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP (or anything else for that matter) if you have an opinion. And if all of this really interests you then get involved and join Python-dev!

This is the fourth summary written by Brett Cannon (with a partially fried brain thanks to the GRE).

All summaries are now archived at http://www.python.org/dev/summary/.

Please note that this summary is written using reStructuredText, which can be found at http://docutils.sourceforge.net/rst.html . Any unfamiliar punctuation is probably markup for reST; you can safely ignore it (although I suggest learning reST; it's nice and is accepted for PEP markup). Also, because of the wonders of reformatting thanks to whatever program you are using to read this, I cannot guarantee you will be able to run this text through Docutils as-is. If you want to do that, get the original text version.

Summary Announcements

This is a new section to the summary that I have decided to introduce. It is mainly going to serve to make any general announcements or comments on this summary and this summary alone. All universal comments will stay at the top of the summaries.

Just to let everyone know, I am taking off for two weeks on vacation starting 2002-10-14 and I will not return until 2002-10-30. Now, before you all start sobbing over the loss of one of my great summaries, you should know that Raymond Hettinger has graciously taken up the job of temp summarizer for me and will do the summary while I am gone.

Michael Hudson has made the suggestion that I inject more of my personality into the summary so as to liven it up a little. I am personally quite happy to do this. But the real question is do you, fine reader, mind the idea? If I don't hear from throngs of people going "your sarcastic tone takes away from the wonderfully drawl summaries and that is a bad thing", then I will just go ahead and write with personality. Just don't complain later. =)

2.2.2b1 has been occupying Python-dev during this summary period, and so this summary is shorter than usual. I left out a bunch of threads that were discussing bugfixes that I either didn't find interesting or didn't think the rest of the world would care about.

A.M. Kuchling has put all Python-dev summaries up at http://www.python.org/dev/summary/ . So now the archive is centralized instead of spread out among three web sites. I am sure the original archive sites will stay (I will keep mine up), but all future references will go to that page.

Now it's time for my personal favor of the month. I am going to start applying to grad school for computer (science | programming) when I get back from vacation. If anyone knows of Python-friendly schools out there, let me know. Heck, I am even willing to leave America to go to school as long as the classes are in English. So if you know of any, please let me know!

And now on to the summary.

Python 2.2.2 beta release on Monday!

Python 2.2.2b1 was released on Monday, October 7. This is the reason (or perhaps excuse is a better description) for the lighter summary this week. A good amount of the traffic on Python-dev was about bugfixing 2.2.2b1. Most of this probably would not interest the average Python user, and thus I didn't summarize a bunch of threads. Taking the GRE also didn't help with my free time, so I have cut down on the summary since I am having to go through a huge backlog to get this out the door.

Dropping support for Tcl 8.0 and 8.1

Martin v. Loewis asked if it would be okay to drop support for Tcl 8.0 and 8.1 since _tkinter.c has special code in there just for those outdated versions. Guido ok'ed it, so if you are still using a version of Tk from way back when, it's time to upgrade.

*very* revised proposal for interfaces

Previously John Williams came up with a proposal for implementing the stuff from PEP 245 (Python Interface Syntax) and PEP 246 (Object Adaptation). From my understanding of what John has done, it appears he has written an interface system in pure Python. If you want the backstory to this long and involved discussion about getting an interface system into Python, read the Python-dev Summaries for 2002-08-16 to 2002-09-01 and 2002-09-01 to 2002-09-15.

Gerald Williams, Michael Chermside, and Esteban Castro all commented on the implementation and made various suggestions.

John said that he was not done yet implementing this. But if you were interested in the whole previous discussion on interfaces you could consider looking at what John has done. As for it going into the language, I suspect that will have to wait until John is done and has convinced the PEP writers and Python-dev that his implementation fits the bill. Stay tuned.

If you want some more background info on interfaces, read the previously mentioned summaries. As for object adaptation, read anything by Alex Martelli on the subject on c.l.py and on Python-dev. He has become the main proponent of object adaptation and has written several very extensive essays on the subject.

perplexed by mro

Samuele Pedroni said he was "trying to wrap [his] head around the mro computation in 2.2". Apparently there is the algorithm mentioned at http://www.python.org/2.2.1/descrintro.html (dubbed the naive algorithm) and then the one implemented in typeobject.c (called the 2.2 algorithm). Samuele discovered some inconsistencies with the implemented algorithm that he desired some explanation about.

Guido responded, thankful that someone was giving this a look because his "intuition about the equivalence between algorithms turned out to be wrong". Guido stated that he thought that he wrote the algorithm from the book "Putting Metaclasses To Work" correctly sans raising an error when major conflicts occur in the ordering. In a later email Guido explained that the naive algorithm came about by his attempt to simplify the explanation of the 2.2 algorithm. Guido pretty much wrote the algorithm from the aforementioned book. Now the algorithm is not simple, so Guido did his best to simplify the explanation. Unknowingly, though, he came up with a variant on the algorithm in his explanation.

Greg Ewing pointed out that he thought the naive algorithm was nicer since it seemed to work more intuitively and was easier to explain (and remember kids, these are basic tenets of Python programming). Guido ended up stating that "If Samuele agrees that the naive algorithm works better, [Guido will] try to make it so in 2.3". Well, Samuele said that the "2.2 mro is the worst of our options".

There was a problem, though, with the naive algorithm; it is not monotonic as pointed out by Samuele. This led him to put out two options:

  1. Use the naive algorithm, which had the drawback of not being monotonic. Samuele also believed that it didn't produce "the most natural results".
  2. Adopt C3 as described at http://www.webcom.com/haahr/dylan/linearization-oopsla96.html and apparently used by Goo. This algorithm is monotonic and, Samuele says, more intuitive in its results.

Guido got around to reading the C3 paper and agreed that "we should adopt C3". He thought that the 2.2 algorithm was like the L*[LOOPS] algorithm mentioned in the paper, but he was not positive. Samuele then wrote a C implementation of the algorithm. Guido said he would get to the patch after 2.2.2b1 got out the door.
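For anyone who wants to play with the idea, here is a rough pure-Python sketch of C3 as the paper describes it. It is not Samuele's C patch; the names c3_merge and c3_linearize and the little diamond of classes are made up for illustration.

```python
def c3_merge(sequences):
    """Merge several linearizations into one, C3 style."""
    sequences = [list(seq) for seq in sequences if seq]
    result = []
    while sequences:
        for seq in sequences:
            head = seq[0]
            # A head is acceptable only if it appears in no other tail.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise TypeError("inconsistent hierarchy: no C3 linearization")
        result.append(head)
        # Drop the chosen class from every sequence, then drop empties.
        sequences = [[c for c in seq if c is not head] for seq in sequences]
        sequences = [seq for seq in sequences if seq]
    return result


def c3_linearize(cls):
    """Return the C3 linearization of cls as a list of classes."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([c3_linearize(b) for b in bases] + [bases])


# Hypothetical diamond hierarchy used only for the demonstration.
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print([k.__name__ for k in c3_linearize(D)])
```

On an interpreter whose MRO is C3 (any current Python), the result for D should agree with D.__mro__.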

Psyco requests and patches

Armin Rigo of psyco sent an email to Python-dev with some thoughts on psyco and some requests. After mentioning how he wished more people realized psyco is meant to be used by anyone and not just speed-hungry coders (it is a cool app and if you have any interest in compilers you should take a look) and that psyco could use some more advertising, he mentioned three patches that he wrote (patches 617309, 617311, and 617312) against 2.2.2 that he would like to see accepted so as to ease maintenance of psyco.

Armin also mentioned how he would like to move psyco forward. He pointed out he would like to eventually write it all in Python. This would require tracking changes in the interpreter that psyco dealt with. He would like to keep this in mind when Python 3 discussions kick up (and don't ask when that will happen; not for a VERY long time).

In regards to Armin's patches, Martin v. Loewis thought they broke binary compatibility (a big no-no between micro releases), but Armin claimed they didn't.

After glancing over the patches it seems they have all been applied against 2.2.2 and are being actively worked on by Armin for application to 2.3.

PEP239 (Rational Numbers) Reference Implementation and new issues

Christopher Craig uploaded patch 617779 that implemented PEP 239 (Rational Numbers). He had some questions for Python-dev, though, regarding a couple points. These few points have become the bane of my summarizing existence; this thread is huge.

One was whether division should return rationals instead of floats. Since rationals keep precision in division they are the most accurate way to perform division. They also make the most output sense (e.g. 1/3). The problem with this is that rational math is slow and it would cause issues with any code that expected a float.

The next issue was about comparison. Should a rational compare only when it is exactly equal to a float or when the float is really close?

Lastly, Christopher wondered if rationals should hash the same as floats. The answer to the second issue would influence the answer to this issue.
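To make the three questions concrete, here is a small sketch using fractions.Fraction from today's standard library. That module is an assumption on my part as a stand-in: it is not Christopher's patch, and its answers are only one possible set of answers to the issues raised.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Issue 1: exact division keeps precision that a float throws away.
exact_sum = third + third + third        # exactly one
float_sum = 0.1 + 0.2                    # binary floats cannot hit 0.3 exactly

# Issue 2: Fraction compares against a float exactly, with no fuzz.
half_equal = Fraction(1, 2) == 0.5       # 0.5 is exactly representable
third_equal = third == 1 / 3             # the float 1/3 is only an approximation

# Issue 3: numbers that compare equal also hash equal.
same_hash = hash(Fraction(1, 2)) == hash(0.5)

print(exact_sum, float_sum, half_equal, third_equal, same_hash)
```

Note how the hashing answer follows from the comparison answer, which is the dependency Christopher pointed out.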

Issue 1. Eric Raymond was for returning a rational. Francois was kind of on the fence. Christian was +1 for returning rationals. Guido said ABC did this and that numeric processes thus ended up being slow.

Issue 2. Christian said "Let it grow! Let the user feel what precision he's carrying around, and how much they throw away when they reduce down to a float."

Issue 3. Eric Raymond suggested a global "fuzz" variable that defines a "close-enough-for-equality range"; this idea was used by APL. Andrew Koenig was against this because you don't always want a fuzzy comparison and it destroys substitutability: "If a==b, it is not always true that f(a)==f(b)". Andrew said he preferred Scheme's numeric model. To this, Guido said that "'It works in Scheme' doesn't give me a warm fuzzy feeling that it's been tried in real life"; Tim later laid the smack down on Scheme's numeric model and ended it with "There's a reason the NumPy folks never bug you for Scheme features <wink>". Christian pointed out that keeping it as a rat would prevent overflows from ever occurring from long division. Tim was staunchly against a fuzz variable. Raymond suggested a fuzz comparison function that took in a fuzz value. Christopher said that the way it stands now in the implementation is that rationals are coerced to floats and then compared. Oren Tirosh suggested a third boolean, 'Undetermined', that would be raised when the "difference between A and B is below the error margin". David Abrahams said that Boost discussed this and said that the cost of adding ternary logic was not worth it.
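Here is a quick sketch of what a comparison function taking an explicit fuzz value might look like, and of the substitutability problem Andrew raised. The helper fuzzy_equal, the function scale, and the chosen numbers are all hypothetical; nothing like this was actually proposed in code on the list as far as I can tell.

```python
from fractions import Fraction


def fuzzy_equal(a, b, fuzz):
    """Return True when a and b differ by at most fuzz (hypothetical helper)."""
    return abs(a - b) <= fuzz


def scale(x):
    """An arbitrary function that magnifies small differences."""
    return x * 10000


a = Fraction(1, 3)
b = 0.3333
fuzz = 0.001

close_before = fuzzy_equal(a, b, fuzz)                # inputs are "equal"
close_after = fuzzy_equal(scale(a), scale(b), fuzz)   # results are not

print(close_before, close_after)
```

With these numbers the inputs pass the fuzzy test while the scaled results do not, which is exactly the "a==b but f(a)!=f(b)" situation.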

Andrew asked how rats could be optimized. He suggested ditching trailing zeros. Tim wondered how much of a saving you would get from this. Raymond Hettinger suggested having a builtin variable that would specify the "maximum denominator magnitude". Christian liked this idea. Greg didn't think this would be a good solution because people using rats are going to want them specifically because they are exact. So Raymond suggested that the default be an unrestricted denominator.

Guido brought up the question of how rats should be represented when printed.

The syntax for rats came up. Greg Ewing got the ball rolling by suggesting the syntax for rat division as \\\. M.A. Lemburg suggested just having a constructor like rat(2,3). The discussion then ran a gamut of suggested syntax: 2:3, 2r3, {2/3} (Guido shot this down because he wants to leave the option open for possible set notation), <2/3>, ``2r/3``, something by Barry using an extended character that Pine wouldn't display (it was a joke), and finally 2/3r by Guido. People agreed that this last one suggested by Guido was the best one. Tim also pointed out that Scheme has notation to specify whether a number is exact or not and using the 'r' notation would basically provide the same functionality.

But regardless of what syntax people preferred, it was overwhelmingly agreed that choosing the syntax should wait until rationals have been in the language for a while and it is known how they are used.

If you only read one thing, read Tim's emails since he explains all of this really well and is the resident math whiz on Python-dev.

Non-ASCII characters in test_pep277.py in 2.3

Guido pointed out that test_pep277.py uses an encoding cookie which was not being recognized by his toolchain. At some point he stated that he is "still not 100% comfortable with using arbitrary coding cookies in the Python distribution".
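For readers who have not met an encoding cookie before, here is a small sketch of how one can be detected programmatically. It assumes a current Python and its tokenize module; the two-line source held in the bytes object is made up for the example and has nothing to do with test_pep277.py itself.

```python
import io
import tokenize

# A hypothetical two-line source file, held in memory as bytes.
# The second line contains a Latin-1 encoded e-acute (0xe9).
source = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"

# detect_encoding() looks at the first lines for a BOM or a coding cookie.
encoding, lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)

# With the declared encoding in hand, the bytes decode cleanly.
text = source.decode(encoding)

print(encoding)
print(text)
```

An editor that ignores the cookie would have to guess at the encoding, which is roughly the toolchain problem described above.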

The reason I mention this thread (beyond the quote above) is that info on how to get XEmacs to recognize the cookie came out. Sjoerd Mullender sent out the link http://www.xemacs.org/Documentation/packages/html/mule-ucs_2.html which helped some people. As for Linux distro-specific problems, M.A. Lemburg noticed that SuSE puts Mule support in a package named 'mule-ucs-xemacs' and once he got the package loaded XEmacs worked.

Unclear on the way forward with unsigned integers

Mark Hammond was "a little confused by the new world order for working with integers in extension modules". Mark wanted to know how to create objects that were more like a collection of bits than an integer. Tim suggested creating a Python long; that would act more like an unsigned int in terms of its bits.

The FutureWarning for hexadecimal constants was brought up and it was pointed out that to deal with those just stick an 'L' at the end. Remember folks, that in Python 2.3 0x80000000L == 2147483648.
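If you need to move between the "collection of bits" view and the signed view of a 32-bit value, masking does the job. The sketch below assumes a current Python, where there is a single integer type and no 'L' suffix; the helper names as_unsigned32 and as_signed32 are invented for the example.

```python
def as_unsigned32(value):
    """Reinterpret an integer as an unsigned 32-bit bit pattern."""
    return value & 0xFFFFFFFF


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    # If the sign bit is set, the pattern stands for a negative number.
    return value - 0x100000000 if value & 0x80000000 else value


print(as_unsigned32(-2147483648))   # same bits as 0x80000000
print(as_signed32(0x80000000))      # what a 32-bit signed int would hold
```

This is the same bit pattern seen two ways, which is why the meaning of a constant like 0x80000000 needed a warning period.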

The usefulness of __future__ statements also came up. Tim wondered how useful they were. Thomas Wouters, though, came to __future__'s defense and explained how it helped him migrate people to newer versions of Python without being yelled at for breaking their code.

segmentation fault with python2.3a0 from cvs

Gersen Kurz was having problems importing a huge file. The bug was attributed to cygwin's malloc implementation, so people might want to watch out for that.

It was also pointed out that a loop with a bunch of items in a dict created a huge number of references. It turns out that dicts use dummy references in their implementation for when something is deleted. So don't be alarmed by huge reference counts even after deleting an immense dict.

Snapshot win32all builds of interest?

As mentioned in the last summary, Mark Hammond wondered if anyone would care to have access to compiled snapshots of CVS for Windows. He got enough of a response to give access at http://starship.python.net/crew/mhammond . This is not like the standard Windows installer, though; "This version installs no shortcuts, does not compile .pyc files etc - you are pretty much on your own. Pythonwin\start_pythonwin.pyw is installed to start Pythonwin, but you must do so manually". Mark would like to know if you end up using this.

Set-next-statement in Python debuggers

Richie Hindle wants to write a pure Python debugger; the problem is that it would be difficult without certain C-level stuff exposed to Python. Specifically, he wants frame.f_lasti (to be found in frameobject.c ) to be writable by Python so he "could implement Set-Next-Statement".

Michael Hudson's is the first email I have of someone chiming in against this. He thought it would be better to make it a descriptor so that "you can do some sanity checking on the values". Guido then chimed in saying that if it was writable that would open up a hole for crashing a program. Guido eventually said making it read-only would be fine.

This led to Armin Rigo pointing out that you can already crash the interpreter with the new module "or by writing crappy .pyc files". Guido acknowledged this, but said he didn't want to add any more ways if it could be helped. He pointed out that he wants to stick with the idea that a segfault is Python's fault unless proven otherwise.
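To see what Richie was asking about, you can poke at a frame object from Python. This is only a sketch and assumes a current CPython (sys._getframe is a CPython detail); the function name inspect_current_frame is made up, and the behavior shown is that of today's interpreter, not necessarily the one under discussion.

```python
import sys


def inspect_current_frame():
    """Return (f_lasti, f_lineno, writable) for this function's own frame."""
    frame = sys._getframe(0)
    last_instruction = frame.f_lasti
    try:
        # Attempt a harmless write of the value we just read.
        frame.f_lasti = last_instruction
        writable = True
    except (AttributeError, TypeError):
        writable = False
    return last_instruction, frame.f_lineno, writable


lasti, lineno, writable = inspect_current_frame()
print(lasti, lineno, writable)
```

Being able to read the attribute but not assign to it is what keeps Python code from jumping the interpreter to an arbitrary bytecode offset.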

Multibyte repr()

A patch was applied that allowed repr() to return characters with the high bit set; repr() used the "multibyte C library for printing string if available". This had caused a bug and made Guido wonder if this was a good thing to do. For an example:

>>> u = u'\u1f40'  # Python 2.2
>>> s = u.encode('utf8')
>>> s
'\xe1\xbd\x80'
>>>

>>> u = u'\u1f40'  # With the patch applied
>>> s = u.encode('utf8')
>>> s
'á½\x80'  # Notice the extended characters
>>>

"The latter output is not helpful, because the encoding of s is not the locale's encoding".
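For comparison, here is the same experiment as a sketch on a current Python 3, where the encoded result is a bytes object. This is an assumption about modern behavior and not a statement about the 2.2 or 2.3 interpreters discussed here.

```python
u = "\u1f40"                 # the same character as in the example above
s = u.encode("utf-8")        # three bytes: e1 bd 80

shown = repr(s)              # what the interactive prompt would display
round_trip = eval(shown)     # repr() output is valid source for the same bytes

print(shown)
```

The displayed form sticks to ASCII escapes, so it reads the same whatever the locale happens to be.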

Martin v. Loewis said that he thought the patch author's intention was "to get 'proper' output in interactive mode for strings". Part of the issue with all of this is that GNU readline calls setlocale() automatically. A patch came about to reset it in the extension module back to its original state.

The issue of whether pickling would break because of this also came up. Guido tried it and had no issue. Atsuo Ishimoto (who brought up the possible problem) said it broke when using the ShiftJIS locale.

But the worry of having repr() be locale-specific still lingered. Martin said he was "convinced that having repr locale-specific is unacceptable". He said, though, that having the tp_print slot use a locale-aware print function was fine and to have it differ from tp_repr was fine. But this was shot down by Guido; "tp_print only gets invoked when sys.stdout is a real file; otherwise str() or repr() get invoked". Apparently tp_print is a performance optimization and thus should be fully transparent and not be different in any way to the user.

With the pickle issue and the different semantics for sys.stdout because of tp_print, Guido said that the multibyte-string patch had to be backed out.

As Tamito Kajiyama said, "one of the virtues of Python is that Python has no language feature that is (automagically) affected by locale settings".

tp_dictoffset calculation in 2.2.2

Guido asked David Abrahams and Kevin Jacobs if a change in how tp_dictoffset (found in typeobject.c) was calculated would affect them. To give some background, tp_dictoffset "tells us where the instance dict pointer is in the instance object layout". This was a three-step process (all Guido's words):

  1. (Line 1113) For dynamically created types (i.e. created by class statements), if the new type has no __slots__, the dominant base class has a zero tp_dictoffset, and the dominant base doesn't have a custom tp_getattro, a __dict__ will be added to the instance layout, and its tp_dictoffset is calculated by taking the end of the base class instance struct (or in a more complicated way if tp_itemsize is nonzero).
  2. (Line 1941) If the dominant base has a nonzero tp_dictoffset, copy the tp_dictoffset from the dominant base.
  3. (Line 2090) The tp_dictoffset of the first non-dominant base class that has a nonzero tp_dictoffset is copied.

That last rule had caused Guido and Jeremy Hylton some problems with some code they were bugfixing. Guido wanted to just get rid of that rule since he thought "it is always wrong". Both David and Kevin said that nothing broke for them, so that is now all straightened out.
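You can get a feel for the first rule from Python itself, since types expose the value as __dictoffset__. The sketch below assumes a current CPython; the class names are made up, and the exact nonzero value for a class with a __dict__ is an implementation detail that I deliberately do not rely on.

```python
class WithDict:
    """An ordinary class: instances get a __dict__."""


class WithSlots:
    """A class with __slots__: instances get no __dict__."""
    __slots__ = ("x",)


has_instance_dict = hasattr(WithDict(), "__dict__")
slots_has_dict = hasattr(WithSlots(), "__dict__")

# Zero means "no instance dict pointer in the object layout".
slots_dictoffset = WithSlots.__dictoffset__
builtin_dictoffset = int.__dictoffset__

print(WithDict.__dictoffset__, slots_dictoffset, builtin_dictoffset)
```

A class statement without __slots__ is the case where the interpreter adds the dict slot for you.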

Memory size overflows

Armin Rigo pointed out some potential overflow errors "with objects of very large sizes". The issue was that the calculation of how much memory to allocate could itself overflow. Armin suggested adding macros to deal with the various cases.

Enter Tim Peters, creator of pymalloc. He admitted that he "always ignore[s] these [errors] until one pops up in real life" since "Checking slows the code, and that causes [Tim] pain <0.5 wink>". He pointed out another possible overflow calculation problem. But it was basically a hopeless battle since malloc() has its own cross-platform issues.

But Tim did say that if it was decided to go down the macro route, then he wanted something like what Zope does: DO_SOMETHING_OR(RESULT_LVALUE, INPUT1, ..., ON_ERROR_BLOCK);. The result goes into RESULT_LVALUE unless there is a problem, in which case ON_ERROR_BLOCK is run.
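The kind of check such a macro would perform can be sketched in Python, where integers never wrap and so the logic is easy to see. Everything here is hypothetical: the 32-bit limit, the function name checked_alloc_size, and its parameters are assumptions for illustration, not the macros Armin or Tim had in mind.

```python
SIZE_MAX_32 = 2**32 - 1  # assumed limit of a 32-bit size_t


def checked_alloc_size(nitems, itemsize, header=0, limit=SIZE_MAX_32):
    """Return header + nitems * itemsize, or raise MemoryError if too big."""
    if nitems < 0 or itemsize < 0 or header < 0:
        raise ValueError("sizes must be non-negative")
    # C code has to test by division, because the product itself may wrap.
    if header > limit or (itemsize and nitems > (limit - header) // itemsize):
        raise MemoryError("allocation size does not fit in size_t")
    return header + nitems * itemsize


print(checked_alloc_size(10, 4, header=8))
try:
    checked_alloc_size(2**30, 4)
except MemoryError as exc:
    print("rejected:", exc)
```

In C the second request would silently wrap around to a tiny size, which is the bug being guarded against; the error branch plays the role of ON_ERROR_BLOCK.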

Christian Tismer chimed in and said that he thought we should just move completely over to 64 bit math. Ruby had done it successfully so it wasn't like we were taking a blind leap. It would also save us the hassle of doing it down the road when 64 bit processors become the norm instead of the exception.