Guido's Work In Progress
Disclaimer: this page is out of date.
This Web page contains a loose selection of my current projects
on Python. None of this is available yet -- but at least part of it
will eventually find its way into a Python distribution.
This page will probably remain unchanged after the May 1995
workshop.
Python and the Web
The release of Python 1.2 has brought Python in sync with the
World-Wide-Web (WWW). See the section on
Internet and WWW in the Library
Manual.
However, there is always more. Here's what I may be working on:
- HTTP server
- There is a base class on top of which you can implement anything
conforming to the HTTP protocol. Each request is handed to a new
instance of a class specifically designed to handle one request.
There will be a sample class implementing a fairly basic HTTP server
serving mostly files, but also supporting CGI scripts. Configuring
the server is done by deriving classes that override specific methods
(e.g. permission checking).
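A rough sketch of this pattern (all class and method names below are illustrative only, not the actual interface):

    # Illustrative sketch only -- the names are made up, not the real API.
    # The server creates a fresh handler instance per incoming request;
    # configuration happens by overriding hooks in a derived class.

    class RequestHandler:
        def __init__(self, request, client_address):
            self.request = request                # the connected socket
            self.client_address = client_address  # (host, port) of the client

        def handle(self):
            if not self.check_permission():
                self.send_error(403, 'Forbidden')
                return
            self.do_request()

        # Hooks intended to be overridden by derived classes:
        def check_permission(self):
            return 1                              # default: allow everything

        def do_request(self):
            pass                                  # serve a file, run a CGI script, ...

        def send_error(self, code, message):
            pass                                  # write a minimal error response

    class LocalOnlyHandler(RequestHandler):
        # Configuration by subclassing: only accept requests from this host.
        def check_permission(self):
            return self.client_address[0] == '127.0.0.1'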
- Remote import
- By adding two lines to your main program, it will be possible to
import modules by URL. You can place URLs in sys.path that
either point to specific files or to directories. A cache is used to
avoid excessive network traffic; there are controls to force cache
consistency checks.
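For illustration, usage might look roughly like this ('urlimport' and its install() function are invented names for the hook described above):

    # Hypothetical sketch -- the module and function names are made up.
    import sys
    import urlimport               # the two lines added to the main program
    urlimport.install()

    # URLs on sys.path may point to directories or to specific files:
    sys.path.append('http://some.host/python/lib/')

    import calendar                # may now be fetched over the network;
                                   # a local cache avoids repeated transfers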
Distributed Computing
Aside from the web, I have a few pet projects that use private
protocols on top of TCP/IP:
- Remote Procedure Call Framework
- (Not to be confused with Sun RPC, for which I created some demos
years ago -- see Demo/rpc in the Python distribution.) I have some
classes that make it trivial to turn an arbitrary class into a server
(using multiple inheritance). The transport protocol is TCP/IP but
this can easily be changed to another reliable stream protocol. It
uses the pickle module for transport of the arguments and return
values (and exceptions). To be done: a more general framework whereby
the server can choose to handle individual clients (or requests) as
forked processes, as threads, one at a time, or multiplexed using
select(). Note that the client class is completely generic -- it
queries the server at run time for the availability of requested
methods. Note that this isn't trying to compete with ILU -- the
latter promises interaction between servers written in different
languages and CORBA compatibility, while my framework is Python
specific (and requires no C code beyond the Python core). At the
moment my framework has no security features.
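A minimal, self-contained illustration of the core mechanism (pickled requests, a completely generic client proxy); the real framework uses TCP and is more elaborate, and all names below are invented:

    # Toy sketch: no sockets here, just the pickle-based dispatch idea.
    import pickle

    class Calculator:                      # an arbitrary user class
        def add(self, x, y):
            return x + y

    def serve_one_request(obj, request):
        # Server side: unpickle (method, args), dispatch, pickle the result.
        method, args = pickle.loads(request)
        return pickle.dumps(getattr(obj, method)(*args))

    class Proxy:
        # Client side: completely generic, no per-class stub code needed.
        def __init__(self, obj):
            self.obj = obj                 # stands in for a TCP connection

        def __getattr__(self, name):
            def call(*args):
                request = pickle.dumps((name, args))
                return pickle.loads(serve_one_request(self.obj, request))
            return call

    calc = Proxy(Calculator())
    print(calc.add(1, 2))                  # prints 3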
- Remote file system access and directory tree synchronization
- Using the RPC Framework mentioned above, I have written a class
that provides abstracted access to a file system, and a client
application that uses a local and a remote version of this class to
synchronize a local directory tree with a remote one (a la rdist,
except it can push as well as pull).
- Remote RCS and CVS
- In a similar vein, I have written a class that provides an
abstract interface to RCS, and two client applications: one provides
RCS style operations, the other CVS style operations. The latter in
particular is still in a very embryonic form: it doesn't handle
locking, branches, tags or subdirectories, nor does it send notification
mail -- yet it has proven useful for the maintenance of the Python
sources in two locations.
- CNRI handle server interface
- CNRI (where I am
working as a guest researcher, and which allows me to do
other work besides Python :-) has a project called the handle server,
a URN system that translates location-independent names
for documents into URLs and other data about the object. I'm writing
a Python library which interfaces to the CNRI handle server so that
Python's urllib will be able to resolve handles as well as normal
URLs.
Changes to the Python core
Here are a bunch of things that are very likely to happen. Note that
here I seem to be talking mostly about changes suggested or
contributed by others. In my own view, the Python core is perfect :-)
- The Great Renaming
- I plan to go forward with the Great Renaming (object ->
PyObject, etc.). This will be released as 1.3 or 1.4. I have a
script that can do most of the work automatically, based on a data
file which is trivially created from rename2.h. I want to review each
converted file, however, so it's a lot of work. If there are
volunteers who want to do more than run the script (e.g. add __doc__
strings, fix comments, fold lines that have become too long, rename
local identifiers, remove redundant #include lines etc.), feel free to
get in touch with me!
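To give an idea of the mechanical part, here is a sketch of such a conversion script (not the actual one; it assumes a data file with "oldname newname" pairs, one per line):

    # Sketch only -- not the actual conversion script.  Assumes a data
    # file with lines of the form "object PyObject".
    import re, sys

    def load_renames(path):
        renames = {}
        for line in open(path):
            fields = line.split()
            if len(fields) == 2:
                renames[fields[0]] = fields[1]
        return renames

    def rename_file(filename, renames):
        text = open(filename).read()
        pattern = re.compile(r'\b(%s)\b' % '|'.join(map(re.escape, renames)))
        text = pattern.sub(lambda match: renames[match.group(0)], text)
        open(filename, 'w').write(text)

    if __name__ == '__main__':
        table = load_renames(sys.argv[1])      # the renaming table
        for name in sys.argv[2:]:              # C source files to convert
            rename_file(name, table)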
- New generic object API
- This is a proposal that was mostly written by Jim Fulton and has
been amply discussed on the newsgroup. I plan to eventually implement
it, and release it together with the Great Renaming.
- Optimizations
- Jim Roskind donated a patch (developed in close cooperation with
me) that greatly streamlines the function and method call mechanism.
It combines several tactics. Stack space is allocated together with
the stack frame record, thus saving (at least one) malloc call.
Function arguments are passed from the caller's stack to the callee's
stack without building and unpacking a tuple object (another saved
malloc call). The 'self' argument is inserted in the argument list
without copying (another saved malloc call). The current line number
is no longer set by a separate instruction but calculated from a table
only when needed for debugging (this saves one Python Virtual Machine
(PVM) instruction per executed line of code). There are also some
changes to the compiler and to the PVM's instruction set to generate
somewhat more compact code (on the order of 5%). Unfortunately I
haven't discovered a benchmark yet that shows off the improvements.
Because this is a very complicated set of changes I am reluctant to
introduce them without (a) thorough testing and (b) good proof of their
effectiveness.
- C trace for exceptions
- Donald Beaudry contributed a patch that adds the current file name
and line number as a hidden argument to all error-setting functions. A
run-time flag enables printing of this information when a stack trace
is printed. Extension writers will love this feature.
- Packages
- Ken Manheimer donated a prototype implementation which makes it
possible to structure groups of related modules in "packages". If P
is a package and M1 and M2 are modules in P, defining functions f1()
and f2() respectively, then the package should be a subdirectory
called P under one of the directories on sys.path
(e.g. /usr/local/lib/python/P/) and the modules should be files M1.py
and M2.py inside P. The user can now write "import P" and then use
"P.M1.f1()", "P.M2.f2()", etc. It is also possible to write "import
P.M1" afer which "P.M1.f1()" is a valid reference but "P.M2.f2()" is
not. You can also write "from P import M1" or "from P.M1 import f1".
(Note that Java got this wrong -- their "import A.B" creates a local
name B, so it is hard to use two unrelated packages that happen to
define a module with the same name, like "P.main" and "Q.main".) I
plan to gradually introduce this as a standard feature of the core.
At some point it will be necessary to split the existing library into
a number of groups, which will be painful. (Ken's prototype is
already present as "newimp.py" in the 1.2 distribution but contains
one bug. A fixed version is available from any Python mirror site in
pub/python/src.)
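To make the naming rules concrete (following the semantics described above):

    # On-disk layout, somewhere on sys.path (e.g. /usr/local/lib/python/):
    #
    #   P/
    #       M1.py          # defines f1()
    #       M2.py          # defines f2()

    import P               # the whole package
    P.M1.f1()
    P.M2.f2()

    import P.M1            # only M1 ...
    P.M1.f1()              # ... so this is valid,
    # P.M2.f2()            # ... but this would raise an error

    from P import M1       # bind the submodule directly
    M1.f1()

    from P.M1 import f1    # or bind an individual function
    f1()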
Futurism in the Python core
In contrast to the previous section, none of the following is anywhere
near implementation. However, I think they are good ideas that deserve
to be implemented.
- Getting the current exception
- Currently, an except clause that needs access to the current
exception must use the global variables "sys.exc_type" and
"sys.exc_value". This causes problems when the except clause invokes
some other function that happens to raise and catch an exception --
this overwrites the "sys.exc_*" variables. It is also a nightmare in
threaded programs. I am thinking about a mechanism to reliably access
the exception currently being handled. It will probably involve a
built-in function that peeks in the interpreter stack.
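The problem is easy to demonstrate with the current scheme:

    # With the current scheme, the inner try/except clobbers the state
    # that the outer handler still needs.
    import sys

    def log_error():
        try:
            open('/no/such/file')
        except IOError:
            pass                    # catching this overwrites sys.exc_type
                                    # and sys.exc_value

    try:
        1 / 0
    except ZeroDivisionError:
        log_error()
        # sys.exc_type is now IOError, not ZeroDivisionError, so the
        # handler can no longer report the exception it actually caught.
        current = (sys.exc_type, sys.exc_value)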
- Third argument to "raise"
- The "raise" statement will get a third argument, representing the
stack trace object to be used. This will make it possible to "lie"
about where an exception actually happened. This is useful for
debuggers and other service modules that need to catch exceptions but
want to pass them on transparently.
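A speculative sketch of how a debugger-like wrapper might use this (the third argument doesn't exist yet):

    # Speculative: re-raise a caught exception with its original stack
    # trace, so callers cannot tell the wrapper sat in the middle.
    import sys

    def run_protected(func):
        try:
            return func()
        except:
            # bookkeeping (logging, cleanup, ...) could happen here
            raise sys.exc_type, sys.exc_value, sys.exc_traceback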
- Explicit interpreter state
- A bunch of global variables in the interpreter will be combined
into a structure. This will make it possible to have multiple,
completely independent interpreters in a single process. While this
isn't needed as often in Python as it is in TCL (where it's about the
only way to have independent global name spaces), this makes it
possible to create new, "controlled" environments, e.g. to restrict
privileges or to implement an interactive interpreter inside a window.
Variables in module sys that will be affected include path, modules,
exc_* (but see a previous item), stdin, stdout, stderr, and probably
others. The interface to access them may change!
- Move the C stack out of the way
- It may be possible to implement Python-to-Python function and
method calls without pushing a C stack frame. This has several
advantages -- it could be more efficient, it may be possible to save
and restore the Python stack to enable migrating programs, and it may
be possible to implement multiple threads without OS specific support
(the latter is questionable however, since it would require a solution
for all blocking system calls).
- Restricted execution
- This was born under the name "Safe Python" at the previous
workshop. There are a number of hooks in Python 1.2, and an
undocumented module "rexec.py". I hope to extend this in a number of
ways (and to document it) -- for one thing, the state of the
restricted environment should be captured in a class instance, so you
can create multiple restricted environments. Of course, the explicit
interpreter state discussed above will also have some implications
here. Hopefully we will have time to discuss this at the workshop!
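A speculative illustration of "state captured in a class instance" (this is not the interface of rexec.py, and it is not meant as a secure implementation, just the general idea):

    # Speculative sketch only: each environment instance owns its own
    # namespace and a trimmed-down set of builtins, so several restricted
    # environments can coexist independently in one process.
    SAFE_NAMES = ['len', 'range', 'abs', 'min', 'max']

    class RestrictedEnvironment:
        def __init__(self):
            import builtins                    # '__builtin__' on old Pythons
            safe = {}
            for name in SAFE_NAMES:
                safe[name] = getattr(builtins, name)
            self.namespace = {'__builtins__': safe}   # no open(), no __import__()

        def run(self, code):
            exec(code, self.namespace)

    env1 = RestrictedEnvironment()
    env2 = RestrictedEnvironment()             # completely independent state
    env1.run('x = max(range(10))')
    env2.run('x = len("spam")')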
- Keyword parameters
- It would be a good idea to have a general way for specifying
keyword parameters. The Modula-3 style seems to fit the bill just
right, with '=' for the delimiter. Valid keywords (checked at call
time of course) will be the argument names. I propose some kind of
syntax like "def f(a, b, c=1, d=2, *args, **keywords): ..." for
writing functions that take arbitrary keyword parameters -- any
keyword parameters not in the argument list will end up in the
dictionary 'keywords'.
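Under this proposal, a definition and a few calls could look like:

    # Sketch of the proposed keyword parameter semantics.
    def f(a, b, c=1, d=2, *args, **keywords):
        print(a, b, c, d, args, keywords)

    f(1, 2)                       # c and d take their default values
    f(1, 2, d=5)                  # keywords are checked against argument names
    f(1, 2, 3, 4, 5, 6)           # extra positional arguments end up in args
    f(1, 2, spam='x', eggs='y')   # unknown keywords end up in the dictionary:
                                  # keywords == {'spam': 'x', 'eggs': 'y'}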
- Assignment operators
- I'm not absolutely sure about this one, but I think that
eventually I'll support operators like "+=", "*=" etc. (Still only as
assignments, not in expressions.) Classes and extensions may override
these in a way that works for mutable as well as immutable types -- so
"x = 1; x += 1" binds x to the new object 2, while "x = y = [1]; y +=
[2]" results in the value [1, 2] for both x and y.
Things I'm not so keen on
Here are a number of proposals that come up every once in a while,
with reasons why I don't believe in them. I may be convinced by
working implementations, though -- just don't expect me to spend much
time pursuing them!
- Garbage collection
- For the time being, even the best conservative garbage collectors
have serious problems in the context of Python: (a) they restrict
portability (since they require some assembly code for each new
platform -- this will be a problem for minority platforms), and (b)
they don't guarantee that destructors (class __del__ methods or
extension's "dealloc" functions) are called in a predicatable fashion.
For example, this will affect code that opens many files and relies on
the deallocation to close the file -- if the garbage collector doesn't
reclaim the file objects the application may run out of available file
descriptors.
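The file descriptor example relies on the prompt, predictable deallocation that reference counting gives today, e.g.:

    # Under reference counting the file object is deallocated (and the
    # file closed) as soon as the last reference disappears; a collector
    # that reclaims objects only "eventually" could leave many
    # descriptors open at the same time.
    def first_line(filename):
        f = open(filename)
        line = f.readline()
        return line       # f goes out of scope here and is closed at once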
- Universal thread support
- In my experience, programming with threads is a lot harder than
programming without them. Using threads as the basis of Python's
programming paradigm would (again) restrict portability and make it
harder to write Python programs -- e.g. you have to understand what
happens to global shared data. It would also harm embeddability in
applications that don't have or need threads and limit the usability
of extensions around libraries that are not thread-safe. Note that
Python does have optional thread support -- on those systems that have
OS threads such as Solaris, IRIX, Windows NT or any system with a
PTHREADS interface. (Okay, the PTHREADS interface may need some tweaking
to work with non-vendor-provided implementations.)
- A Python compiler
- Because of Python's dynamic typing, it is hard to generate
efficient assembly (or even C) code from Python statements. E.g. even
"x = x+y" could mean integer or floating point addition, string or
list concatenation, or an arbitrary object's method call. A
straightforward compiler could at best generate calls into Python's
run-time system for each expression. It's not clear that this would
run much faster than the current bytecode interpreter. A really
clever compiler could do a lot better by inferring types from a global
analysis of the program -- but such an analysis would require a lot of
work. (This has been done for the "Self" language -- their best
compiler spent half an hour analyzing an expression like "max(a,
b)"...) Note that often a request for a compiler is really a request
to speed up a particular program. When a small amount of code is
really much too slow, the best option is often to rewrite just that
little bit in C. One possible idea that may help reduce the start-up
time of small programs: perhaps the "freeze" tool could be extended to
do some global analysis and remove unused functions/classes/modules to
reduce the amount of code needed for initialization; or perhaps it
could generate a statically initialized data structure that is closer
to the run-time representation of code objects than their marshalled
representation (since the unmarshalling can be rather costly, even if
it is much faster than parsing).
- Translating into Scheme
- This would probably leave all extensions and embeddings of Python
in the dark. These are much of the reason for Python's success in the
first place!
- The access statement
- In retrospect I think this was a mistake -- that's why I have
refused to document it. Its implementation requires placing traps at
every place that may access a variable. A better solution would be to
have a derived type of dictionary which can set arbitrary traps on
specific keys. This may be augmented by an "export" statement which
explicitly declares (at the module or class level) which names are
externally visible.
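Purely as an illustration, such a trapping mapping might look like this (not an actual proposal or implementation):

    # Illustrative sketch only.
    class TrappedDict:
        def __init__(self):
            self.data = {}
            self.traps = {}               # key -> function called on access

        def set_trap(self, key, func):
            self.traps[key] = func

        def __getitem__(self, key):
            if key in self.traps:
                self.traps[key](key)      # e.g. raise an error, or log the access
            return self.data[key]

        def __setitem__(self, key, value):
            if key in self.traps:
                self.traps[key](key)
            self.data[key] = value

    def deny(key):
        raise KeyError('access to %s is not allowed' % key)

    ns = TrappedDict()
    ns['x'] = 1                           # untrapped keys behave normally
    ns.set_trap('_secret', deny)
    # ns['_secret'] = 2                   # would raise KeyError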
Some interesting hyperlinks
Some of these contradict things I say above...