 
Email Us
gmcm@hypernet.com
 

Python Import-SIG

Introduction

The Import SIG exists to provide a forum to discuss the next generation of Python's import facilities.

The long-term goal of the SIG is to reform the entire import architecture of Python. This affects Python start-up, the semantics of sys.path, and the C API to importing.

A short-term goal is to provide a "new architecture import hooks" module for the standard library. This would provide developers with a way of learning the new architecture.

Background

The SIG grew out of a discussion during Developers' Day at IPC8 (the eighth International Python Conference). The topic itself is, of course, much older.

Pre-History

In the early days of the 21st century, archeologists discovered that originally, Python had no packages (not even jars to keep pickles in). Modules were left on the path where anyone could trip over them. When a particular module was needed, a page was sent out to "find_module". When he returned, saying he had found it, he was then clobbered over the head and sent back out to get it.

Meanwhile, modules, without any sense of propriety, were doing their thing on the path, and in no time at all, Pythondom was littered with the disgusting little things.

Then, one dark and stormy Knight who said "ni" got tired of tripping over them, falling into the ditch and getting his armor rusty. He bravely started piling modules on top of other modules. Protocol was not adjusted, however, so the pages now had to make four trips: first to find and get the top module, then to find and get the module underneath.

Other Knights, tired of pages returning with the wrong module, began using specially trained pages (called "hooks") who had some tricks for finding exactly the right module. Unfortunately, hooks were a bloodthirsty lot, and if two of them met on the path, usually only one survived.

Late 20th Century

By Python 1.5, both approaches had been blessed. Python had packages built into the language, and a "preferred" method for doing import hooks (ihooks.py).

Unfortunately, the architecture has grown rather complex. Hooks take over at the level of the builtin __import__ (which is what both the import statement and the C-level PyImport_ImportModule call). This happens before the package machinery is reached, so any hook that deals with packages has to emulate that machinery (ihooks.py provides an implementation of it). See the call graph diagram for an overview.

Using ihooks requires an intimate knowledge of the import mechanism. You change or add functionality by overriding the way ihooks' pure Python implementation of the import process sees the "filesystem", or the way it performs the low-level import tasks. (You can, of course, override at a higher level, but then you have to implement more of the basic mechanisms yourself.) See the class diagram of ihooks.
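The layering can be sketched as follows. This is not ihooks' actual class structure or API: `FileSystemView`, `MiniImporter` and `DictFileSystem` are hypothetical names, and the sketch handles only flat modules (no packages). The importer consults the "filesystem" solely through two overridable methods, so a subclass can serve modules from a dictionary without touching the find/load logic.

```python
import os
import types


class FileSystemView:
    """Hypothetical layer the importer uses for every filesystem access."""

    def path_exists(self, path):
        return os.path.exists(path)

    def read_source(self, path):
        with open(path, encoding="utf-8") as f:
            return f.read()


class MiniImporter:
    """Hypothetical pure-Python find/load pair built on a FileSystemView."""

    def __init__(self, fs, search_path):
        self.fs = fs
        self.search_path = search_path

    def find_module(self, name):
        for directory in self.search_path:
            candidate = os.path.join(directory, name + ".py")
            if self.fs.path_exists(candidate):
                return candidate
        return None

    def load_module(self, name):
        path = self.find_module(name)
        if path is None:
            raise ImportError(name)
        module = types.ModuleType(name)
        module.__file__ = path
        code = compile(self.fs.read_source(path), path, "exec")
        exec(code, module.__dict__)
        return module


class DictFileSystem(FileSystemView):
    """Override only the filesystem view: 'files' live in a dict."""

    def __init__(self, files):
        self.files = files

    def path_exists(self, path):
        return path in self.files

    def read_source(self, path):
        return self.files[path]


fs = DictFileSystem({os.path.join("virtual", "greet.py"): "def hello():\n    return 'hi'\n"})
importer = MiniImporter(fs, ["virtual"])
greet = importer.load_module("greet")
print(greet.hello())  # hi
```

The catch the paragraph describes still applies: to do anything not expressible as "a different filesystem", you must override `find_module` or `load_module` and re-implement more of the process.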

The Problem

The import mechanism is coming under pressure from a number of sources. Packages have moved from being a novelty to a necessity. Package authors are creating complex multi-level structures with inter-dependencies between sub-packages or packages.

Others import from sources other than the filesystem (archives, databases, possibly even URLs).

People resort to strange import hacks to work around versioning problems or platform dependencies. Most of these hacks do not use ihooks, probably because learning to use ihooks effectively takes considerable effort. Many end up as a wrapper module that finds the right code and stuffs it directly into the required namespaces, bypassing the import mechanism altogether.

This creates problems for freeze and for installers, because tracking dependencies becomes nearly impossible.
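A minimal sketch of the kind of wrapper described above, assuming a hypothetical module whose platform-specific variants are held as source strings (a stand-in for files the wrapper would locate and read by hand). The chosen code is executed straight into the wrapper's own namespace, so nothing appears in `sys.modules` and no `import` statement names the real dependency. A tool that discovers dependencies by scanning for import statements sees only `sys`.

```python
import sys

# Hypothetical platform-specific variants of one small module, held as
# source strings; a real hack would locate and read files by hand.
_VARIANTS = {
    "win32": "def path_separator():\n    return '\\\\'\n",
    "default": "def path_separator():\n    return '/'\n",
}


def _stuff_into(namespace):
    """Pick a variant for this platform and execute it into `namespace`.

    No import statement names the variant and nothing is added to
    sys.modules, so the real dependency is invisible to scanners.
    """
    source = _VARIANTS.get(sys.platform, _VARIANTS["default"])
    exec(source, namespace)


# The wrapper module fills its own globals when it is imported.
_stuff_into(globals())

print(path_separator())
```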

There are other problems. A normal import takes a great many system calls, so Python performance suffers, particularly in a CGI-like environment. The "approved" ways of extending the path and installing packages and modules are rarely followed (they have been a moving target), which makes installations brittle.
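To get a feel for the cost, the sketch below counts the candidate paths a naive filesystem import might probe for a single module name: a package directory plus one filename per recognised suffix, in every `sys.path` entry. It uses Python 3's `importlib.machinery.all_suffixes()` for the suffix list, and `candidate_paths` is a hypothetical helper. Modern CPython caches directory listings, so its real system-call counts are lower than this naive figure.

```python
import importlib.machinery
import os
import sys


def candidate_paths(name, search_path):
    """Yield the paths a naive filesystem import might stat() for `name`."""
    suffixes = importlib.machinery.all_suffixes()  # source, bytecode, extension suffixes
    for directory in search_path:
        yield os.path.join(directory, name)  # a package directory
        for suffix in suffixes:
            yield os.path.join(directory, name + suffix)


probes = list(candidate_paths("no_such_module", sys.path))
print(len(probes), "candidate paths for a single failed import")
```

A failed import is the worst case, since every entry on the path must be exhausted; each module that a CGI script imports repeats the search on every request.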

There are also related issues, such as network installs of Python, or Python in the presence of both network and local installations.

The Proposal

In early 1999, Greg Stein wrote imputil, which turns the problem on its head. It introduced the idea of having multiple importers. An import request would be handed to each importer in turn, until one of them satisfied the request. In addition, the API for importers makes it easier for the developer to deal with the package machinery.
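The core idea, an ordered list of importers each given a chance to satisfy the request, can be sketched in a few lines. This is not imputil's API: `DictImporter`, `ImporterChain` and the `import_module` protocol (return a module, or `None` for "not mine") are hypothetical, and packages are ignored. Python 3's `sys.meta_path` later adopted a similar "ask each finder in turn" design.

```python
import types


class DictImporter:
    """Hypothetical importer serving modules from an in-memory dict of sources."""

    def __init__(self, sources):
        self.sources = sources

    def import_module(self, name):
        source = self.sources.get(name)
        if source is None:
            return None  # not mine: let the next importer try
        module = types.ModuleType(name)
        exec(source, module.__dict__)
        return module


class ImporterChain:
    """Hand each request to every importer in turn until one satisfies it."""

    def __init__(self, importers):
        self.importers = list(importers)
        self.modules = {}  # cache, playing the role of sys.modules

    def import_module(self, name):
        if name in self.modules:
            return self.modules[name]
        for importer in self.importers:
            module = importer.import_module(name)
            if module is not None:
                self.modules[name] = module
                return module
        raise ImportError("no importer could satisfy %r" % name)


chain = ImporterChain([
    DictImporter({"alpha": "VALUE = 'from archive'"}),
    DictImporter({"alpha": "VALUE = 'shadowed'", "beta": "VALUE = 2"}),
])
print(chain.import_module("alpha").VALUE)  # from archive
print(chain.import_module("beta").VALUE)   # 2
```

Because the first importer to answer wins, the second importer's `alpha` is never consulted, and neither importer needs to know the other exists.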

This solves a number of problems. It makes it easy to import from alternate sources (you don't have to pretend you're a filesystem). It lets one package author install one set of hooks without interfering with anyone else's hooks (or lack thereof). The importer can be distributed with the package, making distribution and maintenance simpler. Combined with an archive of compiled Python modules, it makes awesome startup performance possible. A class diagram of imputil is here. Imputil itself can be downloaded from Greg's web site.
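One way an archive can help startup is sketched below, assuming a hypothetical layout of one marshalled code object per module stored in a zip file (built in memory here). The archive's directory is read once, and each import is then a set lookup plus a `marshal.loads`, with no per-module filesystem probing and no compilation. `ArchiveImporter` and `build_archive` are illustrative names; Python 3 ships a real zip importer (`zipimport`) with its own conventions. Note that `marshal` data is tied to the Python version that wrote it.

```python
import io
import marshal
import types
import zipfile


def build_archive(sources):
    """Compile each module once and store its marshalled code in a zip (in memory)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, source in sources.items():
            code = compile(source, name + ".py", "exec")
            archive.writestr(name + ".code", marshal.dumps(code))
    return buffer.getvalue()


class ArchiveImporter:
    """Hypothetical importer: one archive, read once, no per-module probing."""

    def __init__(self, data):
        self.archive = zipfile.ZipFile(io.BytesIO(data))
        self.names = set(self.archive.namelist())  # directory read once up front

    def import_module(self, name):
        entry = name + ".code"
        if entry not in self.names:
            return None  # not in this archive: another importer may have it
        code = marshal.loads(self.archive.read(entry))
        module = types.ModuleType(name)
        exec(code, module.__dict__)
        return module


importer = ArchiveImporter(build_archive({"config": "DEBUG = False\nLEVEL = 3"}))
print(importer.import_module("config").LEVEL)  # 3
```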

It does make writing certain kinds of import hooks more difficult. "Policy" hooks that affect an entire installation are not easy to write (whether this is good or bad is a valid discussion topic). Hooks that rely on the current import mechanism's assumption that everything lives in the filesystem may end up more verbose (e.g., a hook that overrides the "find" part of today's import mechanism but leaves the "load" part alone).
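For comparison, a find-only hook is short in today's Python 3 `importlib`, which uses module specs (PEP 451) and postdates this page. A meta path finder decides *where* a module lives and hands back a spec that reuses the standard source-file loader for the "load" part. `RedirectFinder` and `redirected_demo` are hypothetical names, and a temporary file stands in for a real module.

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class RedirectFinder(importlib.abc.MetaPathFinder):
    """Override only 'find': map a module name to a chosen .py file and
    let the standard source loader do the 'load' part."""

    def __init__(self, redirects):
        self.redirects = redirects  # module name -> path of a .py file

    def find_spec(self, fullname, path, target=None):
        location = self.redirects.get(fullname)
        if location is None:
            return None  # defer to the other finders on sys.meta_path
        return importlib.util.spec_from_file_location(fullname, location)


with tempfile.TemporaryDirectory() as tmp:
    location = os.path.join(tmp, "anything.py")
    with open(location, "w", encoding="utf-8") as f:
        f.write("ANSWER = 42\n")
    finder = RedirectFinder({"redirected_demo": location})
    sys.meta_path.insert(0, finder)
    try:
        import redirected_demo  # found by our finder, loaded by the stock loader
        print(redirected_demo.ANSWER)  # 42
    finally:
        sys.meta_path.remove(finder)
        sys.modules.pop("redirected_demo", None)
```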

In addition, some areas need improvement. There is currently almost no capability for managing the collection of importers. Performance on a normal Python installation is disappointing (imputil passes control back to the normal mechanism only for loading binary extensions).