Part of the report on the NIST Python workshop software management session

ken.manheimer@nist.gov, 301 975-3539

Original 24-Nov-1994, last mod 5-Dec-1994.

Module Package-Suites

Python's import statement, together with the modules it produces, offers a powerful, well-integrated mechanism for organizing software packages. It enables a degree of code organization that is necessary to get beyond systems of a relatively small-scale. However, the current implementation has a few shortcomings, which we propose to alleviate.

The python module system currently offers no basis for affiliating suites of modules in a coheresive way. Also, the name space in which modules are registered - sys.modules - has only a single layer, so that the name of every module imported within a session must be unique. The former fact inhibits the degree to which modules can be organized with respect to one another, while the latter leads to untenable module-name conflicts, across name spaces.

The proposal i brought to the workshop extends the import scheme to address both these issues. It does this by using nesting within the module registration name space (sys.modules) to support coordinated suites of modules. We will informally refer to these suites of modules as packages.

In particular, the arrangement enabled loading of a suite of modules as a unit, as well as loading individual components of the suite, alone. It also enabled modules within the suite to preferentially load other modules within the suite.

At the workshop we hashed out the proposal, and recognized some problems, as well as some avenues for addressing these problems. One of the primary problems raised was absence of the means to access modules outside of a suite that happen to have the same name as modules within the suite.

Out of that discussion came another of my big homework assignments - to collect and digest what we came up with during the session, continue to iron out the wrinkles, and produce a complete proposal. After further work, and some hashing-out with guido, i have the following proposal to offer.

Module packages proposal

The essence of the proposal entails setting up a clean relationship between module nesting and directory nesting. It hinges on having imports of directories (as found on the load-path) amount to an import of the contents of the directory, into a module representing the directory as a whole.

The files within the directory are either module files, or themselves directories which constitute module packages. In either case, they are loaded as modules within the package. (Thus, given a directory named package containing module file module, import of the directory would result in a module named package and a module within it, package.module.)

When a package directory contains a file with the special name, __main__, then the package load process is controlled by that module. Thus, import of a package need not necessarily load all, or even some, of the contents of the directory, and special processing can be done to determine what should and shouldn't be loaded. If there is no file named __main__ in the directory, then the entire contents of the directory is imported.

Below is a description of each of the components of this scheme, followed by some open questions for dicussion, then an "executive summary" and features summary.

Importation of individual package components

Individual components of a package can be explicitly imported, by specifing the composite module reference for the object. For instance, the above mentioned package.module can be imported as a single entity. Even if the package has not already been imported, only the package.module component will be imported - but it will be pulled in as a nested object, not as bare module.

Extension of the module load path using __modpath__

Loadup of the contents of a package directory is done with respect to a special extension of the load-path mechanism, designed so that imports from within the package prefer modules that reside within the package. The extension is based on the introduction of a new variable, tentatively called __modpath__. __modpath__ would get it's value from a combination of two things.

A module's __modpath__ value will be a list of strings, inherited from the module doing the import. In addition, if the module is being imported from a package directory (typical when __modpath__ has a value), then its __modpath__ value would have the path of the directory prepended to it.

Thus, within a package, modules need not specify the package name in order to explicitly import other members of the package, and they will inherently "prefer" to load other modules from their package, and then other modules from packages which contain their package, and so on...

Referring to similarly named modules outside the package

One pitfall in the original proposal, identified during the workshop session, was lack of provision for importing modules from outside the package that happened to have the same names as modules within the package. This is a useful and necessary capability. We have a mechanism and syntax which provides an unambiguously distinct and clear way for doing this - for invoking absolute imports of modules from outside of a package.

The import command will recognize a special pseudo-module, __python__, as a root for 'absolute' references to modules, ignoring any prevailing __modpath__ value. This means that imports of names that are prepended by __python__. refer to top level packages along the system path, ignoring files within the package that happen to have that name.

For example, a package could have within it a module named sys, which can be imported from within the package as a wrapper for the standard sys package. The packages' sys module would do so by using a from __python__.sys import *. Modules within the package would get the wrapper module by doing an import sys.

Some Questions for Discussion

Here are some prominent questions for discussion that come immediately to mind:
  1. Might there be better choices than '__python__' for the pseudo-root module path? One possibility i've entertained is a null name, so a leading '.' would indicate a "rooted" path (ie, looking for a module name with respect only to the sys.path value). This approach seems a bit radical, maybe a bit contrary to typical python constructs...
  2. I've had a suggestion concerning the way that packages in a module share their extension of the load path. It was suggested that it would be suitable to have a __pkgpath__ variable within each component module that works the way i've proposed for __modpath__, except that the structure of the path list would be shared, so structural changes within one of the modules would be reflected within the rest. I don't see the reason for this, but could certainly implement both the __modpath__ and __pkgpath__, and make the default behavior use whichever seems more suitable. Thoughts, anyone?
  3. When a component of a package is loaded, eg apkg.spam, and the rest of the module is not yet loaded, what should reload(apkg) do? Should it just do the entire package load, or should it fail, on the basis that the primary module for the package actually hasn't been imported. I'm inclined towards the latter, but would certainly entertain convincing rationale to the former...

Summary of the proposal for Module Packages

To summarize, given an example package directory apkg which resides on sys-path, and containing a module named amod and another:
  1. import apkg would produce a module object named 'apkg' in the current name space, and
  2. 'amod' could import 'another' using import another, and vice-versa.
  3. Modules outside the package could import, eg, 'amod', using import apkg.amod.
  4. Modules within a package can circumvent the __modpath__ import preference for other components of the package by prepending the names in the import statement with __python__.. Eg, import __python__.rfc822 will obtain the standard system rfc822 module, even if the current package has a module of its own with the same name. (This will be particularly useful for creating module wrappers.)

Features of the proposed changes

To Implement Packages

I hope sometime soon to produce a prototype implementation of the above behavior, using the import patches that guido recently developed. The features that should be implementable without changes to the python language syntax are __modpath__ and loading of directories as package modules.

Once an acceptable prototype has been developed, guido should make the parser changes necessary to accomodate extension of the import syntax - composite module names - and the prototype should be completed and evaluated for incorporation into the language. (It would probably be suitable to reorganize the standard modules at that point, beginning to structure their relationships using nesting, and clean up the top level modules namespace.)

Hopefully, this and the above mentioned changes will be ironed out and fully implemented for incorporation as part of python 1.2, or earlier as 1.1.2(?).

ken