Part of the report on the NIST Python
workshop software management session
ken.manheimer@nist.gov, 301 975-3539
Original 24-Nov-1994, last mod 5-Dec-1994.
Module Package-Suites
Python's import statement, together with the modules it produces,
offers a powerful, well-integrated mechanism for organizing software
packages. It enables a degree of code organization that is necessary
to get beyond systems of a relatively small-scale. However, the
current implementation has a few shortcomings, which we propose to
alleviate.
The python module system currently offers no basis for affiliating
suites of modules in a coheresive way. Also, the name space in which
modules are registered - sys.modules - has only a single
layer, so that the name of every module imported within a session must
be unique. The former fact inhibits the degree to which modules can
be organized with respect to one another, while the latter leads to
untenable module-name conflicts, across name spaces.
The proposal i brought to the workshop extends the import scheme to
address both these issues. It does this by using nesting within the
module registration name space (sys.modules) to support
coordinated suites of modules. We will informally refer to these
suites of modules as packages.
In particular, the arrangement enabled loading of a suite of modules
as a unit, as well as loading individual components of the suite,
alone. It also enabled modules within the suite to preferentially
load other modules within the suite.
At the workshop we hashed out the proposal, and recognized some
problems, as well as some avenues for addressing these problems. One
of the primary problems raised was absence of the means to access
modules outside of a suite that happen to have the same name as
modules within the suite.
Out of that discussion came another of my big homework assignments -
to collect and digest what we came up with during the session,
continue to iron out the wrinkles, and produce a complete proposal.
After further work, and some hashing-out with guido, i have the
following proposal to offer.
Module packages proposal
The essence of the proposal entails setting up a clean relationship
between module nesting and directory nesting. It hinges on having
imports of directories (as found on the load-path)
amount to an import of the contents of the directory, into a
module representing the directory as a whole.
The files within the directory are either module files, or themselves
directories which constitute module packages. In either case, they
are loaded as modules within the package. (Thus, given a directory
named package containing module file module, import
of the directory would result in a module named package and a
module within it, package.module.)
When a package directory contains a file with the special name,
__main__, then the package load process is controlled by that
module. Thus, import of a package need not necessarily load all, or
even some, of the contents of the directory, and special processing
can be done to determine what should and shouldn't be loaded. If
there is no file named __main__ in the directory, then the
entire contents of the directory is imported.
Below is a description of each of the components of this scheme,
followed by some open questions for dicussion,
then an "executive summary" and features summary.
Importation of individual package components
Individual components of a package can be explicitly imported, by
specifing the composite module reference for the object. For
instance, the above mentioned package.module can be imported
as a single entity. Even if the package has not already been
imported, only the package.module component will be imported
- but it will be pulled in as a nested object, not as bare
module.
Extension of the module load path using __modpath__
Loadup of the contents of a package directory is done with respect to
a special extension of the load-path mechanism, designed so that
imports from within the package prefer modules that reside
within the package. The extension is based on the introduction of a
new variable, tentatively called __modpath__.
__modpath__ would get it's value from a combination of two
things.
A module's __modpath__ value will be a list of strings,
inherited from the module doing the import. In addition, if the
module is being imported from a package directory (typical when
__modpath__ has a value), then its __modpath__
value would have the path of the directory prepended to it.
Thus, within a package, modules need not specify the package name in
order to explicitly import other members of the package, and they will
inherently "prefer" to load other modules from their package, and then
other modules from packages which contain their package, and so
on...
Referring to similarly named modules outside the package
One pitfall in the original proposal, identified during the workshop
session, was lack of provision for importing modules from outside the
package that happened to have the same names as modules within the
package. This is a useful and necessary capability. We have a
mechanism and syntax which provides an unambiguously distinct and
clear way for doing this - for invoking absolute imports of modules
from outside of a package.
The import command will recognize a special pseudo-module,
__python__, as a root for 'absolute' references to modules,
ignoring any prevailing __modpath__ value. This means that
imports of names that are prepended by __python__. refer
to top level packages along the system path, ignoring files within the
package that happen to have that name.
For example, a package could have within it a module named
sys, which can be imported from within the package as a
wrapper for the standard sys package. The packages'
sys module would do so by using a from __python__.sys
import *. Modules within the package would get the wrapper
module by doing an import sys.
Some Questions for Discussion
Here are some prominent questions for discussion that come immediately
to mind:
- Might there be better choices than '__python__' for the
pseudo-root module path? One possibility i've entertained is a
null name, so a leading '.' would indicate a "rooted" path (ie,
looking for a module name with respect only to the sys.path
value). This approach seems a bit radical, maybe a bit contrary
to typical python constructs...
- I've had a suggestion concerning the way that packages in a
module share their extension of the load path. It was suggested
that it would be suitable to have a __pkgpath__
variable within each component module that works the way i've
proposed for __modpath__, except that the structure of
the path list would be shared, so structural changes within one
of the modules would be reflected within the rest. I don't see
the reason for this, but could certainly implement both the
__modpath__ and __pkgpath__, and make the
default behavior use whichever seems more suitable. Thoughts,
anyone?
- When a component of a package is loaded, eg apkg.spam,
and the rest of the module is not yet loaded, what should
reload(apkg) do? Should it just do the entire package
load, or should it fail, on the basis that the primary module for
the package actually hasn't been imported. I'm inclined towards
the latter, but would certainly entertain convincing rationale to
the former...
Summary of the proposal for Module Packages
To summarize, given an example package directory apkg which
resides on sys-path, and containing a module named
amod and another:
- import apkg would produce a module object named 'apkg'
in the current name space, and
- apkg would contain two modules: 'amod' and 'another',
initialized with the contents of apkg/amod and apkg/another,
respectively
- The 'apkg', 'amod', and 'another' modules would all contain the
variable __modpath__, set to a list with one string,
the path of the 'apkg' directory
- sys.modules would contain entries for 'apkg', as
apkg, 'amod', as apkg.amod, and 'another', as
apkg.another. (The nested representation unequivocally
distinguishes the package modules from other, outside modules
with the same file name.)
- If the package directory contains a module with the special name
__main__, then that module will override the default
behavior, taking responsibility for loading (or not loading)
the module components.
- 'amod' could import 'another' using import another, and
vice-versa.
- Modules outside the package could import, eg, 'amod', using
import apkg.amod.
- 'amod' would be initialized if it had not already been
imported, but the rest of 'apkg' would not be.
- The importing module would get a module named 'apkg',
containing the module named 'amod'. It would contain only
'amod' (and __modpath__) if other components of
'apkg' had not previously been imported, and the entire
contents of 'apkg' otherwise.
- Modules within a package can circumvent the __modpath__
import preference for other components of the package by
prepending the names in the import statement with
__python__.. Eg, import __python__.rfc822
will obtain the standard system rfc822 module, even if the
current package has a module of its own with the same name.
(This will be particularly useful for creating module wrappers.)
Features of the proposed changes
- Fully backwards compatable
- Eliminates module name conflicts via suitable name space structuring, ie
- Fully recursive nesting of modules as packages, which
- Enables loading of module packages as a unit, with finer control
available when needed
- Enables package members to load other members with preference
- Enables modules to load other modules within or outside their
packages, regardless of name collisions
To Implement Packages
I hope sometime soon to produce a prototype implementation of the
above behavior, using the import patches that guido recently
developed. The features that should be implementable without changes
to the python language syntax are __modpath__ and loading of
directories as package modules.
Once an acceptable prototype has been developed, guido should make the
parser changes necessary to accomodate extension of the import syntax
- composite module names - and the prototype should be completed and
evaluated for incorporation into the language. (It would probably be
suitable to reorganize the standard modules at that point, beginning
to structure their relationships using nesting, and clean up the top
level modules namespace.)
Hopefully, this and the above mentioned changes will be ironed out and
fully implemented for incorporation as part of python 1.2, or earlier
as 1.1.2(?).
ken