Python Programming Environment - Session Overview

Ken Manheimer, ken.manheimer@nist.gov, 19-May-1995

Session Purpose

As a Python programmer, i need tools that will help me capitalize on python's versatility, and on the growing, increasingly diverse contingent of software libraries. As Python grows, i wish to help promote the development of such tools. That's what i see as the focus of this session.

(Two personal traits particularly shape my own programming environment concerns:

I'm sorta lazy - like most people, i want to work effectively, and avoid unnecessary work.

My memory for names, locations, and such, is pretty feeble, particularly when compared with the reliability of the computer's "memory". So i would like, as much as possible, to depend on the computer to keep track of the location of things...

So my interests tend towards tools and conventions that help organize and track new and existing software components and tools, to help me find and reuse what's already there.)

Some Current Programming Environment Issues

The following list is not meant to be a comprehensive survey of the important issues in software management tools. Rather, it includes some items that i see as being of current relevance. Suggestions of other items to discuss are welcome.

(The list specifically includes a few issues that i promised to pursue during the last workshop. Other items are motivated partly by my own continuing software mgmt tools concerns and interests, and partly by suggestions and questions that i have seen go by on the list.)

Object identification - docstrings, attribute identification
Module organization - import module packages/nesting
Runtime environment - debuggers, profilers, env browsers
Documentation formats and efforts
Code versioning
Programming style practices

Object identification - docstrings, attribute identification

Proposals that we discussed at the last workshop have made their way, with refinements, into the language. Also, a new implementation of the 'dir()' builtin has been added to the language as an alternative, within the module 'newdir.py'. Ultimately, these are the sorts of things that are necessary in the development of a comprehensive object examination system, as part of an environment browser.

Docstrings

Docstrings are brief descriptive strings that can be associated with many of the complex python objects, by assigning to those objects '__doc__' attribute. In addition, the definition syntax for callables (functions, methods, and classes) and modules include optional, formal slots for docstring declarations. In particular, when the first executable statement of a callable or a module is a string literal, that string becomes the value of the callable's __doc__ property.

We have only started to formulate conventions about the format of docstrings. As with GNU emacs docstrings, we suggest that the first line of the string be a self-contained, very terse description of the purpose of the object, and subsequent lines describe the public properties and behaviors of the object.

For an extensive example of the use of docstrings, see the newimp.py module which is in the distribution library for Python 1.2.

Where to go from here with docstrings? Several items suggest themselves:

Use them!
Organize effort to create docstrings for existing important objects, like the sys module and it's components.
Identify format conventions for docstrings that makes them most informative (but, also, does not make them cumbersome!).
Incorporate recognition of docstrings in existing python programming tools, like debuggers, attribute identifiers ('dir()'), etc.
Identify other object aspects needed as the basis for comprehensive programming environment browsers.

Of course, there are more. Suggestions?

Import module packages/nesting

Also as promised at the last workshop, i have designed, proposed, and contributed a working prototype which implements module-nesting "packages". The prototype is included in the Python 1.2 distribution, in the module 'newimp.py'.

Description of the new import enhancements.
An important difference between the new mechanism and Java's surprisingly similar import mechanism.
Some further prospects for import enhancements.

Description of the new import enhancements

Why packages? Packages enable module nesting and sibling module imports. 'Til now, the python module namespace was flat, which means every module had to have a unique name, in order to not conflict with names of other modules on the load path. Furthermore, suites of modules could not be structurally affiliated with one another.

With packages, a suite of, eg, email-oriented modules can include a module named 'mailbox', without conflicting with the, eg, 'mailbox' module of a shared-memory suite - 'email.mailbox' vs 'shmem.mailbox'. Packages also enable modules within a suite to load other modules within their package without having the package name hard-coded. Similarly, package suites of modules can be loaded as a unit, by loading the package that contains them.

Usage: once installed (newimp.install(); newimp.revert() to revert to the prior __import__ routine), 'import ...' and 'from ... import ...' can be used to:

import modules from the search path, as before.
import modules from within other directory "packages" on the search path using a '.' dot-delimited nesting syntax. The nesting is fully recursive.
import siblings from modules within a package, using '__.' as a shorthand prefix to refer to the parent package. This enables referential transparency - package modules need not know their package name.
The '__' package references are actually names assigned within modules, to refer to their containing package. This means that variable references can be made to imported modules, or to variables defined via 'import ... from', also using the '__.var' shorthand notation. This establishes a proper equivalence between the import reference '__.sibling' and the var reference '__.sibling'.
import an entire package as a unit, by importing the package directory. If there is a module named '__main__.py' in the package, it controls the load. Otherwise, all the modules in the dir, including packages, are inherently loaded into the package module's namespace.
perform any combination of the above - have a package that contains packages, etc.

Difference between new import mechanism and Java's surprisingly similar import mechanism

As Guido mentions in his Works In Progress page, Sun's new web language, Java, provides a very similar module-nesting capability. However, there is a subtle difference that turns out to be important.

As Guido succinctly puts it:

   Java got this wrong -- their "import A.B" creates a local name B, so
   it is hard to use two unrelated packages that happen to define a
   module with the same name, like "P.main" and "Q.main"

... because both P.main and Q.main would try to populate the current namespace with the name 'main', and hence conflict.

In the new python mechanism, the 'import P.main' would establish module P in the current namespace. You would then reach the 'P's component, 'main', as 'P.main', without colliding with the 'Q's 'main' component.

Furthermore, Python's 'from X import Y' form allows you to deliberately import a component of a module directly into the current namespace, when you specifically wish. I'm not sure whether or not Java has an equivalent of the 'from' form, which might explain why they made the choice they did.

Some further prospects for import enhancements.

I would like to have the import mechanism make it easy to hook in alternate search and load mechanisms, to, eg, enable easy hookup with the web and other media. This would essentially entail making the relationship of the load mechanisms to the module name, or file-type, keyed through a table lookup. Developers of new import types could then hook in their special load mechanisms by adding new entries to the appropriate places in the tables. For instance:

Structure the module search and access mechanism so it is table-driven, according to module name. This would facilitate incorporation of foreign interfaces to recognize, eg, URL's, and do web fetches instead of file-system search.
Similarly, make the file-load mechanism table-driven, to expose the load-mechanisms for the various module types (eg, .py, .so, directory-packages, etc), and enable easy incorporation of new types, or to change the search-order precedences, or whatever.

Restructuring the code to accomplish this sort of thing would take some attention, however, and i would like to find a volunteer with time and motivation to take it on.

Runtime environment - debuggers, profilers, env browsers

In thinking about the prospects for advanced environment browsers for python, i've reached an interesting branch-point. I see two primary and divergent options for embedding the user interface for such tools:

With all the attention on it, we could plan to use a python GUI mechanism, as it emerges. As, eg, Guido embedded the wdb window debugger interface in stdwin.
Alternately, we could work immediately on embedding such interfaces in emacs.

I may be biased, because i am both an intensive emacs user and emacs hacker, but i see some very compelling advantages to using emacs as the runtime-tools interface substrate.

It is already here, and ported to just about all the platforms we could wish. (It may soon be better supported on Apple platforms, since the FSF has dropped the Apple boycott.)
It provides X windows, DOS/Windows, and character-cell screen presentations.
It is available to everyone.
It provides a basis for exquisite integration of these tools with eachother and with the rest of the users operating environment.
As any contemporary Unix-associated GUI must, it "comes with" an extensive emacs-like text editing widget!-)

The primary disadvantage is the same one that always comes with emacs involvement - it's complicated for the user.

It also is sort-of one-way - it would be a good (i think) substrate for embedding our programming-environment tools, but would not be so good as a basis for a python GUI toolkit. (Or, at least, not without some mondo hacking, to make an emacs "API" available from within python!) (Did i hear someone gasp?-) My inclination is to recommend both avenues - emacs immediately, and elegant-and-easy-to-use GUI interfaces as they coelesce. I don't think the different substrates will interfere with eachother, each having somewhat divergent audiences - user-friendly point-and-click with the GUI's, vs programmer-friendly (and user fiendish:-) customize-and-automate-out-the-wazoo of emacs.

With the subject broached, i figure i might as well take the opportunity to mention some of the emacs interfaces which i see as useful, exciting prospects:

Editing - continue to use the wild and wonderful python-mode (thanks to Tim Peters, wherever you are, and to Barry Warsaw, for your recent maintence efforts...)
Debugger - hook up with the emacs debugging interface protocol, GUD ("Grand Unified Debugger"), and get a truly comprehensive python debugger.
Environment Browser (!!) - investigate hooking up with the maturing emacs Object-Oriented Browser utility. (See Browser Features, or ftp the recent sources.)
Code and document indexing - emacs TAGS facilities

I have to mention one other, non-emacs-affiliated option, re code and document indexing: the Glimpse indexing and query system. It looks very promising and useful in it's own right, and is easy to use. And it's even more intriguing when you consider the potentials inherent in it's connection with the Harvest network information discovery and access system.

Documentation formats and efforts

1. Do we need, and can we muster a documentation project, a la Linux?

And,

2. What format should we primarily support?

I haven't thought about this very much, but have increasingly hit up against the second issue. I figure the issue is quite ripe, and just yearning to be resolved. Some offhand thoughts:

HTML means pretty-looking format on a web-based viewing interface.
texinfo means more immediate ability to search across nodes (important, to me!!), but not inherently web distributed.
Many others to consider? SGML, linux doc, etc
We probably need to focus on a baseline format from which we can derive the others. What?

Programming style practices

Here's the beginnings of a grab-bag of emerging coding style and conventions, from a discussion with Guido.

Var Names
- exposed internal routines, vars, etc: leading _underscore
- constants: all UPPERCASE
- classes: Capitalized
- global names: not_abbreviated
- boolean vars: long_elaborate_informative_names?
- name groups lacking name space: common short prefix + '_' - sys.exc_type
Misc Python programming practices
- Avoid globals
- Avoid mutable globals - use classes or parameters+defaulting instead
- Avoid globals
- modules that contain mostly a single class are named after the class
- Put all imports near top
- Use doc strings, including module doc and version
- Use a version string, but only when you will use it with discipline

Perhaps we can start accumulating a list. I'd be willing to include it in my Python bestiary, if i ever get time to update it...