"Extension Building Considered Painful": Session Summary
by Greg Ward
The "Extension Building Considered Painful" session at IPC7 was
very productive, and there was a good consensus in the room as to
what's needed, what will work for various classes of users, and what
ideas to steal from other related systems (the closest being Red
Hat's RPM and Perl's MakeMaker).
Decisions made
Everyone seemed to agree with my proposed mode of operation: the
user downloads a package, unpacks it, and runs a Python script
tentatively called setup.py (the fact that this name
overlaps somewhat with the module description file in the Python
distribution and in many large module distributions is deliberate,
and may be construed as a bug or a feature). See below for
arguments to setup.py and what happens when it's run.
Also, nobody disagreed with my contention that the system should
work on "extensions" (ie. modules written in C) or plain ol'
"modules" (written in Python) with no difference in the user
interface. People also seemed to accept the idea of "building"
everything -- C modules, Python modules, eventually documentation --
to a temporary "build library" directory, tentatively called
blib/. (This is one of the few implementation details
of Perl's MakeMaker that survived the session.) The
blib/ directory serves (at least) two purposes: it makes
installation near-trivial, and it provides a realistic-looking
pseudo-installation tree for running test scripts. The actual
structure of the blib/ tree has yet to be decided
(although I have definite ideas of how it should look!).
We talked about terminology a bit. This is going to be a sticky
issue, and will have to be one of the first things we thrash out on
a SIG. First, module, extension module
(extension for short), and package will keep their
conventional Pythonic meanings: a single .py file meant
to be imported by other modules or scripts, a module written in C,
and a collection of modules grouped under a directory containing an
__init__.py file.
I proposed distribution (or module distribution
when absolute clarity is needed) as a stand-in for package,
which is already taken. A module distribution contains one or more
modules (including extension modules), their documentation, test
suites, and a setup.py file. (The documentation and
test suites are optional but strongly recommended, especially if
anybody ever comes up with a standard way of documenting Python
modules. But a distribution must have at least one module
and a setup.py). A single distribution will present
many faces to the world: the developer's source tree, a source
distribution, various binary distributions, the final installed
product, etc. (As I understand things, the source and binary
distributions are what Trove will call resources.)
One thing we didn't have time to decide on was a name for the
silly thing. For a long and formal name, my vote is Module
Distribution Utilities, or distutils for
short. The short name will be needed to group all the various
modules into a package: for instance, we plan to have
distutils.build, distutils.install,
distutils.gen_make, and so forth.
Roles of setup.py
The planned interface to and tasks for setup.py
evolved rapidly throughout the session, mainly driven by the
division of labour identified by Eric Raymond and clearly written
down by Greg Stein, and the workflow diagram drawn by ??? and
expanded upon by me (see below).
It's clear that setup.py has to contain 1) metadata
about the package, and 2) any custom code needed to configure the
distribution for the current machine. It's not yet clear how these
should be expressed. The first idea is to steal
MakeMaker's idea of calling a function with a bunch of
named arguments, which then does all the work behind the scenes. An
example setup.py using this scheme might be:
#!/usr/bin/env python
import distutils
distutils.setup (name = 'mymod',
                 version_from = 'mymod.py',
                 pyfiles = ['mymod.py', 'othermod.py'],
                 cfiles = ['myext.c'])
Obviously, custom configuration code would just be written in Python
before or after the call to distutils.setup; it's not
clear how to make this code interact with everything that lies
behind distutils.setup.
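As a purely illustrative sketch (the zlib check and the libraries argument
are hypothetical names, not anything agreed on in the session), such
configuration code might run before the call and pass its findings in as
just more named arguments:
#!/usr/bin/env python
import os
import distutils

# crude configuration step: is zlib available on this machine?
have_zlib = (os.path.exists('/usr/lib/libz.so') or
             os.path.exists('/usr/lib/libz.a'))
extra_libs = []
if have_zlib:
    extra_libs.append('z')

distutils.setup (name = 'mymod',
                 version_from = 'mymod.py',
                 pyfiles = ['mymod.py', 'othermod.py'],
                 cfiles = ['myext.c'],
                 libraries = extra_libs)   # 'libraries' is hypothetical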
The main competing idea is to do things in a more OO manner, by
defining a subclass and overriding various attributes and methods.
Here's an off-the-cuff illustration of the concept:
#!/usr/bin/env python
from distutils import Setup

class MySetup (Setup):
    name = 'mydist'
    version_from = 'mymod.py'
    pyfiles = ['mymod.py', 'othermod.py']
    cfiles = ['myext.c']
In this case, it's a bit clearer how to override specific behaviour
of all the distutils classes: just subclass and
override as needed. Obviously, all of the classes would then
have to be well-documented!
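To continue the off-the-cuff illustration: overriding a method might look
something like the following, where build_extensions and cc_flags are
invented names standing in for whatever the real classes end up providing.
class MySetup (Setup):
    # ... same attributes as above ...

    def build_extensions (self):
        # tweak the compiler flags on one platform, then fall
        # back to the default behaviour
        import sys
        if sys.platform[:5] == 'sunos':
            self.cc_flags = self.cc_flags + ['-mt']
        Setup.build_extensions (self)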
Workflow for module distribution
The figure below illustrates the workflow involved in developing,
packaging, building, and installing a module distribution (but not
testing or documenting it, which ultimately should also be part of
the plan).
Note the three kinds of people present in the diagram:
- developer: creator/maintainer of the base source tree
- packager: a) someone (presumably the developer, wearing his "packager"
hat) who turns the base source tree into the source distribution; or
b) someone (possibly the developer, possibly a friend with a machine
running X, possibly an archive robot with access to a machine running X)
who builds the source distribution to create a "built distribution" for
architecture X
- user: the poor sod who wants to install the distribution on his
machine; not necessarily someone who knows (or wants to know)
anything about Python
(These are the divisions of labour identified by Eric Raymond and
Greg Stein.) Note also that the developer, packager(s), and user(s)
are all smiling. This feature is planned, but not yet implemented.
Developer utilities
Obviously, the workflow starts at the top, with the developer's
source tree. While the developer is toiling away, he will probably
want a Makefile that knows about building Python
modules and extension modules (especially the latter). Rather than
writing his own, he can ask setup.py to generate one
for him (presumably using the distutils.gen_make
module):
./setup.py gen_make
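(As a very rough sketch -- and only that -- distutils.gen_make might boil
down to writing out targets that delegate back to setup.py; a real
generator would of course emit proper compile rules and dependency
information, which is the whole point of having a Makefile for a big
distribution.)
# hypothetical sketch of distutils.gen_make
def write_makefile (filename = 'Makefile'):
    targets = ['build', 'test', 'dist', 'install']
    f = open(filename, 'w')
    for t in targets:
        # each Makefile target just re-invokes the corresponding
        # setup.py command
        f.write('%s:\n\t./setup.py %s\n\n' % (t, t))
    f.close()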
Then the developer can run make, make test, and so forth,
just as he's probably used to doing
(assuming he's an old-fashioned Unix weenie!). If he doesn't like
Makefiles, or doesn't need one because this is a tiny little
project, he can just ask setup.py to build, test,
etc. directly:
./setup.py build
./setup.py test
(The idea is that setup.py will support "commands" --
build, test, etc. -- that correspond to
Makefile targets. That way, nobody ever has to depend
on a Makefile, but one can be generated for the
developer's convenience and efficiency (especially when working on
large distributions with lots of extension modules to be compiled).)
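Nobody settled on how those commands get dispatched; one obvious
possibility (shown here strictly as a sketch, with the real work reduced
to print statements) is a table inside distutils mapping command names to
functions:
# hypothetical command dispatch behind distutils.setup()
import sys

def build (attrs):
    print("building %s into blib/ ..." % attrs['name'])

def test (attrs):
    print("running test scripts against the blib/ tree ...")

def install (attrs):
    print("copying the blib/ tree into the Python library ...")

commands = {'build': build, 'test': test, 'install': install}

def setup (**attrs):
    for name in sys.argv[1:]:
        commands[name](attrs)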
Packager utilities
When the developer is happy with the current state of his
module(s) and it's time for a release, he puts on his "packager" hat
and creates a source release:
make dist
# or, equivalently
./setup.py dist
This will bundle up all the files in the distribution (as listed in
a MANIFEST file) into an archive file of some sort --
perhaps .tar.gz under Unix, .zip under
Windows, etc. The name of the archive file would be derived from
the name and version of the module distribution:
mydist-1.2.3.tar.gz, for instance.
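In rough Python terms -- a sketch only, since neither the MANIFEST format
nor the use of an external tar has been decided -- the Unix flavour of the
dist command might amount to:
# hypothetical core of the "dist" command (Unix flavour)
import os

def make_source_dist (name, version, manifest = 'MANIFEST'):
    # assume MANIFEST lists one filename per line
    files = []
    for line in open(manifest).readlines():
        files.append(line.strip())
    base = '%s-%s' % (name, version)       # e.g. mydist-1.2.3
    # a Windows packager would presumably build a .zip instead
    os.system('tar -czf %s.tar.gz %s' % (base, ' '.join(files)))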
If he wishes, the developer can stop there and upload his source
release to an archive. Or, he can create built
distributions for all the architectures to which he has access.
(Note that I'm explicitly avoiding use of the more familiar term
binary distribution. That is because a module
distribution might well contain nothing more than .py
files and their associated documentation. Even in those cases,
though, there are reasons for a downloadable resource that can be
immediately installed. The main reason is consistency: it's nice if
naive users only have to deal with one kind of file for Python
module distributions (e.g. Red Hat Linux users can just download and
install a bunch of RPMs; whether those RPMs contain .py
or .so files or both is immaterial). Second, there
might be non-binary files that are generated from files in the
source release, such as man pages generated from SGML source. The
built distribution for a Unix platform might include man pages ready
for installation, so no documentation processing would be
necessary.)
It is important to underscore the concept of packager as a
person separate from the developer. This is necessary to support
built distributions for multiple platforms, since not many developers
have access to a couple of Unix variants, Windows, and Mac --
they'll presumably need some help to make built distributions for
one or more platforms. This help may come in the form of a friend
(down the hall or around the world) who does have access to a
particular platform; it might come in the form of someone who
volunteers to keep certain distributions up-to-date for certain
platforms; or it might take the form of an archive robot that
automates the procedure. Security concerns become increasingly
relevant as one moves down that list.
I have in mind a couple of possible interfaces for creating built
distributions; furthermore, the idea of "dumb" vs "smart" built
distributions has been forming in my head since Developer's Day.
(Thus it probably doesn't really belong here, since this is meant to
be a summary of the Developer's Day session. So sue me.) First,
consider the creation of a traditional Unix built distribution: a
.tar.gz file to be unpacked under
/usr/local (or, more likely in the Python library
context, /usr/local/lib/python1.x). This could be
accomplished with:
./setup.py bdist
which would do a build (to put a mock installation tree into
./blib/) and package that build tree into an archive file
named after the distribution name, version number, and
current platform, e.g. mydist-1.2.3-sunos5.tar.gz
or mydist-1.2.3-win32.zip.
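(The platform tag could presumably come from something like sys.platform,
though both the tag and the per-platform archive format are wide open; the
function below is just a sketch.)
# sketch of naming a "dumb" built distribution
import sys

def bdist_name (name, version):
    plat = sys.platform                    # e.g. 'sunos5', 'win32'
    if plat == 'win32':
        ext = '.zip'
    else:
        ext = '.tar.gz'
    return '%s-%s-%s%s' % (name, version, plat, ext)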
However, there's a lot of interest in "smart" installers like Red
Hat's RPM (and I got the impression that there are a couple of
competing possibilities for the Windows world -- someone from the
dark side will have to fill me in on that). My current thinking is
that there should be a separate command (or Makefile
target) for each of these, so you might run
make rpm
on a Red Hat Linux box, and
setup.py xxx
on a Windows machine (where xxx is the abbreviated name of
some smart installer for Windows). Supporting the old-fashioned
"dumb" built distribution model is important, though -- not everyone
will have that fancy new installer (or they might have a different
smart installer).
User utilities
Finally, and perhaps most importantly, we must consider the lucky
user who wishes to install a Python module distribution on his
computer. Users come in all shapes and sizes, but we're mainly
concerned with two distinctions:
- built distribution users: anyone on a popular platform for
which a built distribution is available (or necessary:
many Mac and Windows people won't have a compiler)
- source distribution users: people on less-popular platforms
for which a compiler (and other possibly necessary tools)
will most likely be available
Obviously, things should be utterly painless and simple for naive
users who just want to install some modules (possibly pure Python,
possibly extensions -- it shouldn't matter!) to get something else
working. Smart installers like RPM will help here, but it should be
almost as easy to start with a "dumb" built distribution or a source
distribution. We must also keep in mind that there will be many
people who have to use source distributions who are not necessarily
programmers, and just want to get this silly thing installed and
working -- so using a source distribution should be just as easy
(although it will require more machine time and a few more commands)
as using a built distribution. Even experienced hackers who
could dive into the source and mess around with it, or
fiddle with Makefiles, or supply the locations of
needed libraries, rarely want to do such things.
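Concretely, the hope is that even the source-distribution case never gets
much worse than a handful of commands along these lines (the command names,
as noted above, are still up for debate):
gzip -dc mydist-1.2.3.tar.gz | tar -xf -
cd mydist-1.2.3
./setup.py build
./setup.py test
./setup.py install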
Where to go next
First, I think this topic is big enough to warrant a new sig,
which I'm tentatively calling the distutils-sig. The proposed
charter for that {will be|has been} posted to the meta-sig, so run
over there if you think the whole concept is hopeless and you want
to shoot me down in flames before this even gets started (or if you
think the name sucks).
Once the sig is created, I'd like to spend some time
discussing meta-issues: does anyone violently disagree with the
whole idea? is 'distutils' a good enough name? what functionality
should be present in the first pass, and what's needed for a full
release? Then we can dive into nitty-gritty design and
implementation issues.