PEP: 304
Title: Controlling Generation of Bytecode Files
Version: 2079
Last-Modified: 2005-06-22 15:56:06 -0700 (Wed, 22 Jun 2005)
Author: Skip Montanaro
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Jan-2003
Post-History: 27-Jan-2003, 31-Jan-2003, 17-Jun-2005

Abstract

This PEP outlines a mechanism for controlling the generation and location of compiled Python bytecode files. This idea originally arose as a patch request [1] and evolved into a discussion thread on the python-dev mailing list [2]. The introduction of an environment variable will allow people installing Python or Python-based third-party packages to control whether or not bytecode files should be generated at installation time, and if so, where they should be written. It will also allow users to control whether or not bytecode files should be generated at application run-time, and if so, where they should be written.

Proposal

Add a new environment variable, PYTHONBYTECODEBASE, to the mix of environment variables which Python understands. PYTHONBYTECODEBASE is interpreted as follows:

After startup initialization, all runtime references are to sys.bytecodebase, not the PYTHONBYTECODEBASE environment variable. sys.path is not modified.

From the above, we see sys.bytecodebase can only take on two valid types of values: None or a string referring to a valid directory on the system.

During import, this extension works as follows:

Note that this PEP is explicitly not about providing module-by-module or directory-by-directory control over the disposition of bytecode files.

Glossary

  • "bytecode base" refers to the current setting of sys.bytecodebase.
  • "augmented directory" refers to the directory formed from the bytecode base and the directory name of the source file.
  • PYTHONBYTECODEBASE refers to the environment variable when necessary to distinguish it from "bytecode base".

Locating bytecode files

When the interpreter is searching for a module, it will use sys.path as usual. However, when a possible bytecode file is considered, an extra probe for a bytecode file may be made. First, a check is made for the bytecode file using the directory in sys.path which holds the source file (the current behavior). If a valid bytecode file is not found there (either none exists, or one exists but is out of date) and the bytecode base is not None, a second probe is made using the directory in sys.path prefixed appropriately by the bytecode base.
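
The probe order described above can be sketched roughly as follows. This is an illustration only, not the code from the patch: candidate_bytecode_paths is a hypothetical helper, the ".pyc" suffix and Unix-style paths are assumed, and the check that a bytecode file is up to date is left out.

```python
import posixpath


def candidate_bytecode_paths(source_path, bytecode_base):
    """Return bytecode locations in the order they would be probed.

    Hypothetical helper: Unix-style absolute paths and a ".pyc" suffix
    are assumptions of this sketch.
    """
    source_dir = posixpath.dirname(source_path)
    pyc_name = posixpath.basename(source_path) + "c"  # urllib.py -> urllib.pyc

    # First probe: next to the source file (the current behavior).
    candidates = [posixpath.join(source_dir, pyc_name)]

    # Second probe: the same directory prefixed by the bytecode base.
    if bytecode_base is not None:
        relative_dir = source_dir.lstrip("/")
        candidates.append(
            posixpath.join(bytecode_base, relative_dir, pyc_name))
    return candidates
```

With a bytecode base of /tmp (chosen here purely for illustration), a source file /usr/lib/python2.3/urllib.py would be probed first at /usr/lib/python2.3/urllib.pyc and then at /tmp/usr/lib/python2.3/urllib.pyc.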

Writing bytecode files

When the bytecode base is not None, a new bytecode file is written to the appropriate augmented directory, never directly to a directory in sys.path.
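
As a rough sketch of this rule, the hypothetical helper below compiles a source file into the augmented directory using the standard py_compile module. The helper name, the ".pyc" suffix and the decision to create missing directories are assumptions of the sketch, not part of the proposal. The case where the bytecode base is None is not handled here.

```python
import os
import py_compile


def write_bytecode(source_path, bytecode_base):
    """Compile source_path into the augmented directory under bytecode_base.

    Hypothetical helper; returns the path of the bytecode file written.
    Errors (unwritable directory, syntax errors) simply propagate.
    """
    source_path = os.path.abspath(source_path)
    drive, source_dir = os.path.splitdrive(os.path.dirname(source_path))
    separators = os.sep + (os.altsep or "")

    # Prefix the source directory with the bytecode base. A drive letter,
    # if any, becomes an ordinary directory component ("C:" -> "C").
    aug_dir = os.path.join(os.path.abspath(bytecode_base),
                           drive.rstrip(":"),
                           source_dir.lstrip(separators))
    os.makedirs(aug_dir, exist_ok=True)

    target = os.path.join(aug_dir, os.path.basename(source_path) + "c")
    py_compile.compile(source_path, cfile=target, doraise=True)
    return target
```

Nothing is written next to the source file; the only file created is the one under the bytecode base.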

Defining augmented directories

Conceptually, the augmented directory for a bytecode file is the directory in which the source file exists prefixed by the bytecode base. In a Unix environment this would be:

import os, sys
pcb = os.path.abspath(sys.bytecodebase)
# Make the source path relative; os.path.join() would otherwise discard pcb.
if sourcefile.startswith(os.sep):
    sourcefile = sourcefile[1:]
augdir = os.path.join(pcb, os.path.dirname(sourcefile))

On Windows, which does not have a single-rooted directory tree, the drive letter of the directory containing the source file is treated as a directory component after removing the trailing colon. The augmented directory is thus derived as

import os, sys
pcb = os.path.abspath(sys.bytecodebase)
drive, base = os.path.splitdrive(os.path.dirname(sourcefile))
drive = drive[:-1]  # "C:" becomes "C"
# Make the remainder relative so os.path.join() keeps pcb.
if base.startswith("\\"):
    base = base[1:]
augdir = os.path.join(pcb, drive, base)
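
The two derivations can be combined into one function that runs on any platform by calling the posixpath and ntpath modules directly. This is a sketch under stated assumptions: augmented_dir and its windows flag are hypothetical, the bytecode base is assumed to be absolute already, and UNC paths are not considered.

```python
import ntpath
import posixpath


def augmented_dir(bytecode_base, sourcefile, windows=False):
    """Return the augmented directory for sourcefile (hypothetical helper)."""
    if windows:
        drive, base = ntpath.splitdrive(ntpath.dirname(sourcefile))
        drive = drive.rstrip(":")   # "C:" -> "C", now a plain directory name
        base = base.lstrip("\\/")   # make the remainder relative
        return ntpath.join(bytecode_base, drive, base)
    base = posixpath.dirname(sourcefile).lstrip("/")
    return posixpath.join(bytecode_base, base)


# The bytecode bases /tmp and C:\TEMP are chosen only for illustration.
print(augmented_dir("/tmp", "/usr/lib/python2.3/urllib.py"))
print(augmented_dir("C:\\TEMP", "C:\\PYTHON22\\urllib.py", windows=True))
```

The first call yields /tmp/usr/lib/python2.3 and the second yields C:\TEMP\C\PYTHON22.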

Fixing the location of the bytecode base

During program startup, the value of the PYTHONBYTECODEBASE environment variable is made absolute, checked for validity and added to the sys module, effectively:

import os, sys
pcb = os.path.abspath(os.environ["PYTHONBYTECODEBASE"])
probe = os.path.join(pcb, "foo")
try:
    # Close the probe file right away so it can be removed on every platform.
    open(probe, "w").close()
except IOError:
    sys.bytecodebase = None
else:
    os.unlink(probe)
    sys.bytecodebase = pcb
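
A variant of the same check is sketched below as a function. It is not the code from the patch: check_bytecode_base is a hypothetical name, and the probe uses tempfile.mkstemp instead of a fixed file name so that an existing file called "foo" in the directory cannot be overwritten or removed.

```python
import os
import tempfile


def check_bytecode_base(value):
    """Return value as an absolute path if it is a writable directory, else None.

    Hypothetical helper; value is the raw PYTHONBYTECODEBASE setting.
    """
    pcb = os.path.abspath(value)
    try:
        # Create a uniquely named probe file inside the candidate directory.
        fd, probe = tempfile.mkstemp(dir=pcb)
    except OSError:
        # Missing directory, not a directory, or no write permission.
        return None
    os.close(fd)
    os.unlink(probe)
    return pcb
```

At startup the result would be stored in sys.bytecodebase, in the same way as in the snippet above.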

This allows the user to specify the bytecode base as a relative path, but not have it subject to changes to the current working directory during program execution. (I can't imagine you'd want it to move around during program execution.)

There is nothing special about sys.bytecodebase. The user may change it at runtime if desired, but normally it will not be modified.

Rationale

In many environments it is not possible for non-root users to write into directories containing Python source files. Most of the time, this is not a problem as Python source is generally byte-compiled during installation. However, there are situations where bytecode files are either missing or need to be updated. If the directory containing the source file is not writable by the current user, a performance penalty is incurred each time a program importing the module is run. [3] Warning messages may also be generated in certain circumstances. If the directory is writable, nearly simultaneous attempts to write the bytecode file by two separate processes may occur, resulting in file corruption. [4]

In environments with RAM disks available, it may be desirable for performance reasons to write bytecode files to a directory on such a disk. Similarly, in environments where Python source code resides on network file systems, it may be desirable to cache bytecode files on local disks.

Alternatives

The only other alternative proposed so far [1] seems to be to add a -R flag to the interpreter to disable writing bytecode files altogether. This proposal subsumes that. Adding a command-line option is certainly possible, but is probably not sufficient, as the interpreter's command line is not readily available during installation (early during program startup???).

Issues

Examples

In the examples which follow, the urllib source code resides in /usr/lib/python2.3/urllib.py and /usr/lib/python2.3 is in sys.path but is not writable by the current user.

In the Windows examples which follow, the urllib source code resides in C:\PYTHON22\urllib.py. C:\PYTHON22 is in sys.path but is not writable by the current user.

Implementation

See the patch on Sourceforge. [6]

References

[1] patch 602345, Option for not writing py.[co] files, Klose (http://www.python.org/sf/602345)
[2] python-dev thread, Disable writing .py[co], Norwitz (http://mail.python.org/pipermail/python-dev/2003-January/032270.html)
[3] Debian bug report, Mailman is writing to /usr in cron, Wegner (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)
[4] python-dev thread, Parallel pyc construction, Dubois (http://mail.python.org/pipermail/python-dev/2003-January/032060.html)
[5] PEP 302, New Import Hooks, van Rossum and Moore (http://www.python.org/dev/peps/pep-0302.html)
[6] patch 677103, PYTHONBYTECODEBASE patch (PEP 304), Montanaro (http://www.python.org/sf/677103)