PyPy - an implementation of Python in Python

It has become a tradition in the development of computer languages to implement each language in itself. This serves many purposes: it demonstrates the versatility of the language and its applicability for large projects, since a compiler/interpreter is close to as complex as software ever gets.

The PyPy project aims to do this for Python and has made significant progress. In five one-week sprints, each attracting approximately 10 developers, we have made an almost complete implementation of Python in Python. Currently it is rather slow, benchmarking at a factor of 4 000 slower than regular Python (henceforth referred to as CPython).

In the next step of the project, we will generate C code or machine code from the source of PyPy, thereby reducing the speed penalty.

Later in the project, we will introduce optimisation steps that should make PyPy run faster than CPython.

An important aspect of implementing Python in Python is the high level of abstraction and compactness of the language. This yields an implementation that is much easier to understand than CPython's implementation in C. The parts implemented so far amount to about 15 000 lines of code, with another 7 000 lines of unit tests.

Another guiding idea in PyPy is to build the implementation in the form of a number of independent modules with clearly defined APIs. This eases reuse and allows experimenting with multiple implementations of specific features.

Architecture overview

In its current, rather simple incarnation, PyPy has two major parts - the interpreter and an object space. Both parts are implemented in Python and run under CPython.

The Interpreter

The interpreter accepts a Python source code file and hands it off to the compiler (PyPy currently uses CPython's built-in compiler), which turns it into a series of bytecodes (the bytecodes may be stored in a .pyc file if the source code is being imported rather than run directly). The bytecodes are instructions for a virtual machine, the same one used in regular CPython. Each bytecode is made up of an operation code (opcode) and, optionally, some operands.

The interpreter then takes the bytecodes, one by one, and performs whatever operation each opcode instructs it to do. For all operations that affect or consult objects, the interpreter delegates the actual work on the object to the object space.
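The bytecode stream described above can be inspected in any CPython with the standard dis module. The following sketch prints the instructions compiled for a small function; the exact opcode names vary between CPython versions, so the output shown is only indicative:

```python
import dis

def add(a, b):
    return a + b

# Each instruction pairs an opcode with optional operands, e.g. a
# LOAD_FAST-style opcode to push a local variable onto the stack,
# followed by a binary-add opcode that combines the two top values.
for instr in dis.Bytecode(add):
    print(instr.opname, instr.argrepr)
```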

The Object Space

The object space contains all objects that have been created and knows how to perform operations on them. You may think of an object space as essentially a complete library implementing a fixed API (a set of operations), with implementations that give Python's object types their semantics. An example of an operation is add: its implementations are responsible for performing numeric addition when add works on built-in numbers, and concatenation when it works on built-in sequences.

We currently have two working object spaces, which are more or less interchangeable: the Trivial Object Space and the Standard Object Space.

In addition, we have two special-purpose object spaces under development: the Flow Object Space and the Trace Object Space. These are described later.

The Standard Object Space

The Standard Object Space itself is an object which implements a number of logistic services. At startup, the Standard Object Space examines certain directories for modules that contain the implementations of the available Python types. The Standard Object Space loads each such module, and each module so loaded registers, in turn, the type it deals with, together with all the operations and implementations for that type, in multimethod objects in the Standard Object Space.

Later, during execution, when the interpreter delegates an instruction to the Standard Object Space, the Standard Object Space selects the correct implementation of the operation through the multimethod.

We will examine how the multimethod mechanism works through an example.

We examine the types float and int and disregard all other types for the moment. Each type has an __add__ operator, which is registered with the Standard Object Space at startup. The float add operator is registered with the type signature __add__(Float, Float) and the integer add operator is registered with the signature __add__(Int, Int).

If in our application program we write the expression 2+3, the interpreter will create an object containing the value 2 and an object containing the value 3. We annotate these as Int(2) and Int(3) respectively. The interpreter then calls the Standard Object Space with __add__(Int(2), Int(3)).

The Standard Object Space then delegates the operation to the __add__ multimethod, which finds an implementation that has the signature __add__(Int, Int). Since this is a "direct hit", the multimethod can immediately dispatch the operation to the correct method, i.e., the one registered as the implementation for this signature.

If the multimethod doesn't have any registered functions with the correct signature, as would be the case for example for the expression 2+3.0, the multimethod tests if it can use coercion to find a function with a signature that works. In this case we would coerce Int(2) to Float(2.0) in order to find a function in the multimethod that has a correct signature.
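The dispatch-with-coercion behaviour described above can be sketched in a few lines of Python. This is not PyPy's actual multimethod code; all names here are hypothetical, and the real mechanism handles many more cases:

```python
# Simplified sketch of multimethod dispatch with coercion
# (hypothetical names; not PyPy's actual implementation).

class MultiMethod:
    def __init__(self, name):
        self.name = name
        self.table = {}          # (type, type) -> implementation
        self.coercions = {}      # (from_type, to_type) -> converter

    def register(self, impl, *sig):
        self.table[sig] = impl

    def register_coercion(self, conv, from_type, to_type):
        self.coercions[(from_type, to_type)] = conv

    def __call__(self, x, y):
        sig = (type(x), type(y))
        if sig in self.table:                  # "direct hit"
            return self.table[sig](x, y)
        # Otherwise, try coercing either argument to reach a
        # registered signature.
        for (tx, ty), impl in self.table.items():
            cx = self.coercions.get((type(x), tx))
            cy = self.coercions.get((type(y), ty))
            if (type(x) is tx or cx) and (type(y) is ty or cy):
                return impl(cx(x) if cx else x, cy(y) if cy else y)
        raise TypeError("no implementation for %s%r" % (self.name, sig))

add = MultiMethod('__add__')
add.register(lambda a, b: a + b, int, int)
add.register(lambda a, b: a + b, float, float)
add.register_coercion(float, int, float)   # int -> float coercion

add(2, 3)      # direct hit on the (int, int) signature
add(2, 3.0)    # coerces 2 to 2.0, then dispatches to (float, float)
```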

Application-level and interpreter-level

Since the same language (Python) is what PyPy both uses and deals with, a given piece of Python source can be playing either of two rather distinct roles in the PyPy system. To be systematic in distinguishing these roles, we call them interpreter level (Python used to implement the interpreter and object space) and application level (Python that we are interpreting). To show the difference with an example: to sum the contents of two variables a and b, typical application-level code is a+b -- in sharp contrast, typical interpreter-level code is space.add(w_a, w_b), where space is an instance of the standard object space class and w_a and w_b are typical names for the wrapped versions of the two variables. By contrast, think of traditional CPython: it has the same application-level code, but interpreter-level code is C source, and typical code for the addition would be PyNumber_Add(p_a, p_b) where p_a and p_b are C variables of type PyObject*.

You might think that the need to keep the two roles for the same language distinguished is a cause of confusion; sometimes, indeed, it is possible to get confused between the two levels while developing PyPy. However, the existence of the two levels for the same language also offers a powerful advantage, fully offsetting the possibility of confusion (and then some): PyPy lets you use application-level code for the purpose of implementing interpreter-level operations.

Application-level code is much higher-level, and therefore easier to write and debug. For example, suppose we want to implement the update method of dict objects. Programming at the application level, we can write the obvious, simple implementation, one that looks like an executable-pseudocode definition of update:

def update(self, other):
    for k in other.keys():
        self[k] = other[k]

If we had to code only at the interpreter level, we would have to code something much lower-level and involved, say something like:

def update(space, w_self, w_other):
    w_keys = space.call_method(w_other, 'keys')
    w_iter = space.iter(w_keys)
    while True:
        try:
            w_key = space.next(w_iter)
        except NoValue:
            break
        w_value = space.getitem(w_other, w_key)
        space.setitem(w_self, w_key, w_value)

This interpreter-level implementation looks much closer to the C source code implementing this functionality in CPython, although, depending on one's relative familiarity with C, Python, and PyPy's coding conventions, it may still be considered somewhat more readable. In any case, it should be obvious that the application-level implementation is definitely more readable and maintainable than the interpreter-level one.

Wrapping

The w_ prefixes so lavishly used in the previous example indicate, by PyPy coding convention, that we are dealing with wrapped objects, that is, objects constructed by the object space at interpreter level to implement corresponding application-level objects. Each object space supplies wrap and unwrap operations that move between the two levels for objects of simple built-in types; other Python types are implemented by interpreter-level classes with some amount of internal structure.

For example, an application-level Python list is implemented as an instance of W_ListObject, which has an instance attribute ob_item (an interpreter-level list containing the application-level list's items as wrapped objects) and another attribute ob_size, which records the application-level list's length. We want to be able to "over-allocate" ob_item, for the same performance reasons that lead CPython to do it, so the length of ob_item is allowed to be greater than the length of the application-level list -- which is why the length in question has to be explicitly recorded in ob_size.
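A stripped-down sketch of this layout, following the attribute names used in the text but otherwise illustrative only, might look like:

```python
# Simplified sketch of the wrapped-list layout described above
# (illustrative; not PyPy's actual W_ListObject).

class W_ListObject:
    def __init__(self, wrappeditems):
        self.ob_item = list(wrappeditems)   # may be over-allocated
        self.ob_size = len(wrappeditems)    # the application-level length

    def append(self, w_obj):
        # Over-allocate in chunks so repeated appends stay cheap,
        # for the same reasons that lead CPython to over-allocate.
        if self.ob_size == len(self.ob_item):
            self.ob_item.extend([None] * max(4, self.ob_size // 2))
        self.ob_item[self.ob_size] = w_obj
        self.ob_size += 1

lst = W_ListObject(['w_1', 'w_2'])
lst.append('w_3')
# lst.ob_size is now 3, while len(lst.ob_item) may be larger.
```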

RPython translation

All the interpreter-level code is implemented in a subset of Python called RPython. This subset has not been fully defined yet, but it places restrictions on the parts of Python you can use, as well as on how you can combine different constructs. The purpose of this is to have a language that is suitable for translation into optimised code in a low-level language. Our current target is C, but generating machine code directly, for a real or virtual machine, is another path that could be taken.

In order to produce a fast version of the PyPy interpreter, we apply a special version of the PyPy interpreter to the regular PyPy interpreter. Once we have loaded the entire interpreter and the Standard Object Space, we have a collection of bytecodes that constitutes the PyPy interpreter program. This is then fed to the special interpreter, which has the Standard Object Space replaced with a Flow Object Space.

The Flow Object Space drives the execution in such a way that all paths in the code are followed. As a side effect it produces flow graphs - graphs capturing the control flow of the code. The nodes of the graphs are basic blocks of recorded object space operations that are executed sequentially. How values flow between blocks is also encoded in the graphs.
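The shape of the data the Flow Object Space produces can be pictured with a miniature model. The class and variable names below are hypothetical, chosen only to mirror the description above:

```python
# Hypothetical miniature of the data produced by the Flow Object Space:
# basic blocks of recorded space operations, linked by exits.

class SpaceOperation:
    def __init__(self, opname, args, result):
        self.opname, self.args, self.result = opname, args, result

class Block:
    def __init__(self, inputargs):
        self.inputargs = inputargs   # variables flowing into the block
        self.operations = []         # space operations, executed in order
        self.exits = []              # links to successor blocks

# The flow graph of e.g. "def f(x): return x + 1" would contain a
# single block with one recorded add operation:
entry = Block(['v_x'])
entry.operations.append(
    SpaceOperation('add', ['v_x', 'Constant(1)'], 'v_result'))
```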

The flow graphs are fed as input to the Annotator. Given the types at the entry point, the Annotator infers the types of the values that flow through the program's variables. This is possible because of the constraints that RPython places on the interpreter code.
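A toy version of this forward type propagation, handling only an add operation over ints and floats, conveys the idea; real RPython annotation is far more involved, and these names are made up for illustration:

```python
# Toy sketch of the type inference the Annotator performs
# (hypothetical; not PyPy's actual annotation code).

def annotate(operations, entry_types):
    types = dict(entry_types)           # variable name -> inferred type
    for opname, args, result in operations:
        argtypes = [types[a] for a in args]
        if opname == 'add' and argtypes == [int, int]:
            types[result] = int
        elif opname == 'add' and float in argtypes:
            types[result] = float
        else:
            raise TypeError('cannot annotate %s%r' % (opname, argtypes))
    return types

ops = [('add', ['v_x', 'v_y'], 'v_t'),
       ('add', ['v_t', 'v_x'], 'v_r')]
annotate(ops, {'v_x': int, 'v_y': int})
# infers {'v_x': int, 'v_y': int, 'v_t': int, 'v_r': int}
```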

In the last phase, the type-annotated flow graphs are visited by the Translator, which emits "low-level code". To do this, it uses the type annotations, the operations listed in the basic blocks, and the control flow information encoded in the graphs. Our first target will probably be Pyrex code, which is then translated to compilable C by Pyrex itself.

Object Space tricks

The fact that the Interpreter and the Object Spaces are separate packages, with an explicit and stable API between them, gives us some new and interesting possibilities.

The first one is that it is easy to write an Object Space that wraps the Standard Object Space and interfaces to the Interpreter.

One such Object Space is the Trace Object Space, which allows us to examine and trace all instructions that are passed to the Standard Object Space as well as all results being passed back to the Interpreter. Since all object spaces have the same API, the Trace Object Space should be able to trace the execution of any version of the PyPy interpreter, e.g. the one compiling RPython.
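The wrapping idea behind the Trace Object Space can be sketched as a proxy that records every operation it forwards. This is illustrative only; the real Trace Object Space wraps every object space operation, while this sketch only intercepts plain attribute lookups:

```python
# Sketch of the wrapping idea behind the Trace Object Space
# (illustrative; DummySpace and the class below are hypothetical).

class TraceObjSpace:
    def __init__(self, space):
        self.space = space      # e.g. the Standard Object Space
        self.log = []

    def __getattr__(self, name):
        op = getattr(self.space, name)
        def traced(*args):
            result = op(*args)
            self.log.append((name, args, result))  # record call and result
            return result
        return traced

class DummySpace:
    def add(self, a, b):
        return a + b

space = TraceObjSpace(DummySpace())
space.add(2, 3)   # returns 5 and logs ('add', (2, 3), 5)
```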

Up to here we have been speaking about things that we have, or are working on. The rest of this chapter is about things that could be done with our architecture but that we haven't done yet.

Another possibility is an object space that interfaces to a Standard Object Space on another machine. From this it is not far to an object space that keeps tabs on where its objects reside and is able to delegate its operations to more than one object space. For instance, we can have multiple object spaces residing in different threads in the same process.

At the other end, it is possible to replace the Interpreter with one that interprets instructions for a different virtual machine but still uses the Standard Object Space. It is a rather short conceptual step from this to allowing delegation between multiple Interpreters, making it possible to mix both source languages and bytecode instruction sets. Naturally, there are limits to how different the underlying models of the virtual machines can be before they become incompatible with the Standard Object Space, but even this could be solved by using multiple interpreters delegating to multiple object spaces. However, the code for interaction between the object spaces would probably be quite complex.