Dean W. Hall
dean.hall@computer.org
Austin, TX, U.S.A.
19 March 2003
PyMite is a flyweight Python interpreter written from scratch to execute on 8-bit microcontrollers as well as desktop computers. It is a work-in-progress, but is developed enough to run demonstration programs. This paper explains the motivation for creating PyMite and gives an overview of what PyMite can and cannot do. Then the current status and the work ahead are discussed. This is followed by details on the design and implementation of PyMite. Two new features are mentioned: ``stackless'' frames and in-line native code. Finally, information is given on how to obtain PyMite.
There are more embedded microcontrollers (MCU) on this planet than there are people living. The ubiquity of these devices is due to the low cost and high power of the 8-bit architecture. Programming 8-bit microcontrollers is both an occupation and a hobby. All 8-bit architectures can be programmed in assembly, fewer in C, and still fewer in a higher level language. C and BASIC are the most popular languages above assembly for these devices. To the hobbyist, C has a steep learning curve. For the professional, BASIC lacks powerful semantics. PyMite brings the expressive, high level language, Python, to the embedded space.
Python has a short learning curve and robust builtin data types. The short learning curve will make the recreational programmer more productive and the data types will make him more effective. Python is also a rapid prototyping language. Bringing this feature to the embedded space is a real benefit. PyMite tries to keep everything that makes Python a great language.
The Media Laboratory at the Massachusetts Institute of Technology created Interactive C (IC) to assist the development of intelligent mobile robots [1]. IC compiles a subset of C to bytecode and interprets that on an 8-bit Motorola 68HC11. PyMite replaces IC and provides a modern language with powerful builtin datatypes.
This section gives an overview of what PyMite is, how it is used, the current status of the project, and the work yet to be done.
PyMite is a bytecode interpreter and set of libraries
that can execute an application on an 8-bit
microcontroller. The application is written
in the Python syntax and compiled to a code object by Python.
At that point, a new tool, PmImgCreator, creates a
relocatable image of the program. This image is comparable to
a .pyc
file, but is condensed and formatted for PyMite.
The image does not need to be linked to the PyMite executable,
unless it contains native code. If the image contains no
native code, PyMite only needs to be informed
where to find the image.
Once the microcontroller is programmed and operating, the PyMite system initializes itself and loads the application image. Just like a Python module, the application can be a simple script, a set of functions, or a class with methods. It can import other modules and call those functions. It can even call native C libraries or assembly code through a PyMite-specific native code mechanism. PyMite uses a heap to manage memory. A hybrid best-fit allocator distributes memory chunks, while a hybrid mark-sweep collector reclaims them.
PyMite can execute many programs that are filtered
through PmImgCreator. PmImgCreator detects
and reports source code that produces unsupported bytecode.
Once these errors are fixed, PyMite can run the program
on a limited basis.
For example, if the code is ``answer = 40 + 2
'' it will
execute as expected. However, if the code is
``answer = "forty" + "two"
'', it will fail during execution
because the string-concatenation definition of the addition operator
is not implemented. There is not yet a definitive list of
supported features. The general rule is that nearly all
of the arithmetic and logical operations on integers are supported.
Only some of the operations on the remaining builtin types (Strings, Lists,
Tuples, Dicts) are supported, and very few of Python's fancy
features are implemented. To be more concrete on this matter,
a well-known module is demonstrated.
The PyStone Benchmark Program (pystone.py
) comes as a part
of the standard Python 2.0 distribution.
This program only needs two changes in order to run in PyMite.
When this module is parsed by PmImgCreator,
line 37 returns one bytecode incompatibility.
Line 37 reads ``from time import clock
''
which is compiled to the unsupported bytecode IMPORT_FROM
.
The source line is changed to two lines: ``import time
''
and ``clock = time.clock
''.
Now, PmImgCreator compiles and creates and image.
When the image is interpreted, a runtime error occurs.
Line 251 uses the Python magic
``if __name__ == '__main__':
''
to conditionally call the main()
function.
PyMite does not yet support the __name__
attribute,
so this line is removed and the subsequent line
is unindented so that main()
is called unconditionally.
With these two changes made, pystone.py compiles to an image containing 4823 bytes. The image loads into the PyMite interpreter and executes properly until the target device's heap is exhausted and the interpreter halts.
In addition to simple scripts, PyMite is capable of running
code which calls native library functions and controls
the author's mobile robot platform. Despite this minor success,
there is plenty of work remaining.
Classes can be created, but inheritance does not work.
The try/catch
mechanism
creates the necessary blocks of information, but raising and
handling exceptions is not yet implemented.
Heap memory is allocated using a hybrid best-or-nearest-fit
policy and some deleted objects are reclaimed explicitly.
The collector is designed and developed, but not completely
tested. The list goes on. The author is resolving these needs
on a priority basis.
This section discusses the goals established when designing PyMite and what it took to meet these goals.
The goals of the PyMite interpreter are:
The goal of fitting within a fixed size environment is listed first because it is the most obvious and most restrictive constraint of this project.
Most 8-bit microcontrollers are limited to 16-bit addressing. This means that only 64 KiB of memory is directly addressable. The sum total of addressable memory is divided among program memory and RAM in various ways depending on the target device's architecture. RAM is further divided between the interpreter's run-time requirements and PyMite's heap. Regardless of how memory is divided, 64 KiB is a very real and very tight boundary compared to the near limitless boundaries Python uses on a 32-bit PC with virtual memory.
Concerning size, there are two pieces of good news. The first is that some architectures have separate program and data memories, so there can be as many as 64 Ki addressable words in each memory. The second is that persistent memory can be extended with the addition of serial memories. This persistent memory could hold code images such as PyMite's library and the application.
In order to meet the 64 KiB limit, PyMite is written from scratch, rather than trimmed from the Python source tree. This decision allows all objects to be reimplemented and reduced in size. For example, in Python the Dict object uses a hash table in its implementation; but in PyMite, a segmented linked list (seglist) structure is used for its small size and simplicity. The seglist is also reused for the implementation of PyMite's List type. Using the seglist also reduces memory fragmentation by reusing identically sized segments.
By shrinking the objects and removing extraneous pointers,
some of Python's dynamic features are sacrificed.
For instance, if two integer variables are added
via the BINARY_ADD
bytecode, they will always be added
via an internal function. This is different from Python where
each object would be inspected for an __add__()
method.
The example above shows how the ability to overload the addition
operator is lost in PyMite.
To date, there are no complex, long, or floating point data types or operations. Floating point might be added, but only after more important features such as exceptions and garbage collection are completed. Since PyMite is not finished, a complete list of unsupported features and operations is not available.
PyMite is not without new and novel features. A ``stackless'' frame implementation and native code lead the innovative features in PyMite.
In the Python community ``stackless'' has come to mean an implementation of the interpreter function that is decoupled from the C stack [2]. The term ``stackless'' comes from the fact that a non-stackless (stackful?) implementation consumes C stack space with each nested Python function. The C stack grows with every Python function call and Python objects can sometimes be found in C function arguments and return values. There are currently no plans for PyMite to support coroutines or generators like Tismer's Stackless, but having a stackless implementation leaves the door open. For now, PyMite's stackless implementation simply makes its RAM usage and threading model more compact.
The second new feature added to PyMite is its convenient
method of adding native code to the application. In-line
native code allows insertion of a compiled language into
the Python code. PyMite does this by letting
the programmer write C code inside the documentation string
(docstring) of a Python function. The docstring is a
triple-quoted string usually used to describe the purpose
of the function. PyMite's size constraints eliminate
using docstrings for introspection, so the author redesignated
the purpose of this string. The contents of the docstring
are largely ignored by the Python compiler. The image creator
tool, PmImgCreator, extracts docstrings prefixed
with the substring ``__NATIVE__\n
'',
reformats the source, and compiles the results
into a native C module that must be linked
with the PyMite executable.
Native code is explained in greater detail in
[3].
This section gives a brief overview of how PyMite is implemented. The run-time datastructures are presented and the interpreter is explained. But first, some comments on portability.
The PyMite interpreter is written in ANSI C with Doxygen style comments[4]. Platform-specific extensions are written in gcc-compatible syntax. The author uses GCC compile to the desktop host and cross-compile to the target. The source code is written to be highly portable. With the introduction of native code, it is the author's goal to make the interpreter's source code completely portable, and place the platform-specific code in native libraries. It remains to be seen if 100% portability can be achieved.
PyMite is currently targeted to the Atmel AtMega103 8-bit microcontroller. This is the processor found on the author's mobile robotics circuit board, the MMB103. The AtMega103 has 128 KiB programming memory (arranged as 64 Ki words), 4 KiB SRAM and 4 KiB EEPROM.
Before exiting the topic of portability, it should be mentioned that PyMite currently compiles for Linux/x86 as well. Doing this changes the sizes of the run-time datastructures because pointers become 32-bit fields instead of 16. Aside from that, and the need to put all code images into RAM, PyMite can execute on an x86 running Linux. This configuration is used for testing and debugging.
PyMite has a set of datastructures that are the foundation of the interpreter. There are implicit structures for image objects and explicit structures for dynamic objects. The implicit structures are used to load a static image from memory into a dynamic structure in RAM. The explicit structures are used to build the run-time framework of the interpreter and represent the dynamic objects within the running program.
The implicit structures for images are not defined by a C structure, instead they are a convention used by PmImgCreator and PyMite's image loader system. There are seven implicit structures for None, Integer, Float, String, Tuple, CodeObject and NativeCodeObject. None is represented by a single null byte. Integer and Float share the same simple form. The remaining four--String, Tuple, CodeObject, and NativeCodeObject--are slightly more complex. These images are loaded into their corresponding dynamic structures, discussed next.
The head of every datatype contains a two byte Object Descriptor (OD). The OD, in turn, contains size and type fields and two bits used by the heap management system. The size field is used by the memory system to assist with best-fit allocation. The type information is used throughout the interpreter to define the type of the object. Every PyMite object has an OD embedded in the head of its structure. The core objects are presented in order from simple to complex.
The None object is simply an OD with its type set to None. An Integer object is an OD follow by four bytes. The four bytes contain a signed integer value. Floating point objects are not yet supported and will not be discussed further. A String has an OD header, an unsigned byte representing the number of bytes in the string, followed by that many bytes containing the UTF-8 null-terminated string.
The complex objects are Tuples, Lists and Dicts. They are called complex because they may contain other objects. Tuples are immutable objects whose size is predetermined. The tuple's structure is allocated with exactly enough memory for its contents. Lists and Dicts are mutable, dynamic objects. They are both implemented using segmented linked lists (seglist) to allow for varying content length. The Dict uses one seglist for its keys and one for its values.
There are also structures for the objects that work behind the scenes. These are code objects (CO), functions and frames.
Once the system is initialized and images are loaded into a CO,
a function object is created for the root bytecode of
the target module. This root function object is passed
to the interpret function, interp()
, where a frame
is created and execution begins. The frame is the run-time
representation of the function. The frame is responsible for
maintaining state information during
execution and when a nested function is called.
Figure 6 shows how images,
code objects, functions and frames link together
to form the core organization of the interpreter.
The PyMite interpreter has the prototypical while-switch
organization. Interpretation of bytecodes happens as long
as the interpreter control value is positive. A control value
of zero indicates proper exit; a negative value an error.
The switch statement directs program flow based on the value
of the bytecode. PyMite implements 64 of the 111 bytecodes
in Python 2.0 as seen in Table 1.
Some of the remaining bytecodes will be
implemented as needed, others will not. For example, the
slicing bytecodes might be implemented, but the five
bytecodes involved in printing will not.
Instead, PyMite will use a native library for printing.
It is unlikely that the bytecodes
CALL_FUNCTION_VAR
, CALL_FUNCTION_KW
,
CALL_FUNCTION_VAR_KW
, and EXTENDED_ARG
,
will be implemented anytime soon.
There is plenty of work to be done on PyMite. Short term goals include completing the garbage collector and the exception system. Mid term goals include optimization work and creating an interactive interface. The long term goals involve growing a support community for PyMite and making periodic releases.
During the creation of PyMite several possible optimizations were discovered. Some, such as string caching, are implemented, but some are not. Optimizations that are under consideration are: renumbering and creating new bytecodes; using a goto-lookup structure for the interpreter; and using caching methods that are faster than linear lookups. Profiling information will determine which optimizations are included in PyMite.
PyMite successfully implements a flyweight interpreter that supports a subset of the features of the Python language. PyMite executes sample programs on a real 8-bit microcontroller using only 15 KiB of program memory and 4 KiB of RAM. Fitting the interpreter in such a size-limited device requires completely redesign the internal structures to reduce size and sacrificing of Python's fancier features. Some of the lost features are made up for by new features such as a stackless implementation and an easy method to create and call native code. PyMite is a work-in-progress and its features and abilities will increase with time.
PyMite is Copyright 2001 Dean W. Hall. All rights reserved. PyMite is distributed under the Artistic License [5]. PyMite releases can be downloaded from http://sourceforge.net/projects/thmonkeeproject/. Access to the Subversion repository is currently restricted.
http://www.newtonlabs.com/ic/
.
http://www.stackless.com/
.
http://sourceforge.net/projects/thmonkeeproject/
.
http://www.doxygen.org/
.
http://www.opensource.org/licenses/artistic-license.php
.