Extending Python for Numerical Computation
Submitted for the December 1995 Python Workshop by Jim Hugunin
This work is based on Jim Fulton's original implementation of a matrix
object as a container class.
Yet Another Numeric Language?
There are a huge collection of existing numeric programming languages, both
commercial (Matlab,
S-PLUS,
IDL) and free (
Octave, RLaB, Yorick ,
BASIS , Gnudl
, ...). Why on earth would I want to go out and create a new one?
I've used almost all of the available numerical languages at one time or
another over the past 8 years. One thing I've noticed is that over time,
the designers of these languages are steadily adding more of the features
that one would expect to find in a general-purpose programming language.
"Octave has a real mechanism for handling functions that take an
unspecified number of arguments, so it is no longer necessary to place
an upper bound on the number of optional arguments that a function can
accept."
Octave FAQ
"However, the most significant improvement is RLaB's list class. A list
is a heterogeneous associative array that can contain any data type,
including other lists. The list gives RLaB users the opportunity to
structure their data as necessary." Why Use RLaB?
"The eval function now provides an optional mechanism for detecting
and trapping error conditions that occur during the evaluation of the
argument expression." Matlab 4.1 Release Notes
"S-PLUS now supports a number of features from the object-oriented
programming paradigm, including classes, inheritance, and methods."
S-Plus 3.0 Release Notes
By starting with the Python
programming language as a base, I already have functions with optional
arguments (and keyword arguments), heterogeneous lists (and dictionaries), a
powerful exception system for evals (and everywhere else), and a strong
object system that was built into the language from the start; all in a
robust, well-designed and portable implementation (thanks Guido!). Rather
than trying to retrofit an existing numerical language to support the wealth
of features found in a powerful, modern, general-purpose programming
language, it makes much more sense to attack the problem from the other
direction and add the features of a powerful numerical programming language
to Python.
These issues are important even if the only use for this language is to be
a numeric lab. However, they become overwhelming if you want to build
applications that contain a blend of numeric and more traditional
computational needs. I've been implementing a speech recognition system
completely within my matrix extended version of python. This task would be
nearly impossible in any other numerical language. I'm making extensive use
of the sophisticated object and module system of python, as well as its
ability to handle sockets, audio i/o, general purpose UI's, etc.
Design Goals
Easily Extensible with FORTRAN and C libraries
This object is based on an original implementation by Jim
Fulton. His basic design already interfaced nicely with existing
libraries, and I just had to be careful not to break anything.
Close to the Speed of Optimized C for Vector Operations
Testing the system on a Sparc-10 by multiplying a 10000 length vector by
itself 1000 times, I found that the python implementation was 20% slower
than fully optimized (-O4) hand-coded C. When compared to existing
numerical languages, I found the python system to be 30% faster than matlab
(70% faster if I only need "float" precision), and 1000% faster than octave.
All Array Operators Supported
All of the basic numeric operators are supported element-wise for matrices.
In addition, the basic array comparison and logical operators are provided
as methods (because these operators can't be overloaded within python).
The jury is still out as to how best to implement matrix multiplication.
The two likeliest possibilities are as a method (ie. a.matrixMultiply(b)) or
by replacing the modulo operator (ie. a % b).
Arbitrarily High-Dimensional Arrays
Many existing numerical languages only support two-dimensional arrays.
This sort of arbitrary restriction is ridiculous in the current era of
modern programming languages and practice. The matrix object was designed
from the ground up to support arrays of arbitrary dimensions.
Powerful Generalized Product Form Indexing
Mapping semantics are used to support multi-dimensional product form
indexing for matrix objects. Multi-dimensional indices are a sequence of
python objects, where the first object corresponds to the first dimension of
the matrix, and so on. If there are fewer objects in the sequence than
dimensions in the matrix, each remaining dimension is selected in it's
entirety. The following objects are possible for each dimension:
- A single integer, indicating one element from that dimension and a
reduction in the number of dimensions in the returned matrix.
- A sequence of integers, not necessarily unique, selecting an arbitrary
collection of elements along that dimension
- A slice object which selects a range of elements along the dimension
with optional start index, stop index and step size.
Function Objects for Flexibility
Every arithmetic operator is implemented as a special ofunc object within
python. These functions can be called with matrix or scalar arguments to
return the normal result. In addition, these functions can be subscripted
to indicate that they should be applied at a specific rank, and they can be
modified to indicate that instead of a direct product, they should be
applied as a generalized outer, or inner product, or as a reduction or
accumulation.
Full Range of Primitive Data Types
The matrix object supports arrays of chars, bytes, shorts, ints, longs,
floats, doubles, complex floats, complex doubles, and raw python objects.
This is essential to allow interfacing with the full range of existing
numerical libraries. I know of no other numerical language that supports
such a complete collection of data types.
No Changes to the Python Core Required
I was shocked at how much of this system could be elegantly implemented by
designing two new object types (one for matrices, and one for functions on
matrices) and a module. Nevertheless, this effort did suggest two
relatively small patches to the python core to make numeric operations more
convenient, and these have been implemented by Konrad Hinsen. These include
a**b <--> pow(a,b) and a[2,3,4] <--> a[(2,3,4)]. Both of these additions
have already received Guido's preliminary sanction.
Current Status
All of the original design goals have been met. The matrix object is
currently in its second alpha release, and there have been minimal reported
bugs.
Current work includes:
- Implementing linear-algebra style multiplication properly
- Supporting sparse matrices
- Cleaning up printing and string representations of matrices
- Extending PyArgs_ParseTuple to handle matrices
- Experimenting with Yorick style indexing
- General cleanup and bug squashing
Acknowledgements
The initial structure of the matrix object was created by Jim Fulton. Many
of the ideas in this object are stolen from the members of the python
matrix-sig. In particular, Konrad Hinsen, Paul Dubois, Chris Chase, Thomas
Schwaller, Tser-Yuan Ya, David Ascher, Dong Gweon Oh, and of course, Guido
van Rossum (I'm sure that I'm missing people here) have been filled with
good ideas. I still would have made this object without any of them, but it
wouldn't be nearly so well designed.
Jim Hugunin
November 13,
1995