An ARRAY type for Python

Guido.van.Rossum@cwi.nl
Sun, 07 Feb 1993 17:59:40 +0100

About two weeks ago Stefan Esser asked about an array type for Python.
Actually his question consisted of two parts: (1) he wanted an array
of objects represented as machine words -- an extension of the string
type, really; and (2) he wanted to extend Python with a notion of
multidimensional arrays.

I've been rather busy lately so I haven't found the time to answer
either question, so far. (Sorry Stefan!) But I did get an idea that
would address at least the first question. Let me start by explaining
some background.

Python's notion of sequences has always been intended to allow
extension with new kinds of sequences. The minimal properties of a
sequence object a are that it has a length, len(a), and that a[i] is
defined whenever 0 <= i < len(a) (and for all other i, a[i] is not
defined). The semantics of slicing, concatenation and repetition
(a[i:j], a+b and a*k) are also fixed. An extension to mutable
sequences allows assignment to items, to slices, and the list
operations append and insert. (The operations remove, index, count,
reverse and sort are perhaps better seen as specific to the list
type.)
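
The minimal sequence properties described above can be sketched in
present-day Python. This is only an illustration: the class name
Countdown is hypothetical, and the __len__ and __getitem__ methods are
how current Python lets a class act as a sequence, which is an
assumption of this sketch and not something the post itself describes.

```python
class Countdown:
    """Hypothetical read-only sequence holding n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # len(a) is the first required property.
        return self.n

    def __getitem__(self, i):
        # a[i] is defined only for 0 <= i < len(a).
        if not 0 <= i < self.n:
            raise IndexError("index out of range")
        return self.n - i


a = Countdown(3)
items = [a[i] for i in range(len(a))]
```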

The existing sequence types in Python are list, tuple and string.
But we can easily extend this with arrays of machine integers, say.
Such an array would behave like a list in most circumstances, but it
would restrict the values contained in the list to machine integers,
and therefore a more compact representation can be used.
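
To make the idea concrete, here is a small sketch using the array
module found in today's Python standard library. That module may
differ in detail from the arraymodule.c described in this post, so
treat the calls below as an assumption about the current library and
not as a description of the 1993 code.

```python
from array import array

a = array('h', [1, 2, 3, 4])    # 'h': small signed machine integers

# The array behaves like a list in most circumstances...
a.append(5)
a[0] = 10
as_list = a.tolist()

# ...but it stores raw machine values, itemsize bytes per item.
raw_size = len(a) * a.itemsize

# Values that are not machine integers are rejected.
try:
    a.append("six")
    rejected = False
except TypeError:
    rejected = True
```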

I have written a module that creates such arrays, with various types
of values (characters, integers of 3 sizes, float and double), and
placed it on ftp.cwi.nl:/pub/python in file arraymodule.c. If you
have dynamic loading in your Python, you can just compile it and
import it (the file arraymodule.libs should contain the name of the C
library, e.g. /lib/libc.a; on SGI systems -lc_s is better). Otherwise you
can add it to your Python interpreter by using the Addmodule.sh script
and rebuilding Python.

Here's a short description of my array module.

========================================================================

The module array defines one function.

array(typecode, initializer)
Return a new array whose items are restricted by the typecode,
and initialized from the (optional) initializer value.
The typecode is a character which defines the item type (users
of the getattr() C function in the Python interpreter will
recognize this):
'c' - character
'b' - 1-byte signed integer
'h' - 2-byte signed integer
'l' - 4-byte signed integer
'f' - float
'd' - double
If an initializer is present, it must be a list or a string.
The list or string is passed to the new array's fromlist() or
fromstring() method (see below) to add initial items to the
array.
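
A brief sketch of the constructor, written against the array module
in today's Python standard library (an assumption; it may not match
arraymodule.c exactly, and the 'c' typecode is left out because
current Python 3 does not accept it):

```python
from array import array

empty = array('l')                      # no initializer: an empty array
ints = array('l', [1, 2, 3, 4, 5])      # initialized from a list
reals = array('d', [1.0, 2.0, 3.14])    # array of doubles

# An initializer item of the wrong type is rejected.
try:
    array('l', [1, 2.5])
    bad_accepted = True
except TypeError:
    bad_accepted = False
```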

Array objects are mutable sequence types and support the following
data items and methods.

typecode
The typecode character used to create the array

itemsize
The length in bytes of one array item in the internal
representation

append(x)
Append a new item with value x to the end of the array.

insert(i, x)
Insert a new item with value x in the array before position i.
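
The two data items and the append and insert methods can be tried as
follows. This sketch assumes the array module in today's Python
standard library, which keeps these names.

```python
from array import array

a = array('d', [1.0, 3.0])
a.append(4.0)       # new item goes at the end
a.insert(1, 2.0)    # new item goes before position 1

code = a.typecode               # the typecode used to create the array
nbytes = a.itemsize * len(a)    # bytes taken up by the items themselves
```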

read(f, n)
Read n items (as machine values) from the file object f and
append them to the end of the array.
If fewer than n items are available, EOFError is raised, but
the items that were available are still appended to the
array.

write(f)
Write all items (as machine values) to the file object f.
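
A sketch of the file round trip, including the short-read case. It
assumes the array module in today's Python standard library, where
the corresponding methods are named tofile(f) and fromfile(f, n);
those names are not the ones used in this post.

```python
import tempfile
from array import array

a = array('h', [1, 2, 3])

with tempfile.TemporaryFile() as f:
    a.tofile(f)             # counterpart of write(f)

    f.seek(0)
    b = array('h')
    b.fromfile(f, 3)        # counterpart of read(f, 3)

    f.seek(0)
    c = array('h')
    try:
        c.fromfile(f, 5)    # only 3 items are available
        hit_eof = False
    except EOFError:
        hit_eof = True      # the items that were read stay in c
```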

fromstring(s)
Append items from the string, interpreting the string as an
array of machine values (i.e. as if it had been read from a
file using the read() method).

tostring()
Convert the array to an array of machine values and return the
string representation (the same sequence of bytes that would
be written to a file by the write() method).
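
The same round trip through an in-memory byte sequence can be
sketched like this. It assumes the array module in today's Python
standard library, where the counterparts of tostring() and
fromstring(s) are named tobytes() and frombytes(s).

```python
from array import array

a = array('l', [1, 2, 3])
raw = a.tobytes()       # machine values as a sequence of bytes

b = array('l')
b.frombytes(raw)        # interpret the bytes as machine values again
```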

fromlist(l)
Append items from the list. This is equivalent to
for x in l: a.append(x)
except that if there is a type error, the array is unchanged.

tolist()
Convert the array to an ordinary list with the same items.
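
The difference between fromlist() and a plain loop of append() calls
shows up when an item has the wrong type. A minimal sketch, assuming
the array module in today's Python standard library:

```python
from array import array

mixed = [3, "oops", 5]      # the second item is not an integer

a = array('l', [1, 2])
try:
    a.fromlist(mixed)
except TypeError:
    pass                    # a is left unchanged

b = array('l', [1, 2])
try:
    for x in mixed:
        b.append(x)
except TypeError:
    pass                    # b keeps the items appended before the error
```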

When an array object is printed or converted to a string, it is
represented as array(<typecode>, <initializer>). The initializer is
omitted if the array is empty; otherwise it is a string if the
typecode is 'c' and a list of numbers for any other typecode. The
string produced by reverse quotes (``) is guaranteed to convert back
to an array with the same type and value. Examples:
array('l')
array('c', 'hello world')
array('l', [1, 2, 3, 4, 5])
array('d', [1.0, 2.0, 3.14])
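
The round trip through the printed form can be checked with a short
sketch. It assumes today's Python, where repr() takes the place of
reverse quotes and eval() turns the string back into an array; the
'c' example is omitted because current Python 3 does not accept that
typecode.

```python
from array import array

samples = [
    array('l'),
    array('l', [1, 2, 3, 4, 5]),
    array('d', [1.0, 2.0, 3.14]),
]

round_trips = []
for a in samples:
    text = repr(a)                          # the printed form
    back = eval(text, {"array": array})     # evaluate it again
    round_trips.append(back == a and back.typecode == a.typecode)
```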

========================================================================

Adding multidimensional arrays to Python is harder, since the
interpreter currently doesn't know about this concept. I have some
ideas that don't require extending the Python syntax, which basically
boil down to separating the actual array data from a definition of
how the rows and columns are laid out (e.g. one could transpose an
array without copying the data). I'll work on this a little more and
maybe it'll see the light of day.
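
One way to picture the separation of array data from its layout is
the sketch below. It is purely illustrative and not the design the
post has in mind: the class View2D and its shape and strides fields
are hypothetical names, and the flat storage uses the array module
from today's Python standard library.

```python
from array import array


class View2D:
    """Hypothetical 2-D view over flat storage."""

    def __init__(self, data, shape, strides):
        self.data = data        # flat storage, shared between views
        self.shape = shape      # (rows, columns)
        self.strides = strides  # items to skip per step along each axis

    def __getitem__(self, index):
        i, j = index
        rows, cols = self.shape
        if not (0 <= i < rows and 0 <= j < cols):
            raise IndexError("index out of range")
        return self.data[i * self.strides[0] + j * self.strides[1]]

    def transpose(self):
        # Swap the layout description only; the data is not copied.
        return View2D(self.data, self.shape[::-1], self.strides[::-1])


flat = array('l', [1, 2, 3, 4, 5, 6])
m = View2D(flat, (2, 3), (3, 1))    # rows [1, 2, 3] and [4, 5, 6]
t = m.transpose()                   # 3 rows, 2 columns, same storage
```

Because both views share the same flat array, a change made through
the storage is visible in the original and in the transposed view.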

In the meantime, please try out my array module and report to me any
bugs or functionality you would like to see added -- it may become a
standard module in the next release.

Cheers,

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>