Introduction
This PEP describes a proposal to define "@" (pronounced "across")
as a new outer product operator in Python 2.2. When applied to
sequences (or other iterable objects), this operator will combine
their iterators, so that:
    for (i, j) in S @ T:
        pass
will be equivalent to:
    for i in S:
        for j in T:
            pass
Classes will be able to overload this operator using the special
methods "__across__", "__racross__", and "__iacross__". In
particular, the new Numeric module (PEP 0209) will overload this
operator for multi-dimensional arrays to implement matrix
multiplication.
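As a rough illustration of the equivalence and of the overloading hook, the following sketch uses modern Python (3.5 or later), where "@" exists and dispatches to "__matmul__" and "__rmatmul__". Those methods stand in here for the proposed "__across__" and "__racross__"; the class name Across is hypothetical and is not part of this proposal.

```python
class Across:
    """Hypothetical wrapper; __matmul__ stands in for the proposed __across__."""

    def __init__(self, items):
        self.items = items

    def __matmul__(self, other):
        # Yield (i, j) pairs in the same order as the nested loops.
        return ((i, j) for i in self.items for j in other)

    def __rmatmul__(self, other):
        # Reached when the left operand does not define "@" itself.
        return ((i, j) for i in other for j in self.items)


S = Across([1, 2])
T = ["a", "b"]

pairs = list(S @ T)

# The nested-loop form that "S @ T" is meant to abbreviate.
nested = []
for i in S.items:
    for j in T:
        nested.append((i, j))

reflected = list([1, 2] @ Across("ab"))
```

Built-in sequences would get this behaviour directly under the proposal; the wrapper is needed here only because lists and tuples do not define "@" today.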
Background
Number-crunching is now just a small part of computing, but many
programmers --- including many Python users --- still need to
express complex mathematical operations in code. Most numerical
languages, such as APL, Fortran-90, MATLAB, IDL, and Mathematica,
therefore provide two forms of the common arithmetic operators.
One form works element-by-element, e.g. multiplies corresponding
elements of its matrix arguments. The other implements the
"mathematical" definition of that operation, e.g. performs
row-column matrix multiplication.
Zhu and Lielens have proposed doubling up Python's operators in
this way [1]. Their proposal would create six new binary infix
operators, and six new in-place operators.
The original version of this proposal was much more conservative.
The author consulted the developers of GNU Octave [2], an open
source clone of MATLAB. Its developers agreed that providing an
infix operator for matrix multiplication was important: numerical
programmers really do care whether they have to write "mmul(A,B)"
instead of "A op B".
On the other hand, when asked how important it was to have infix
operators for matrix solution and other operations, Prof. James
Rawlings replied [3]:
    I DON'T think it's a must have, and I do a lot of matrix
    inversion. I cannot remember if its A\b or b\A so I always
    write inv(A)*b instead. I recommend dropping \.
Based on this discussion, and feedback from students at the US
national laboratories and elsewhere, we recommended adding only
one new operator, for matrix multiplication, to Python.
Iterators
The planned addition of iterators to Python 2.2 opens up a broader
scope for this proposal. As part of the discussion of PEP 201,
Lockstep Iteration [4], the author of this proposal conducted an
informal usability experiment [5]. The results showed that users
are psychologically receptive to "cross-product" loop syntax. For
example, most users expected:
    S = [10, 20, 30]
    T = [1, 2, 3]
    for x in S; y in T:
        print x+y,
to print "11 12 13 21 22 23 31 32 33". We believe that users will
have the same reaction to:
    for (x, y) in S @ T:
        print x+y
i.e. that they will naturally interpret this as a tidy way to
write loop nests.
This is where iterators come in. Actually constructing the
cross-product of two (or more) sequences before executing the loop
would be very expensive. On the other hand, "@" could be defined
to get its arguments' iterators, and then create an outer iterator
which returns tuples of the values returned by the inner
iterators.
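A minimal sketch of such an outer iterator, written as a generator function in modern Python. The function name across is an assumption made for illustration, and the sketch assumes that the inner argument can be iterated more than once.

```python
from itertools import count, islice


def across(outer, inner):
    """Lazily yield (x, y) for each x in outer and each y in inner."""
    for x in outer:
        # A fresh pass over inner is started for every x, so inner
        # must be restartable (e.g. a list, not a file).
        for y in inner:
            yield (x, y)


S = [10, 20, 30]
T = [1, 2, 3]
sums = [x + y for (x, y) in across(S, T)]

# Nothing is built up front, so even an unbounded outer sequence works.
first_three = list(islice(across(count(), "ab"), 3))
```

Because pairs are produced on demand, memory use stays constant regardless of the lengths of the two arguments.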
Discussion
1. Adding a named function "across" would have less impact on
Python than a new infix operator. However, this would not make
Python more appealing to numerical programmers, who really do
care whether they can write matrix multiplication using an
operator, or whether they have to write it as a function call.
2. "@" would have to be chainable in the same way as comparison
operators, i.e.:
    (1, 2) @ (3, 4) @ (5, 6)
would have to return (1, 3, 5) ... (2, 4, 6), and *not*
((1, 3), 5) ... ((2, 4), 6). This should not require special
support from the parser, as the outer iterator created by the
first "@" could easily be taught how to combine itself with
ordinary iterators.
3. There would have to be some way to distinguish restartable
iterators from ones that couldn't be restarted. For example,
if S is an input stream (e.g. a file), and L is a list, then "S
@ L" is straightforward, but "L @ S" is not, since iteration
through the stream cannot be repeated. This could either be
treated as an error, or be handled by having the outer iterator
detect non-restartable inner iterators and cache their values.
4. Whiteboard testing of this proposal in front of three novice
Python users (all of them experienced programmers) indicates
that users will expect:
    "ab" @ "cd"
to return four strings, not four tuples of pairs of
characters. Opinion was divided on what:
    ("a", "b") @ "cd"
ought to return...
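Points 2 and 3 can be sketched together in modern Python. The class name Outer and its helper _product are hypothetical, "__matmul__" again stands in for the proposed "__across__", and the test "iter(s) is s" is only one possible way to detect a non-restartable argument. Caching with list() is the simplest choice and reads the whole stream at once; a real implementation might cache incrementally instead.

```python
class Outer:
    """Hypothetical outer iterator that flattens chained "@" operations."""

    def __init__(self, *sources):
        self.sources = sources

    def __matmul__(self, other):
        # Chaining: extend the list of sources instead of nesting tuples,
        # so a @ b @ c yields (x, y, z) rather than ((x, y), z).
        return Outer(*self.sources, other)

    def __iter__(self):
        first, rest = self.sources[0], self.sources[1:]
        # Only inner sources are traversed repeatedly. An object that is
        # its own iterator cannot be restarted, so its values are cached.
        rest = [list(s) if iter(s) is s else s for s in rest]
        return self._product(first, rest)

    @staticmethod
    def _product(first, rest):
        for x in first:
            if not rest:
                yield (x,)
            else:
                for tail in Outer._product(rest[0], rest[1:]):
                    yield (x,) + tail


chained = list(Outer((1, 2)) @ (3, 4) @ (5, 6))

stream = iter("xy")  # one-shot, like a file
cached = list(Outer([1, 2]) @ stream)
```

Here chained begins with (1, 3, 5) and ends with (2, 4, 6), as point 2 requires, and the one-shot stream on the right-hand side is replayed from its cached values rather than raising an error.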
Alternatives
1. Do nothing --- keep Python simple.
This is always the default choice.
2. Add a named function instead of an operator.
Python is not primarily a numerical language; it may not be worth
complicating it for this special case. However, support for real
matrix multiplication *is* frequently requested, and the proposed
semantics for "@" for built-in sequence types would simplify
expression of a very common idiom (nested loops).
3. Introduce prefixed forms of all existing operators, such as
"~*" and "~+", as proposed in PEP 225 [1].
Our objections to this are that there isn't enough demand to
justify the additional complexity (see Rawlings' comments [3]),
and that the proposed syntax fails the "low toner" readability
test.
Acknowledgments
I am grateful to Huaiyu Zhu for initiating this discussion, and to
James Rawlings and students in various Python courses for their
discussions of what numerical programmers really care about.
References
[1] PEP 225, Elementwise/Objectwise Operators, Zhu, Lielens
http://www.python.org/peps/pep-0225.html
[2] http://bevo.che.wisc.edu/octave/
[3] http://www.egroups.com/message/python-numeric/4
[4] PEP 201, Lockstep Iteration, Warsaw
http://www.python.org/peps/pep-0201.html
[5] http://mail.python.org/pipermail/python-dev/2000-July/006427.html