Python vs. awk

Lou Kates (louk@RESEARCH.TELERIDE.ON.CA)
Mon, 7 Sep 1992 23:02:44 -0400

The following compares a Python program to one written in awk
(actually nawk). I realize that Guido feels that Python should
not really be compared to awk or perl, since its aims are somewhat
different, but I still feel that this sort of exercise is useful.

The two programs below exchange two fields on each input line while
preserving the original whitespace. The field numbers can optionally
be specified on the command line in the easiest way that each
language supports (the getopt library for Python and the x=12 form
for awk). The Python program is 27 lines, compared to 12 statements
on 10 lines for the awk program.
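
To make the task concrete, here is a minimal sketch of the swap as a
single function. It assumes a current Python 3 and its `re` module
rather than the `regexp` module used in the program below, and the
name `swap_fields` is made up for illustration. Unlike the two
programs below, it also keeps any whitespace after the last field.

```python
import re

def swap_fields(line, x=1, y=2):
    """Swap 1-based fields x and y of line, keeping the original whitespace.

    The line is expected without its trailing newline.
    """
    # Splitting on a captured group keeps the fields in the result:
    # even indexes hold whitespace runs, odd indexes hold the fields.
    parts = re.split(r'([^ \t]+)', line)
    nfields = len(parts) // 2
    if x > nfields or y > nfields:
        return None  # too few fields; the awk program skips such lines
    ix, iy = 2 * x - 1, 2 * y - 1
    parts[ix], parts[iy] = parts[iy], parts[ix]
    return ''.join(parts)
```

For example, `swap_fields('  a   b\tc')` gives `'  b   a\tc'`: the two
fields trade places and the runs of blanks and tabs stay where they
were.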

The following is the breakdown:

                                Python      Awk
Declaration statements          1 stmt      0 stmts
Option processing               5 stmts     1 stmt
Argument processing             2 stmts     0 stmts
Reading & looping over input    3 stmts     0 stmts
Field splitting                 1 stmt      0 stmts
Get list of whitespace fields   9 stmts     3 stmts
Check number of fields          1 stmt      1 stmt
Loop to swap & print fields     5 stmts     6 stmts (4 lines)
End of block stmts              0 stmts     1 stmt

Total                           27 stmts    12 stmts (10 lines)

The areas of saving for awk are:

1. Awk saves 9 lines by implicitly processing the command line,
including options and arguments and reading and looping over
the input as well as doing field splitting.

2. Awk's gsub allows one to compactly obtain the list of
whitespace. In Python you have to do it by hand.

3. With Python you have to build up the output line in a buffer,
whereas in awk printf writes each piece directly.
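
The gsub/split trick from item 2 can be sketched in Python as well,
assuming a current Python 3 with the `re` module (not the `regexp`
module used in the program below). The function name `separators` is
made up for illustration.

```python
import re

def separators(line):
    """Return the whitespace run before each field, plus the run after the last."""
    # Same idea as the awk gsub/split pair: collapse each field to ':',
    # then split on ':' so that only the whitespace runs remain.
    return re.sub(r'[^ \t]+', ':', line).split(':')
```

Since every non-blank run has been replaced, the only colons left in
the string are the markers, so the split cannot be confused by colons
in the input.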

My conclusions are that Python needs some additions to the library
to:

1. facilitate command line processing, option processing and looping
over input. An expanded version of the implicit loop program
that was posted a while ago to this list could help here.

2. perform sub and gsub functions. These could return a new
string with the replacement done since Python's strings
cannot be changed in place (immutable).

3. provide a way of printing out a partial line without having
a space placed at the end of it to reduce the need to build
up print buffers in the program.
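
As a rough sketch of what items 2 and 3 could look like, assuming a
current Python 3 with the `re` module: the names `sub`, `gsub` and
`emit` are made up here and are not existing library functions. The
replacement text follows the rules of `re`, so awk's `&` is not
handled, and unlike awk these functions return the new string
instead of a count.

```python
import re
import sys

def sub(pattern, repl, text):
    """Replace the first match only, like awk's sub; returns a new string."""
    return re.sub(pattern, repl, text, count=1)

def gsub(pattern, repl, text):
    """Replace every match, like awk's gsub; returns a new string."""
    return re.sub(pattern, repl, text)

def emit(pieces, out=sys.stdout):
    """Write the pieces back to back with no separator, then end the line."""
    for piece in pieces:
        out.write(piece)
    out.write('\n')
```

With something like `emit`, the swap loop could write each whitespace
run and field as it goes instead of collecting them in a buffer
first.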

=== awk program ===

BEGIN { if (x == 0) x = 1; if (y == 0) y = 2 }
NF >= x && NF >= y {
    line = $0;
    gsub(/[^ \t]+/, ":", line);
    split(line, s, ":");
    t = $y; $y = $x; $x = t;
    for(i=1; i<=NF; i++)
        printf("%s%s", s[i], $i);
    print "";
}

=== Python program ===

import getopt, regexp, string, sys
pat = regexp.compile('[^ \t]+')
options, files = getopt.getopt(sys.argv[1:], 'x:y:')
x, y = 1, 2                             # 1-based field numbers, as in awk
for t in options:
    if 'x' in t[0]: x = string.atoi(t[1])
    if 'y' in t[0]: y = string.atoi(t[1])
fp = sys.stdin
if files: fp = open(files[0], 'r')
while 1:
    line = fp.readline()
    if not line: break
    fl = string.split(line)
    wl, line = [], line[:-1]
    while 1:
        t = pat.exec(line)
        if t:
            wl.append( line[ :t[0][0] ])
            line = line[ t[0][1]: ]
        else:
            break
    if len(fl) >= x and len(fl) >= y:   # like the awk pattern: skip short lines
        fl[x-1], fl[y-1] = fl[y-1], fl[x-1]
        buf = ''
        for i in range(0, len(fl)):
            buf = buf + wl[i] + fl[i]
        print buf

-- 
Lou Kates, louk@research.teleride.on.ca