An Introduction to Python

by Jeff Bauer
``And now, for something completely different.''
Python is an extensible, high level, interpreted, object oriented programming language. Ready for use in the real world, it's also free.

Originally published in Linux Journal Issue #21

If you've been programming on a Linux system, you may be coding in C or C++. If you're a systems administrator, you may be programming in Perl, Tcl, Awk, or one of the various (sh/csh/bash) shell scripting languages. Maybe you wrote a script to do a particular job, but now find that it doesn't scale up very well. You might be writing C applications, but now wish you didn't have to be bogged down in the low level details. Or you may simply be intrigued by the possibility of doing high level, object oriented programming in a friendly, interpreted environment.

If any of the above applies to your situation, you may be interested in Python. Python is a powerful language for the rapid development of applications. The interpreter is easily extensible, and you may embed your favorite C code as a compiled extension module.

Python is not one of the research languages which seem to get promoted solely for pedagogical reasons. It is possible to do useful coding almost immediately. Python seems to encourage object oriented programming by clearing the paths, rather than erecting parapets.


Getting Started

To execute the standard hello program, enter the following at the command line:

$ python
Python 1.2 (Jun  3, 1995) [GCC 2.6.3]
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> print 'hello, bruce'
hello, bruce
>>> [CONTROL]-D

Most Python programs, though developed incrementally at the interactive prompt, are executed as normal scripts. The next program extends the original: the new version identifies who you are, based on your user account in /etc/passwd.

 1  #!/usr/local/bin/python
 2
 3  import posix
 4  import string
 5
 6  uid = `posix.getuid()`
 7  passwd = open('/etc/passwd')
 8  for line in passwd.readlines():
 9      rec = string.splitfields(line, ':')
10      if rec[2] == uid:
11          print 'hello', rec[0],
12          print 'mind if we call you bruce?'
13          break
14  else:
15      print "I can't find you in /etc/passwd"

A line by line explanation of the program is as follows:

1 Command interpreter to invoke
3-4 Import two standard Python modules, posix and string
6 Get the user id using the posix module. The enclosing backticks (`) tell Python to convert this value to a string, so that it can be compared with the uid field read from the file.
7 Open the /etc/passwd file in read mode.
8 Start a for loop, reading in all the lines of /etc/passwd. Compound statements, such as conditionals and loops, have headers that start with a keyword (if, while, for, try) and end with a colon.
9 Each line in /etc/passwd is read and split into array rec[] based on a colon ':' boundary, using string.splitfields()
10 If rec[2] from /etc/passwd matches our call to posix.getuid() we have identified the user. The first 3 fields of /etc/passwd are: rec[0] = name, rec[1] = password, and rec[2] = uid
11-12 Print the user's account name to stdout. The trailing comma avoids the newline after the output.
13 Break the for loop.
14-15 Print message if we can't locate the user in /etc/passwd.
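
Python lets a for loop carry its own else clause, which runs only when the loop finishes without reaching a break. The lookup pattern can be sketched as follows. This is an illustration in present-day Python 3 syntax rather than the Python 1.2 syntax of the listing; the function name find_account and the sample records are hypothetical, and the data is made up so the sketch does not depend on a real /etc/passwd:

```python
# Sketch only: Python 3 syntax. find_account and SAMPLE_PASSWD are
# hypothetical names; the records are invented for illustration.
SAMPLE_PASSWD = [
    "root:x:0:0:root:/root:/bin/sh",
    "bruce:x:501:100:Bruce:/home/bruce:/bin/sh",
]

def find_account(lines, uid):
    """Return the account name whose uid field matches, or None."""
    for line in lines:
        rec = line.split(":")      # fields: name, password, uid, ...
        if rec[2] == str(uid):     # compare as strings, like the listing
            name = rec[0]
            break                  # found: the else clause is skipped
    else:
        name = None                # loop ended without a break
    return name

print(find_account(SAMPLE_PASSWD, 501))
print(find_account(SAMPLE_PASSWD, 42))
```

The first call finds the account and leaves the loop through break; the second runs off the end of the list, so the else clause supplies None.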

The observant reader will note that the control statements lack any form of BEGIN/END keywords or matching braces. This is because the indentation defines the way statements are grouped. Not only does this eliminate the need for braces, but it enforces a readable coding style. No doubt this design feature will turn off a few potential Python hackers, but in practice, it is useful. I can recall many hours spent tracking bugs in C that resulted from misreading code that looked like one of these fragments, usually deeply nested:

if (n == 0)
    x++;
    y--;
z++;

if (m == n || (n != o && o == q)) { j++; }
    k++;
q = 0;

while (y--)
    *ptr++;
    if (m == n) {
        x++;
    }

A coding style enforced in the language definition would have saved me much frustration. Python code written by another programmer is usually very readable.
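
For comparison, here is a rough sketch of how the first C fragment might read in Python, where what the indentation suggests is what the interpreter does. It uses present-day Python 3 syntax, and the function name step and its arguments are hypothetical, chosen only to make the fragment runnable:

```python
# Sketch only: step() is a hypothetical wrapper around the fragment.
def step(n, x, y, z):
    if n == 0:
        x = x + 1      # runs only when n == 0
        y = y - 1      # same indentation, so also inside the if
    z = z + 1          # dedented, so it always runs
    return x, y, z

print(step(0, 10, 10, 10))   # both x and y change
print(step(1, 10, 10, 10))   # only z changes
```

In the C version, y-- executes unconditionally despite its indentation. In Python there is no way to write the block so that it looks grouped but is not.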


Libraries

You might object that we did a lot of work in the program above just to demonstrate Python language features. A better method would be to use the pwd module from the standard Python library:

  import posix, pwd
  print 'hello', pwd.getpwuid(posix.getuid())[0]

This points out another nicety about Python that is critical for any new language's success: the robustness of its library. As mentioned earlier, you may extend Python by adding a compiled extension module to your personal library, but in most cases you don't have to.
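
A slightly more defensive version of the pwd lookup can be sketched as below. It is written in present-day Python 3 syntax, uses os.getuid() in place of posix.getuid(), and assumes a Unix system where the pwd module is available. The function name current_account_name is hypothetical:

```python
import os
import pwd   # Unix-only standard module

# Sketch only: current_account_name is a hypothetical helper.
def current_account_name():
    """Return the login name for the current uid, or None if unknown."""
    try:
        return pwd.getpwuid(os.getuid())[0]   # field 0 is the account name
    except KeyError:
        return None   # this uid has no entry in the password database

print('hello', current_account_name())
```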

Take the ftplib module, for instance. If you want to write a Python script that automatically downloads the latest FAQ, you can simply use ftplib, as in the following example:

    #!/usr/local/bin/python

    from ftplib import FTP

    ftp = FTP('ftp.python.org')     # connect to host
    ftp.login()                     # login anonymous
    ftp.cwd('pub/python/doc')       # change directory
    ftp.retrlines('LIST')           # list python/doc
    F = open('python.FAQ', 'w')     # file: python.FAQ
    ftp.retrbinary('RETR FAQ', F.write, 1024)
    ftp.quit()

Python has numerous features which make programming fun and restore your perspective of the design objectives. The language encourages you to explore its features by writing experimental functions during program development. Several notable Python features:


Python Has Real Class

With the next example, I'll try to demonstrate some of these features. The StackingThings class will allow the user to stack items on top of each other until a breaking point is reached.

 1  #!/usr/local/bin/python
 2
 3  StackingException = 'StackingException'
 4
 5  class StackingThings:
 6      names = ('llama', 'spam', '16 ton weight', \
 7               'dead parrot')
 8      weights = {}
 9      weights['llama']         =     300
10      weights['spam']          =       1
11      weights['16 ton weight'] =   32000
12      weights['dead parrot']   =       2
13      breakpt = {}    # breaking points
14      breakpt['llama']         =     200
15      breakpt['spam']          =    1000
16      breakpt['16 ton weight'] = 1000000
17      breakpt['dead parrot']   =      15
18
19      def __init__(self):
20          self.items_stacked = []
21      def add(self, item):
22          if item not in self.names:
23              raise StackingException, \
24                  item+' not a stackable object'
25          self.items_stacked.insert(0, item)
26          try:
27              self.test_strength(item)
28          except StackingException, val:
29              print item, val
30      def test_strength(self, item):
31          wt = 0
32          bp = 1000000
33          for i in self.items_stacked:
34              wt = wt + self.weights[i]
35              if wt > bp:
36                  self.items_stacked.remove(item)
37                  raise StackingException, \
38                      'exceeds breaking point!'
39              bp = self.breakpt[i]
40
41  # user code to test StackingThings class
42
43  s = StackingThings()
44
45  s.add('llama')
46  s.add('spam')
47  s.add('spam')
48  s.add('spam')
49  s.add('dead parrot')
50  s.add('16 ton weight')
51
52  print 'items stacked = ', s.items_stacked
53
54  try:
55      s.add('bad object')
56  except StackingException, msg:
57      print 'exception:', msg

This script produces the following output:

16 ton weight exceeds breaking point!
items stacked =  ['dead parrot', 'spam', 'spam',
        'spam', 'llama']
exception: bad object not a stackable object

The StackingThings class itself consists of 3 methods: __init__(), add(), and test_strength(). When StackingThings is instantiated, Python calls the special __init__ method to create the object's initial state, here by initializing the list of stacked items: self.items_stacked = []. The add() method is essentially the only method that is accessed by the user of StackingThings. And test_strength() is called by add() to verify that we have not exceeded our breaking point.

The first argument to each method in our example is called self. This is just a convention, but it makes our code much more readable. The first argument to a Python method is used in a fashion somewhat similar to the this keyword in C++.

Python provides for exception handling, both built-in (e.g. ZeroDivisionError, TypeError, NameError, etc.) and user-defined exceptions. The latter is especially useful in developing robust classes. Python uses the try/except syntax for exception handling:

    try:
        DenominateZero()
    except ZeroDivisionError, val:
        print 'Whoops:', val

Our add() method uses try/except to catch an exception raised in test_strength(), and itself raises an exception when we pass it an illegal stacking item.
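
In present-day Python 3 the same ideas are spelled slightly differently: a user-defined exception is a class derived from Exception, and the handler binds the value with as rather than a comma. A minimal sketch under those assumptions follows; the names StackingError, check_item, and safe_divide are hypothetical and are not part of the article's listing:

```python
# Sketch only: Python 3 syntax; all names here are hypothetical.
class StackingError(Exception):
    """User-defined exception for illegal stacking items."""

def check_item(item, allowed):
    """Return item if it is allowed, otherwise raise StackingError."""
    if item not in allowed:
        raise StackingError(item + ' not a stackable object')
    return item

def safe_divide(a, b):
    """Divide a by b, reporting a built-in exception instead of failing."""
    try:
        return a / b
    except ZeroDivisionError as val:
        print('Whoops:', val)
        return None

try:
    check_item('bad object', ('llama', 'spam'))
except StackingError as msg:
    print('exception:', msg)

safe_divide(1, 0)
```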

Two of the built-in methods for Python lists that are demonstrated in the example on lines 25 and 36 are insert() and remove(). Other supported operations on list objects include append(), count(), index(), reverse(), and sort().
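
The list operations named above can be tried out in a few lines. This sketch uses present-day Python 3 syntax, and the variable names are chosen only for illustration:

```python
# Sketch only: exercising the list methods mentioned in the text.
stack = []
stack.insert(0, 'llama')         # insert at the front
stack.insert(0, 'spam')          # 'spam' now sits before 'llama'
stack.append('dead parrot')      # add at the end
stack.append('spam')             # duplicates are allowed

spam_count = stack.count('spam')     # how many times 'spam' occurs
llama_at = stack.index('llama')      # position of the first 'llama'

stack.remove('spam')             # removes only the first 'spam'
stack.reverse()                  # reverse the list in place
stack.sort()                     # sort the list in place

print(stack, spam_count, llama_at)
```

Note that reverse() and sort() change the list itself rather than returning a new one.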

The data attributes may be accessed by the methods of the class as well as by the user code. Either print self.names within a class method or print s.names from the user code will print the tuple of legal stacking things:

('llama', 'spam', '16 ton weight', 'dead parrot')
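
The difference between data shared by the class and data owned by each object can be sketched as follows, in present-day Python 3 syntax. The class name Stack and the method legal are hypothetical, a cut-down stand-in for StackingThings:

```python
# Sketch only: Stack is a hypothetical, reduced version of the class.
class Stack:
    names = ('llama', 'spam')       # class attribute, shared by all instances

    def __init__(self):
        self.items = []             # instance attribute, one list per object

    def legal(self, item):
        return item in self.names   # reached through self inside a method

s = Stack()
t = Stack()
s.items.append('spam')              # changes s only, not t

print(s.names)                      # reached through the instance
print(Stack.names)                  # or through the class itself
print(s.legal('spam'), s.legal('anvil'))
print(s.items, t.items)
```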


Look It Up!

Dictionaries (associative arrays to all you Awk/Perl hackers) are one of the most useful Python data types. Unlike a normal array, which is indexed by number, an associative array is typically indexed by strings. Their usefulness is worth describing in some detail.

I frequently deal with ICD-9-CM codes in medical applications. These codes are usually numeric, but sometimes alphanumeric. They usually have a decimal point, but sometimes don't. Some of the codes may be further subdivided into additional ICD-9 codes. Furthermore, codes are added and deleted periodically, but most don't change. Normally, the lookup of ICD-9 codes will be done in a relational database, but it is also convenient to use small data sets within an application. For example, given the dictionaries icd9 and subdivide:

x          subdivide[x]   icd9[x]
'692'      1              'Contact dermatitis'
'692.0'    0              'Due to detergents'
'692.2'    0              'Due to solvents'
'692.7'    1              'Due to solar radiation'
'692.70'   0              'Unspecified dermatitis'
'692.71'   0              'Sunburn'
'692.72'   0              'Other: Photodermatitis'

We can manipulate the ICD-9 codes in the following manner:

  for code in icd9.keys():
      if subdivide[code]:
          print 'ICD-9',code,'may be further subdivided'
      else:
          print 'Description for',code,'is:',icd9[code]

This would produce output like the following (keys() returns the codes in no particular order):

    ICD-9 692.7 may be further subdivided
    Description for 692.70 is: Unspecified dermatitis
    Description for 692.0 is: Due to detergents
    ICD-9 692 may be further subdivided
    Description for 692.71 is: Sunburn
    Description for 692.2 is: Due to solvents
    Description for 692.72 is: Other: Photodermatitis
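
For readers who want to run the lookup, here is a self-contained sketch that builds both dictionaries from the table above. It uses present-day Python 3 syntax, the helper name describe is hypothetical, and the codes are sorted so the output order is predictable:

```python
# Sketch only: data copied from the table; describe() is hypothetical.
icd9 = {
    '692':    'Contact dermatitis',
    '692.0':  'Due to detergents',
    '692.2':  'Due to solvents',
    '692.7':  'Due to solar radiation',
    '692.70': 'Unspecified dermatitis',
    '692.71': 'Sunburn',
    '692.72': 'Other: Photodermatitis',
}
subdivide = {
    '692': 1, '692.0': 0, '692.2': 0, '692.7': 1,
    '692.70': 0, '692.71': 0, '692.72': 0,
}

def describe(code):
    """Return one line of the report for a single ICD-9 code."""
    if subdivide[code]:
        return 'ICD-9 %s may be further subdivided' % code
    return 'Description for %s is: %s' % (code, icd9[code])

for code in sorted(icd9):    # sorted as strings, for a stable order
    print(describe(code))
```

Because the codes are kept as strings, '692.7' and '692.70' remain distinct keys, which would not be the case if they were stored as numbers.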

Lines 8-17 of our StackingThings example use dictionaries, but the initialization was broken into several lines for clarity. This could be reduced to:

  weights = {'llama':300, 'spam':1, '16 ton weight':32000, 'dead parrot':2}
  breakpt = {'llama':200, 'spam':1000, '16 ton weight':1000000, 'dead parrot':15}

Finally, inheritance is provided in Python, although it is not demonstrated in this example. The derived class may override methods of its base class or classes (yes, multiple inheritance is supported in a limited form). In C++ parlance, all methods in a Python class are ``virtual''.
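
Since the StackingThings example does not show inheritance, here is a small separate sketch of overriding and of what ``virtual'' means in practice. It uses present-day Python 3 syntax, and every class and method name in it is hypothetical:

```python
# Sketch only: all classes here are hypothetical illustrations.
class Thing:
    def describe(self):
        # Calls whichever name() the actual object provides.
        return 'a ' + self.name()

    def name(self):
        return 'thing'

class Parrot(Thing):
    def name(self):              # overrides Thing.name
        return 'dead parrot'

class Heavy:
    def weight(self):
        return 32000

class HeavyParrot(Parrot, Heavy):    # inherits from two base classes
    pass

print(Thing().describe())
print(Parrot().describe())
print(HeavyParrot().describe(), HeavyParrot().weight())
```

Thing.describe() is never redefined, yet it picks up the overriding name() of the derived class. That is the behavior C++ programmers get only by declaring a method virtual.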


Where do we go from here?

Python is currently available in source or as a Linux binary from ftp.python.org. Various modules have already been developed and have become part of the standard Python Library. To mention just a few: support for strings, regular expressions, posix, sockets, threads, multimedia, cryptography, STDWIN, Internet/WWW, and Expect. A large number of other contributions are submitted periodically.

Python is extensible. If you can program in C, you can add a new low-level module to the interpreter. We are currently doing this at our company for a distributed database system. The Python interpreter will be the high-level command language for many of the applications.

In addition to Linux, Python runs on several other platforms: OS/2, Windows, Macintosh, and many flavors of Unix. And like Linux, all of these versions are freely available and distributable.

The documentation for Python, written by Guido van Rossum, the creator of Python, is of very high quality. Four separate user manuals in PostScript format are available at the Python ftp site (see sidebar ``Python Information''). These documents have also been converted to HTML and Microsoft help file formats. A Python FAQ, quick reference guide, and testimonials are also available. O'Reilly and Associates also intends to publish Programming Python early next year.

Python has its own active newsgroup (comp.lang.python) as well as a mailing list which receives the same messages as the newsgroup. To subscribe to the mailing list, send mail to python-list-request@cwi.nl. Various Python special interest groups have been formed: Matrix-SIG, GUI-SIG, and Locator-SIG.

Finally, the Python Software Activity (PSA) has been established to foster the common interests of the Python development community. The PSA, unlike the GNU Project, does not do the actual development of software (although many of its members probably do), but rather acts as a clearinghouse for Python software modules developed by others. It also hosts workshops and related activities to help promote the use of the Python language. Additional information about the PSA may be obtained by visiting the Python home page: http://www.python.org.

Special thanks to Mark Lutz, Aaron Watters, the PSA, and, of course, Guido van Rossum.

Jeff Bauer has spent the past 16 years developing health care software. His current project involves interfacing pen-based computers with Unix systems to track clinical information.