Python for Critical Applications, a Case Study

Eric Newton
MetaSlash, Inc.

Abstract

Since Python is a dynamic scripting language, it is often targeted for light-weight programming tasks. Nevertheless, I have found it to be an excellent tool for building mission-critical applications. Python is portable, has excellent support libraries, and simplifies many programming tasks. This paper details my experiences using Python and C++ when implementing Recall, a framework for building distributed, replicated, fault-tolerant servers.

Keywords:
replication, storage, fault-tolerant, CORBA, distributed objects

Introduction

In the 1990s, I wrote distributed, high-availability systems for air traffic control and airline operations. These systems were implemented as redundant pairs of workstations. The secondary system would automatically provide service when the primary system failed. However, the backup system provided service without any data from the primary system. Eventually this approach was too simplistic because the secondary had to have up-to-date information from the primary. In addition, this state had to be consistent. All updates had to be performed in the proper order among all redundant services.

Background

The problem of replicated, fault-tolerant data storage interested me, so I began building a service which would provide identical and up-to-date storage on physically distributed machines. This type of service is important when you consider a banking transaction. If you deposit one million dollars, that information ought to be stored in a fault-tolerant fashion and in a consistent way. After the bank tells you that your money has been safely recorded, it can not forget or duplicate the deposit even if there is a fail-over from the primary to a backup immediately after your transaction. The bank's computer storage should be consistent and fault-tolerant. In general, such a fault-tolerant system should be able to provide service, as long as a majority of the servers are available.

I began implementing a framework which would take care of all the details associated with making consistent updates to data storage across machines. The algorithm elects a master system from the available servers (called Replicas) which broadcasts the data updates to all servers in a well-defined order. My first prototype was written using Python and within a month it was capable of consistently replicating data and performing fail-over. This prototype was the beginnings of the Recall framework which I made available as open source.

Leaving Python

All of my customers' previous systems were written in C++, so I decided that Recall should also be written in C++ for use on those projects. I rewrote the Python prototype in C++ using the Adaptive Communications Environment (ACE) for portability. Internal data structures were implemented using the Standard Template Library (STL). Over the next year, performance improved and several types of services were implemented. The first public release was in October, 1999.

During the following year, I made changes to accommodate a broader range of user requests. In particular, many took issue with the dependencies I chose to use. Some users found that ACE was too large a dependency or was redundant with libraries they were already using. In response, I created my own simple communications classes dependent only on basic POSIX threads (pthreads) and basic socket calls. The pthreads interface was not available on Win32, so an alternative implementation using the GNU Pth library was added. Some users objected to the use of the STL since it is not portable across compilers, especially with regard to template instantiation. So I replaced the standard classes with simple substitutes for basic utilities like strings and dynamic arrays.

By the end of 2000, it was clear that supporting the least-common-denominator C++ environment was a chore. The complexities of C++ without library support was making my project less fun. Finally, I found a number of latent bugs and limitations in the custom thread and communications libraries I wrote. A rewrite was necessary.

Python, Again

I was pleased with the way ACE insulated Recall from platform dependencies. Unfortunately, not all users were willing to use the this library. I began thinking that CORBA might be the best way to fix my portability and environment issues, because CORBA offers distributed communications support, independent of a particular library. Different ORBs are optimized for speed, size and latency, so users could choose the ORB appropriate for their needs.

There were other benefits of CORBA. With the CORBA Interface Definition Language (IDL), I could define a language neutral API. A well defined API is important for users of Recall, since it is a framework and designed to be extended and specialized. By using IDL, the interfaces can be implemented and called from a variety of languages and environments.

After deciding to use CORBA, I began prototyping some clients and servers using Python. The intent was to learn CORBA basics before moving back to C++ and re-implementing the Recall library. Once a few of the Python programs were running, I began rewriting them in C++. Suddenly, my little Python prototypes grew much larger and more complex simply by translating them.

With CORBA, each programming language has its own standard for mapping objects into programming constructs for that language. For example, there is no object or class construct in C, but there is a way to construct CORBA objects and call methods using the standard C mapping. One can reasonably assume that the C mapping is more cumbersome than the C++ mapping because C++ already has a way of representing objects and methods.

The Python mapping was much less cumbersome than the C++ mapping. Python supports generic object references and garbage collection directly, but the C++ mapping requires the use of generated "_var" classes to provide object references. Python has built-in support for sequences, but C++ requires a new class. Python is reflexive since its objects may be inspected at run-time. It maps very cleanly to the inspectable data type Any under CORBA. C++ requires additional classes and type coercions because it does not have built-in reflection. Beyond the complexity of the language mapping, I still had some of the old C++ integration issues; for example, should I use the STL or not?

Laziness won me over and I began re-writing Recall in Python using omniORB. Once again, I was pleased with the productivity of a Python-based solution. All of Recall was re-written by a single person in about 80 hours. It took 3 additional days to port the Python version of Recall to Java using Jython.

Recall was platform independent before the switch to Python, but only after much effort. By using CORBA and Python, most of the platform dependent code lies within those run-time environments. And although Recall is implemented in Python, users have the ability to choose their own language when integrating with Recall.

I would have been pleased with the new version of Recall, even if it did not result in a reduction of code. But there was a dramatic decrease in the size of Recall after the rewrite. The table below gives the lines of code which implement Recall in C++ and Python:

  Recall Support Examples
C++ 4988 3105 2573
Python and IDL 1858 0 659

I have separated the code for sockets, threads, strings, lists, etc. because these are found in many C++ libraries, although they are non-standard or may conflict with users' requirements. The count for this code is in the Support column.

Why is the core of Recall/C++ over two times larger than Recall/Python? One might be tempted to attribute the size of Recall/Python to the use of CORBA middleware. CORBA saves programmer effort by generating classes to marshal and unmarshal values. There are about 500 lines of Recall/C++ code associated with encoding and decoding data. But Recall/Python includes 325 lines of IDL, so the savings are actually very small. A better explanation of why Recall/Python is smaller than Recall/C++ is C++ header files. These files contain declarations which are not used in Python and represent about 40% of the difference between the two implementations.

The reasons for rewriting Recall were not based on the inadequacy of C++ or performance, but rather to fix bugs and improve integration. The C++ version of Recall was a good-faith effort from an experienced programmer who continues to use C++ on a daily basis. Below is a section of code taken from both versions of Recall which is meant to show how Python creates smaller, simpler programs.

The code sample implements the reconciliation algorithm used in Recall. Reconciliation is the final step of electing a master. Before a master server can begin broadcasting updates, it ensures that the available servers have all the updates that were in progress when the election was called. Here is the Python version:

    def reconcile(self, upToDate):

        # calculate push set
        all = {}
        for r in upToDate:
            for inProgress in r.snapshot.nInProgress:
                all[inProgress.serverSequenceId] = inProgress
        push = all.keys()
        push.sort()
        push = push[-Recall.NInProgress:]
        
        # send any missing values
        for p in push:
            for r in upToDate:
                updates = map(lambda x: x.serverSequenceId,
		              r.snapshot.nInProgress)
                if p not in updates:
                    r.redo(all[p])

For comparison, this is the C++ version:

    typedef struct {
	bool operator()(const Update* a, const Update* b) const {
		return a->id() < b->id(); 
	}
    } Compare;

    static bool find(Vector<const Update *> & updates, long id) {

	Vector<const Update *>::iterator j = updates.begin();
	for (; j != updates.end();  j++)
	    if (id == (*j)->id())
		return true;

	return false;
    }

    class CleanupUpdates {
	Vector<Updates *> & updates_;

    public:
	CleanupUpdates(Vector<Updates *> & updates) : updates_(updates) {}
	~CleanupUpdates() {
	    for (Vector<Updates *>::iterator i = updates_.begin(); 
		 i != updates_.end(); 
		 i++)
		delete *i;
	}
    };

    int ForwardOnlyStrategy::execute(Echo* server, Replicas& upToDate)  {

	Vector<const Update *> push;    // push set, calculated below
	Vector<Updates *> all;	        // all in-progress lists
	CleanupUpdates cleanup(all);
	Vector<Replica *> replicas;
	// Decode all in-progress lists
	// ... lines removed... 

	// Calculate the total update set
	for (Vector<Updates *>::iterator i = all.begin(); i != all.end(); i++)
	    for (size_t op = 0; op < (*i)->size(); op++)
		if (! ::find(push, (*i)->index(op)->id()))
		    push.append((*i)->index(op));

	// Sort
	::sort(push, ::Compare());

	// Take the last NInProgress
	while (push.size() > server->options()->nInProgress)
	    push.erase(push.begin());

	for (Vector<const Update *>::iterator i = push.begin(); i != push.end(); i++) {
	    for (size_t j = 0; j < replicas.size(); j++)  {
		// Don't send if the replica already has this update
		if (! all[j]->hasId( (*i)->id() )) {
		    // Send update to client
		    // ... lines removed ...

		}
	    }
	}

	return 0;
    }

A total of fourteen lines of code which send and receive messages have been removed as indicated by comments, since they would be replaced by CORBA middleware. Still, the C++ version is clearly longer and more complex.

Let's examine the code one step at a time and compare the essential differences. The first part of the function computes the total list of updates last performed on each server. For example, if the last operations on server A were {1, 2, 3} and on server B were {2, 3, 4}, the total set of operations would be {1, 2, 3, 4}.

        # calculate push set
        all = {}
        for r in upToDate:
            for inProgress in r.snapshot.nInProgress:
                all[inProgress.serverSequenceId] = inProgress

The corresponding C++ code:

	class CleanupUpdates {
	    Vector<Updates *> & updates_;

	public:
	    CleanupUpdates(Vector<Updates *> & updates) : updates_(updates) {}
	    ~CleanupUpdates() {
		for (Vector<Updates *>::iterator i = updates_.begin(); 
		     i != updates_.end(); 
		     i++)
		    delete *i;
	    }
	};

	static bool find(Vector<const Update *> & updates, long id) {

	    Vector<const Update *>::iterator j = updates.begin();
	    for (; j != updates.end();  j++)
		if (id == (*j)->id())
		    return true;

	    return false;
	}

	Vector<const Update *> push;    // push set, calculated below
	Vector<Updates *> all;	        // all in-progress lists
	CleanupUpdates cleanup(all);

	// Decode all in-progress lists
	// ... lines removed... 

	// Calculate the total update set
	for (Vector<Updates *>::iterator i = all.begin(); i != all.end(); i++)
	    for (size_t op = 0; op < (*i)->size(); op++)
		if (! ::find(push, (*i)->index(op)->id()))
		    push.append((*i)->index(op));

The C++ version uses a different data structure to compute sets. First, it only uses a dynamic array data structure (Vector) and not the dictionary type used in Python. To avoid unwanted dependencies, a map or set implementation would have to be provided by me. Given the Vector class was available and used extensively elsewhere in Recall, I simply used it here, even though it was less than optimal. A simple helper class, CleanupUpdates ensures that objects allocated as a result of unmarshaling are deleted, even in the presence of an exception. The find() function was necessary to simplify the inner loop.

This section above could be made much smaller using more advanced data structures. In particular, the STL provides a find() function and a useful set implementation. But at the very least, the STL iterators are more complex to read and use.

The need to address memory deallocation adds complexity throughout the implementation. If this code used the C++ CORBA bindings, the Update objects could be deallocated with the "_var" classes. These classes add their own complexity because there is often confusion when they should be used.

The next block of code computes the last N updates, for some constant N. If the total set of updates is {1, 2, 3, 4 } and N = 3, we want to compute {2, 3, 4 }. In Python, we compute all the ids from the mapping, sort them, and slice down to the last N values:

        push = all.keys()
        push.sort()
        push = push[-Recall.NInProgress:]

and the corresponding C++:

	typedef struct {
	    bool operator()(const Update* a, const Update* b) const { 
		    return a->id() < b->id(); 
		}
	} Compare;

	// Sort
	::sort(push, ::Compare());

	// Take the last NInProgress
	while (push.size() > server->options()->nInProgress)
	    push.erase(push.begin());

The sort() function takes a comparison object which allows the updates to be sorted by id. The while loop removes any extra updates.

In Recall/Python, the class representing an Update has a comparison function which allows it to be sorted. This function is only two lines, one for the definition, the other to return a value. The C++ version needs a separate comparison function because the container holds pointers, not objects or references.

The next block sends any missing updates to other servers:

        # send any missing values
        for p in push:
            for r in upToDate:
                updates = map(lambda x: x.serverSequenceId,
		              r.snapshot.nInProgress)
                if p not in updates:
                    r.redo(all[p])

The corresponding C++ code:

	for (Vector<const Update *>::iterator i = push.begin(); i != push.end(); i++) {
	    for (size_t j = 0; j < replicas.size(); j++) {
		// Don't send if the replica already has this update
		if (! all[j]->hasId( (*i)->id() )) {

		    // Send update to client
		    // ... lines removed ...
		}
	    }
	}

The Python loops are simpler, especially when compared to the STL style of iteration. Note that both code samples hide a third loop. In Python, the map() function is used to compute the updates known to a server. In the C++ version, the Replica object can search for the update with the call to hasId().

I have not compared the C++ and Python versions for efficiency. This part of the replication algorithm is only run during elections, which only occur when a server fails. The loops are bounded by the number of replicas (probably less than 10) and the number of updates that can be broadcast in parallel (almost certainly less than 50). In this last bit of Python code, the set of update ids computed by the map() function could be hoisted outside the inner loop, but I have not bothered to do so. I expect the loop to be executed less than ten times for normal configurations.

Although it would appear that the C++ version is always more complex and more verbose than Python, there are times when C++ is shorter and more convenient. A useful idiom for threaded applications is a mutex variable that protects data from simultaneous updates. In Python, the code looks something like this:

	  class Object:

	    def critical_function(self):
		self._mutex.acquire()
		try:
		    # Manipulate data shared by threads
		finally:
		    self._mutex.release()

The finally clause is used to release the lock, in case the critical code should throw some sort of exception. Java requires the same technique.

In C++, a wrapper is used around the variable to automatically lock and unlock the mutex using the constructor and destructor. To mutually exclude other threads, the code typically looks like this:

       void Object::critical_function()  {
	    Lock lock(&this->mutex_);

	    // Manipulate data shared by threads
       }

In C++, locally scoped objects are guaranteed to be destroyed at the end of the enclosing block, even in the presence of exceptions. The Lock class looks something like this:

       class Lock { 
           MutexVariable * mutex_;
	public:
           Lock(MutexVariable *mutex) : mutex_(mutex) { mutex_->acquire(); }
	   ~Lock() { mutex_->release(); }
       };

Obtaining locks is simpler in C++ than in Python. The equivalent Python code is always 3 lines longer. In fact, about 9% of the Python implementation of Recall is simply obtaining and releasing locks safely. Yet, overall, the Python code is smaller.

Python does not provide the static type-checking available in C++. Recall was one of the earliest users of pychecker which catches many of the same types of errors caught by a compiler. The use of IDL to define interfaces, along with the run-time checks provided by the ORB, validates many object types. Still, one type error was found when the Python version of Recall was ported to Java using Jython, due to the increased type conformance checking performed by the Java runtime.

A surprising benefit of the Python rewrite is how it has simplified small examples. It is now possible to write simple fault-tolerant services in 50 lines using the standard Python library. Before the rewrite, even simple C++ implementations were three times longer, primarily because the basic library support was more cumbersome.

Recall is used to build persistent data servers, and most of the examples read and write data to disk. Python provides the pickle module to serialize objects to files, which is convenient for trivial data storage. In C++, the objects must be marshaled to and from the file. The extra code detracts from the examples, making them more complex and difficult to understand.

Conclusions and Future Work

Based on my experiences with C++, I would highly recommend using Python for complex distributed applications. I would also recommend using an ORB because it supports choices for operating system and implementation language.

The Python version of Recall has not yet been optimized for performance. The air traffic control applications which precipitated the need for Recall only needed to handle a dozen updates per second. This rate is easily handled by the current version of Recall/Python. The network latencies and persistent storage requirements of most Recall services are typically greater than the differences in language implementations. Also, all of the low-level manipulations, such as data marshaling, are done by native (or Java) routines within the ORB and tend to be optimized.

This paper certainly does not prove that Python is better than any other language, or that it always results in simpler or shorter code. By combining powerful built-in data structures and natural language bindings for CORBA, Python deserves serious consideration for mission-critical applications.

References

ACE
http://www.cs.wustl.edu/~schmidt/ACE.html
CORBA
Common Object Request Broker Architecture
omniORB
http://www.uk.research.att.com/omniORB/index.html
GNU Pth
http://www.gnu.org/software/pth
Pthreads (POSIX threads)
IEEE POSIX 1003.1c-1995
Recall
http://www.fault-tolerant.org/recall
Recall/Python Sources
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/recall/corba
Recall/C++ Sources
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/recall/recall
SRC Research Report 46
http://gatekeeper.dec.com/pub/DEC/SRC/research-reports/abstracts/src-rr-046.html
STL
http://www.stlport.org
PyChecker
http://pychecker.sf.net

Eric C. Newton
Last modified: Mon Dec 17 09:04:48 EST 2001