The Twisted Network Framework

Moshe Zadka m@moshez.org
Glyph Lefkowitz glyph@twistedmatrix.com

Abstract

Twisted is a framework for writing asynchronous, event-driven networked programs in Python -- both clients and servers. In addition to abstractions for low-level system calls like select(2) and socket(2), it also includes a large number of utility functions and classes, which make writing new servers easy. Twisted includes support for popular network protocols like HTTP and SMTP, support for GUI frameworks like GTK+/GNOME and Tk, and many other classes designed to make network programs easy. Whenever possible, Twisted uses Python's introspection facilities to save the client programmer as much work as possible. Even though Twisted is still a work in progress, it is already usable for production systems -- it can be used to bring up a Web server, a mail server or an IRC server in a matter of minutes, and requires almost no configuration.

Keywords: internet, network, framework, event-based, asynchronous

Introduction

Python lends itself to writing frameworks. Python has a simple class model, which facilitates inheritance. It has dynamic typing, which means code needs to assume less. Python also has built-in memory management, which means application code does not need to track ownership. Thus, when writing a new application, a programmer often finds himself writing a framework to make writing this kind of application easier. Twisted evolved from the need to write high-performance interoperable servers in Python, and making them easy to use (and difficult to use incorrectly).

There are three ways to write network programs:

  1. Handle each connection in a separate process
  2. Handle each connection in a separate thread
  3. Use non-blocking system calls to handle all connections in one thread.

When dealing with many connections in one thread, the scheduling is the responsibility of the application, not the operating system, and is usually implemented by calling a registered function when each connection is ready for reading or writing -- commonly known as event-driven, or callback-based, programming.
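The callback style described above can be illustrated with a minimal sketch built on Python's standard selectors module. This is not Twisted's own loop; the names run_once and demo are hypothetical and exist only for this illustration.

```python
import selectors
import socket

def run_once(selector, timeout=1.0):
    """Dispatch one round of ready events to their registered callbacks."""
    events = selector.select(timeout)
    for key, _mask in events:
        callback = key.data          # the function registered for this socket
        callback(key.fileobj)
    return len(events)

def demo():
    """Register a read callback on one end of a socket pair and trigger it."""
    received = []
    left, right = socket.socketpair()
    left.setblocking(False)
    right.setblocking(False)
    selector = selectors.DefaultSelector()
    # Register interest in "readable"; the callback runs only when data is ready.
    selector.register(right, selectors.EVENT_READ,
                      lambda sock: received.append(sock.recv(1024)))
    left.send(b"ping")
    run_once(selector)
    selector.unregister(right)
    selector.close()
    left.close()
    right.close()
    return received
```

The application, not the operating system, decides which callback runs next: run_once simply walks the list of ready sockets in one thread.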

Since multi-threaded programming is often tricky, even with high level abstractions, and since forking Python processes has many disadvantages, like Python's reference counting not playing well with copy-on-write and problems with shared state, it was felt the best option was an event-driven framework. A benefit of such an approach is that by letting other event-driven frameworks take over the main loop, server and client code are essentially the same -- making peer-to-peer a reality. While Twisted includes its own event loop, Twisted can already interoperate with GTK+'s and Tk's mainloops, as well as provide an emulation of event-based I/O for Jython (specific support for the Swing toolkit is planned). Client code is never aware of the loop it is running under, as long as it is using Twisted's interface for registering for interesting events.

Some examples of programs which were written using the Twisted framework are twisted.web (a web server), twisted.mail (a mail server, supporting both SMTP and POP3, as well as relaying), twisted.words (a chat application supporting integration between a variety of IM protocols, like IRC, AOL Instant Messenger's TOC and Perspective Broker, a remote-object protocol native to Twisted), im (an instant messenger which connects to twisted.words) and faucet (a GUI client for the twisted.reality interactive-fiction framework). Twisted can be useful for any network or GUI application written in Python.

However, event-driven programming still contains some tricky aspects. As each callback must be finished as soon as possible, it is not possible to keep persistent state in function-local variables. In addition, some programming techniques, such as recursion, are impossible to use. Event-driven programming has a reputation of being hard to use due to the frequent need to write state machines. Twisted was built with the assumption that with the right library, event-driven programming is easier than multi-threaded programming. Twisted aims to be that library.

Twisted includes both high-level and low-level support for protocols. Most of Twisted's protocol implementations are in a package which tries to implement "mechanisms, not policy". On top of those implementations, Twisted includes usable implementations of those protocols: for example, connecting the abstract HTTP protocol handler to a concrete resource-tree, or connecting the abstract mail protocol handler to deliver mail to maildirs according to domains. Twisted tries to come with as much functionality as possible out of the box, while not constraining a programmer to a choice between using a possibly-inappropriate class and rewriting the non-interesting parts himself.

Twisted also includes Perspective Broker, a simple remote-object framework, which allows Twisted servers to be divided into separate processes as the end deployer (rather than the original programmer) finds most convenient. This allows, for example, Twisted web servers to pass requests for specific URLs to co-operating servers so permissions are granted according to the need of the specific application, instead of being forced into giving all the applications all permissions. The co-operation is truly symmetrical, although typical deployments (such as the one which the Twisted web site itself uses) use a master/slave relationship.

Twisted is not alone in the niche of a Python network framework. One of the better known frameworks is Medusa. Medusa is used, among other things, as Zope's native server serving HTTP, FTP and other protocols. However, Medusa is no longer under active development, and the Twisted development team had a number of goals which would necessitate a rewrite of large portions of Medusa. Twisted separates protocols from the underlying transport layer. This separation has the advantages of reusability (for example, using the same clients and servers over SSL) and testability (because it is easy to test the protocol with a much lighter test harness) among others. Twisted also has a very flexible main-loop which can interoperate with third-party main-loops, making it usable in GUI programs too.

Complementing Python

Python comes out of the box with "batteries included". However, it seems that many Python projects rewrite some basic parts: logging to files, parsing options and high level interfaces to reflection. When the Twisted project found itself rewriting those, it moved them into a separate subpackage, which does not depend on the rest of the twisted framework. Hopefully, people will use twisted.python more and solve interesting problems instead. Indeed, it is one of Twisted's goals to serve as a repository for useful Python code.

One useful module is twisted.python.reflect, which has methods like prefixedMethods, which returns all methods with a specific prefix. Even though some modules in Python itself implement such functionality (notably, urllib2), they do not expose it as a function usable by outside code. Another useful module is twisted.python.hook, which can add pre-hooks and post-hooks to methods in classes.

# Add all method names beginning with opt_ to the given
# dictionary. This cannot be done with dir(), since
# it does not search in superclasses
dct = {}
reflect.addMethodNamesToDict(self.__class__, dct, "opt_")

# Sum up all lists, in the given class and superclasses,
# which have a given name. This gives us "different class
# semantics": attributes do not override, but rather append
flags = []
reflect.accumulateClassList(self.__class__, 'optFlags', flags)

# Add lock-acquire and lock-release to all methods which
# are not multi-thread safe
for methodName in klass.synchronized:
    hook.addPre(klass, methodName, _synchPre)
    hook.addPost(klass, methodName, _synchPost)

Listing 1: Using twisted.python.reflect and twisted.python.hook
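To make the idea behind these reflection helpers concrete, here is a minimal sketch in plain Python. It does not reproduce twisted.python.reflect; the function names prefixed_method_names and accumulate_class_list, and the classes Base and Derived, are hypothetical stand-ins that walk the class hierarchy explicitly.

```python
def prefixed_method_names(cls, prefix):
    """Return names of all methods on cls or its bases starting with prefix."""
    names = set()
    for klass in cls.__mro__:
        for name, value in vars(klass).items():
            if name.startswith(prefix) and callable(value):
                names.add(name)
    return sorted(names)

def accumulate_class_list(cls, attr):
    """Concatenate the list named attr from cls and every base class."""
    result = []
    for klass in reversed(cls.__mro__):     # base classes first
        result.extend(vars(klass).get(attr, []))
    return result

class Base:
    optFlags = [["quiet", "q"]]

    def opt_verbose(self):
        pass

class Derived(Base):
    optFlags = [["nodaemon", "n"]]          # appends to, not overrides, Base's

    def opt_plugin(self, name):
        pass
```

With these definitions, prefixed_method_names(Derived, "opt_") finds methods from both classes, and accumulate_class_list(Derived, "optFlags") returns the flags of Base followed by those of Derived.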

The twisted.python subpackage also contains a high-level interface to getopt which supplies as much power as plain getopt while avoiding long if/elif chains and making many common cases easier to use. It uses the reflection interfaces in twisted.python.reflect to find which options the class is interested in, and constructs the argument to getopt. Since in the common case options' values are just saved in instance attributes, it is very easy to indicate interest in such options. However, for the cases where custom code needs to be run for an option (for example, counting how many -v options were given to indicate verbosity level), it will call a method which is named correctly.

class ServerOptions(usage.Options):
    # These are (short and long) options which
    # have no argument. The corresponding attribute
    # will be true iff this option was given
    optFlags = [['nodaemon','n'],
                ['profile','p'],
                ['threaded','t'],
                ['quiet','q'],
                ['no_save','o']]
    # These are options which require an argument
    # The default is used if no such option was given
    # Note: since options can only have string arguments,
    # putting a non-string here is a reliable way to detect
    # whether the option was given
    optStrings = [['logfile','l',None],
                  ['file','f','twistd.tap'],
                  ['python','y',''],
                  ['pidfile','','twistd.pid'],
                  ['rundir','d','.']]

    # For options which can be given multiple times
    # or have other unusual semantics, a method will be called
    # Twisted assumes that the option needs an argument if and only if
    # the method is defined to accept an argument.
    def opt_plugin(self, pkgname):
        pkg = __import__(pkgname)
        self.python = os.path.join(os.path.dirname(
                         os.path.abspath(pkg.__file__)), 'config.tac')

    # Most long options based on methods are aliased to short
    # options. If there is only one letter, Twisted knows it is a short
    # option, so it is "-g", not "--g"
    opt_g = opt_plugin

try:
    config = ServerOptions()
    config.parseOptions()
except usage.error, ue:
    print "%s: %s" % (sys.argv[0], ue)
    sys.exit(1)
Listing 2: twistd's Usage Code

Unlike getopt, Twisted has a useful abstraction for the non-option arguments: they are passed as arguments to the parsedArgs method. This means too many arguments, or too few, will cause a usage error, which will be flagged. If an unknown number of arguments is desired, explicitly using a tuple catch-all argument will work.
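The mechanism described above, building the getopt specification from class attributes and handing positional arguments to a method, can be sketched with the standard library alone. The class MiniOptions and its methods parse and parse_args are hypothetical names for this illustration, not the twisted.python.usage API; in this sketch a wrong number of positional arguments surfaces as a TypeError rather than a usage error.

```python
import getopt

class MiniOptions:
    """Hypothetical stand-in for a reflection-driven getopt wrapper."""
    optFlags = []      # [long, short] pairs, no argument
    optStrings = []    # [long, short, default] triples, one argument

    def __init__(self):
        for long_name, _short in self.optFlags:
            setattr(self, long_name, False)
        for long_name, _short, default in self.optStrings:
            setattr(self, long_name, default)

    def parse(self, argv):
        short_spec, long_spec, names = "", [], {}
        for long_name, short in self.optFlags:
            long_spec.append(long_name)
            names["--" + long_name] = (long_name, False)
            if short:
                short_spec += short
                names["-" + short] = (long_name, False)
        for long_name, short, _default in self.optStrings:
            long_spec.append(long_name + "=")
            names["--" + long_name] = (long_name, True)
            if short:
                short_spec += short + ":"
                names["-" + short] = (long_name, True)
        opts, args = getopt.getopt(argv, short_spec, long_spec)
        for opt, value in opts:
            attr, takes_value = names[opt]
            setattr(self, attr, value if takes_value else True)
        # Positional arguments are passed to a method, so the method's own
        # signature decides how many of them are acceptable.
        self.parse_args(*args)

    def parse_args(self):
        pass

class ServerOptions(MiniOptions):
    optFlags = [["quiet", "q"]]
    optStrings = [["logfile", "l", None]]

    def parse_args(self, tapfile):
        self.tapfile = tapfile
```

Calling ServerOptions().parse(["-q", "--logfile", "out.log", "web.tap"]) sets quiet, logfile and tapfile; declaring parse_args(self, *rest) instead would accept any number of positional arguments.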

Configuration

The formats of configuration files have shown two visible trends over the years. On the one hand, more and more programmability has been added, until sometimes they become a new language. The extreme end of this trend is using a regular programming language, such as Python, as the configuration language. On the other hand, some configuration files have become more and more machine editable, until they become miniature database formats. The extreme end of that trend is using a generic database tool.

Both trends stem from the same rationale -- the need to use a powerful general purpose tool instead of hacking domain specific languages. Domain specific languages are usually ad-hoc and not well designed, having neither the power of general purpose languages nor the predictable machine editable format of generic databases.

Twisted combines these two trends. It can read the configuration either from a Python file, or from a pickled file. To some degree, it integrates the approaches by auto-pickling state on shutdown, so the configuration files can migrate from Python into pickles. Currently, there is no way to go back from pickles to equivalent Python source, although it is planned for the future. As a proof of concept, the RPG framework Twisted Reality already has facilities for creating Python source which evaluates into a given Python object.

from twisted.internet import main
from twisted.web import proxy, server
site = server.Site(proxy.ReverseProxyResource('www.yahoo.com', 80, '/'))
application = main.Application('web-proxy')
application.listenOn(8080, site)
Listing 3: The configuration file for a reverse web proxy

Twisted's main program, twistd, can receive either a pickled twisted.internet.main.Application or a Python file which defines a variable called application. The application can be saved at any time by calling its save method, which can take an optional argument to save to a different file name. It would be fairly easy, for example, to have a Twisted server which saves the application every few seconds to a file whose name depends on the time. Usually, however, one settles for the default behavior which saves to a shutdown file. Then, if the shutdown configuration proves suitable, the regular pickle is replaced by the shutdown file. Hence, on the fly configuration changes, regardless of complexity, can always persist.
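The save-and-reload cycle can be illustrated with a small sketch using the standard pickle module. The Application class here is a hypothetical stand-in, not twisted.internet.main.Application, and as a simplifying assumption it pickles only its attribute dictionary rather than the object itself; the file name is likewise invented for the example.

```python
import os
import pickle
import tempfile

class Application:
    """Hypothetical stand-in for a persistent application object."""

    def __init__(self, name):
        self.name = name
        self.ports = []                    # (port number, description) pairs

    def listen_on(self, port, description):
        self.ports.append((port, description))

    def save(self, filename):
        # Pickle only plain attributes so the file does not depend on where
        # this class happens to be defined.
        with open(filename, "wb") as f:
            pickle.dump(vars(self), f)

    @classmethod
    def load(cls, filename):
        with open(filename, "rb") as f:
            state = pickle.load(f)
        app = cls.__new__(cls)
        app.__dict__.update(state)
        return app

def demo():
    """Change the configuration at run time, save it, and load it back."""
    with tempfile.TemporaryDirectory() as directory:
        filename = os.path.join(directory, "web-proxy-shutdown.tap")
        app = Application("web-proxy")
        app.listen_on(8080, "reverse proxy")
        app.listen_on(8081, "added on the fly")
        app.save(filename)
        return Application.load(filename).ports
```

The port added after start-up survives the round trip, which is the property that lets on-the-fly changes persist.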

There are several client/server protocols which let a suitably privileged user access the application variable and change it on the fly. The first, and least common denominator, is telnet. The administrator can telnet into Twisted, and issue Python statements to her heart's content. For example, one can add ports for the application to listen on, reconfigure the web servers and make various other changes simply by accessing __main__.application. Some proofs of concept for a simple suite of command-line utilities to control a Twisted application were written, including commands which allow an administrator to shut down the server or save the current state to a tap file. These are especially useful on Microsoft Windows(tm) platforms, where the normal UNIX way of communicating shutdown requests via signals is less reliable.

If reconfiguration on the fly is not necessary, Python itself can be used as the configuration editor. Loading the application is as simple as unpickling it, and saving it is done by calling its save method. It is quite easy to add more services or change existing ones from the Python interactive mode.

A more sophisticated way to reconfigure the application on the fly is via the manhole service. Manhole is a client/server protocol based on top of Perspective Broker, Twisted's translucent remote-object protocol which will be covered later. Manhole has a graphical client called gtkmanhole which can access the server and change its state. Since Twisted is modular, it is possible to write more services for user friendly configuration. For example, through-the-web configuration is planned for several services, notably mail.

For cases where a third party wants to distribute both the code for a server and a ready to run configuration file, there is the plugin configuration. Philosophically similar to the --python option to twistd, it simplifies the distribution process. A plugin is an archive which is ready to be unpacked into the Python module path. In order to keep a clean tree, twistd extends the module path with some Twisted-specific paths, like the directory TwistedPlugins in the user's home directory. When a plugin is unpacked, it should be a Python package which includes, alongside __init__.py, a file named config.tac. This file should define a variable named application, in a similar way to files loaded with --python. The plugin way of distributing configurations is meant to reduce the temptation to put large amounts of code inside the configuration file itself.

Putting class and function definitions inside the configuration files would make the persistent servers which are auto-generated on shutdown useless, since they would not have access to the classes and functions defined inside the configuration file. Thus, the plugin method is intended so classes and functions can still be in regular, importable, Python modules, while still allowing third parties to distribute powerful configurations. Plugins are used by some of the Twisted Reality virtual worlds.

Ports, Protocols and Protocol Factories

Port is the Twisted class which represents a socket listening on a port. Currently, Twisted supports both internet and unix-domain sockets, and there are SSL classes with an identical interface. A Port is only responsible for handling the transport layer. It calls accept on the socket, checks that it actually wants to deal with the connection and asks its factory for a protocol. The factory is usually a subclass of twisted.protocols.protocol.Factory, and its most important method is buildProtocol. This should return something that adheres to the protocol interface, and is usually a subclass of twisted.protocols.protocol.Protocol.

from twisted.protocols import protocol
from twisted.internet import main, tcp

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        self.transport.write(data)

factory = protocol.Factory()
factory.protocol = Echo
port = tcp.Port(8000, factory)
app = main.Application("echo")
app.addPort(port)
app.run()
Listing 4: A Simple Twisted Application

The factory is responsible for two tasks: creating new protocols, and keeping global configuration and state. Since the factory builds the new protocols, it usually makes sure the protocols have a reference to it. This allows protocols to access, and change, the configuration. Keeping state information in the factory is the primary reason for keeping an abstraction layer between ports and protocols. Examples of configuration information are the root directory of a web server or the user database of a telnet server. Note that it is possible to use the same factory in two different Ports. This can be used to run the same server bound to several different addresses but not to all of them, or to run the same server on a TCP socket and a UNIX domain socket.

A protocol begins and ends its life with connectionMade and connectionLost; both are called with no arguments. connectionMade is called when a connection is first established. By then, the protocol has a transport attribute. The transport attribute is a Transport - it supports write and loseConnection. Both these methods never block: write actually buffers data which will be written only when the transport is signalled ready for writing, and loseConnection marks the transport for closing as soon as there is no buffered data. Note that transports do not have a read method: data arrives when it arrives, and the protocol must be ready for its dataReceived method, or its connectionLost method, to be called. The transport also supports a getPeer method, which returns parameters about the other side of the transport. For TCP sockets, this includes the remote IP and port.
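Because a protocol only ever talks to its transport attribute, it can be exercised without any socket at all. The following sketch shows the idea with plain Python classes; Protocol, Factory, BufferingTransport and connect are simplified, hypothetical stand-ins whose signatures are assumptions and do not reproduce Twisted's own classes.

```python
class Protocol:
    """Minimal protocol base: the transport is attached before connectionMade."""
    transport = None

    def connectionMade(self):
        pass

    def dataReceived(self, data):
        pass

    def connectionLost(self):
        pass

class BufferingTransport:
    """Hypothetical in-memory transport: write() only buffers, never blocks."""

    def __init__(self):
        self.buffer = []
        self.closing = False

    def write(self, data):
        self.buffer.append(data)

    def loseConnection(self):
        self.closing = True          # a real transport closes once flushed

    def getPeer(self):
        return ("127.0.0.1", 0)      # placeholder address for the sketch

class Factory:
    protocol = Protocol

    def buildProtocol(self):
        p = self.protocol()
        p.factory = self             # protocols can reach shared configuration
        return p

class Echo(Protocol):
    def dataReceived(self, data):
        self.transport.write(data)

def connect(factory):
    """Do what a port would do on accept: build, attach transport, notify."""
    protocol = factory.buildProtocol()
    protocol.transport = BufferingTransport()
    protocol.connectionMade()
    return protocol
```

A test can then feed bytes to dataReceived directly and inspect transport.buffer, which is the lighter test harness that separating protocols from transports makes possible.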

# A tcp port-forwarder
# A StupidProtocol sends all data it gets to its peer.
# A StupidProtocolServer connects to the host/port,
# and initializes the client connection to be its peer
# and itself to be the client's peer
from twisted.internet import tcp
from twisted.protocols import protocol

class StupidProtocol(protocol.Protocol):
    def connectionLost(self):
        # Close the other side too, and drop the reference to it
        self.peer.loseConnection()
        del self.peer

    def dataReceived(self, data):
        self.peer.write(data)

class StupidProtocolServer(StupidProtocol):
    def connectionMade(self):
        clientProtocol = StupidProtocol()
        clientProtocol.peer = self.transport
        self.peer = tcp.Client(self.factory.host, self.factory.port, 
                               clientProtocol)

# Create a factory which creates StupidProtocolServers, and
# has the configuration information they assume
def makeStupidFactory(host, port):
    factory = protocol.Factory()
    factory.host, factory.port = host, port
    factory.protocol = StupidProtocolServer
    return factory
Listing 5: TCP forwarder code

The Event Loop

While Twisted has the ability to let other event loops take over for integration with GUI toolkits, it usually uses its own event loop. The event loop code uses global variables to maintain interested readers and writers, and uses Python's select() function, which can accept any object which has a fileno() method, not only raw file descriptors. Objects can use the event loop interface to indicate interest in either reading from or writing to a given file descriptor. In addition, for those cases where time-based events are needed (for example, queue flushing or periodic POP3 downloads), Twisted has a mechanism for repeating events at known delays. While far from being real-time, this is enough for most programs' needs.
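One common way to combine time-based events with a select loop is a priority queue of pending calls whose earliest deadline becomes the select timeout. The sketch below illustrates that general technique; DelayedCalls and its methods are hypothetical names, and the text does not say that Twisted implements its delayed events this way.

```python
import heapq
import itertools

class DelayedCalls:
    """Hypothetical timer queue an event loop could consult between selects."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for equal times

    def call_later(self, now, delay, func, *args):
        heapq.heappush(self._heap,
                       (now + delay, next(self._counter), func, args))

    def timeout(self, now):
        """Seconds until the next call is due, or None if nothing is pending."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - now)

    def run_due(self, now):
        """Run every call whose time has arrived; return how many ran."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _when, _n, func, args = heapq.heappop(self._heap)
            func(*args)
            ran += 1
        return ran
```

A loop would pass timeout(now) to select and call run_due(now) after each return; a repeating event simply schedules itself again from inside its own callback. The current time is passed in explicitly so the sketch stays deterministic.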

Going Higher Level

Unfortunately, handling arbitrary data chunks is a hard way to code a server. This is why Twisted has many classes sitting in submodules of the twisted.protocols package which give a higher-level interface to the data. For line oriented protocols, LineReceiver translates the low-level dataReceived events into lineReceived events. However, the first naive implementation of LineReceiver proved to be too simple. Protocols like HTTP/1.1 or Freenet have packets which begin with header lines that include length information, and then byte streams. LineReceiver was rewritten to have a simple interface for switching at the protocol layer between line-oriented parts and byte-stream parts.
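The switch between line-oriented and byte-stream parsing can be sketched as follows. This is an illustrative reimplementation, not the twisted.protocols code; the method names setRawMode, setLineMode and rawDataReceived, the "\r\n" delimiter, and the LengthPrefixed example protocol with its "Length:" header are all assumptions made for the sketch.

```python
class LineReceiver:
    """Sketch of a receiver that can switch between line and raw modes."""
    delimiter = b"\r\n"

    def __init__(self):
        self._buffer = b""
        self.line_mode = True

    def dataReceived(self, data):
        self._buffer += data
        while self.line_mode and self._buffer:
            line, sep, rest = self._buffer.partition(self.delimiter)
            if not sep:
                return                   # incomplete line: wait for more data
            self._buffer = rest
            self.lineReceived(line)
        if not self.line_mode and self._buffer:
            data, self._buffer = self._buffer, b""
            self.rawDataReceived(data)

    def setRawMode(self):
        self.line_mode = False

    def setLineMode(self, extra=b""):
        self.line_mode = True
        if extra:
            self.dataReceived(extra)     # re-parse bytes that were over-read

    def lineReceived(self, line):
        pass

    def rawDataReceived(self, data):
        pass

class LengthPrefixed(LineReceiver):
    """Header line 'Length: N', a blank line, then exactly N body bytes."""

    def __init__(self):
        super().__init__()
        self.length = 0
        self.body = b""
        self.messages = []

    def lineReceived(self, line):
        if line:                                     # header line
            self.length = int(line.split(b":")[1].strip())
        else:                                        # blank line ends headers
            self.setRawMode()

    def rawDataReceived(self, data):
        self.body += data
        if len(self.body) >= self.length:
            body, extra = self.body[:self.length], self.body[self.length:]
            self.messages.append(body)
            self.body = b""
            self.setLineMode(extra)
```

The subclass never sees chunk boundaries: however the bytes are split across dataReceived calls, it receives whole header lines and then the body.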

Another format which is gathering popularity is Dan J. Bernstein's netstring format. This format keeps ASCII text as ASCII, but allows arbitrary bytes (including nulls and newlines) to be passed freely. However, netstrings were never designed to be used in event-based protocols where over-reading is unavoidable. Twisted makes sure no user will have to deal with the subtle problems handling netstrings in event-driven programs by providing NetstringReceiver.
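A netstring encodes a byte string as its decimal length, a colon, the bytes themselves and a trailing comma, for example 5:hello, for the five bytes of "hello". The over-reading problem can be handled by buffering and parsing incrementally, as in this sketch; the class below is an illustration written for this text and does not reproduce Twisted's NetstringReceiver, and the stringReceived callback name is an assumption.

```python
class NetstringReceiver:
    """Sketch of an incremental netstring parser (format: b'<len>:<data>,')."""

    def __init__(self):
        self._buffer = b""
        self.strings = []

    def dataReceived(self, data):
        self._buffer += data
        while True:
            head, sep, rest = self._buffer.partition(b":")
            if not sep:
                return                   # length prefix not complete yet
            if not head.isdigit():
                raise ValueError("malformed netstring length")
            length = int(head)
            if len(rest) < length + 1:
                return                   # payload or trailing comma missing
            if rest[length:length + 1] != b",":
                raise ValueError("netstring not terminated by a comma")
            self.stringReceived(rest[:length])
            # Keep whatever was read past this netstring for the next round.
            self._buffer = rest[length + 1:]

    def stringReceived(self, string):
        self.strings.append(string)
```

A chunk may end in the middle of one netstring or contain several; either way the leftover bytes stay in the buffer until the next dataReceived call.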

For even higher levels, there are the protocol-specific protocol classes. These translate low-level chunks into high-level events such as "HTTP request received" (for web servers), "approve destination address" (for mail servers) or "get user information" (for finger servers). Many RFCs have been thus implemented for Twisted (at latest count, more than 12 RFCs have been implemented). One of Twisted's goals is to be a repository of event-driven implementations for various protocols in Python.

class DomainSMTP(SMTP):

    def validateTo(self, helo, destination):
        try:
            user, domain = string.split(destination, '@', 1)
        except ValueError:
            return 0
        if not self.factory.domains.has_key(domain): 
            return 0
        if not self.factory.domains[domain].exists(user, domain, self): 
            return 0
        return 1

    def handleMessage(self, helo, origin, recipients, message):
        # No need to check for existence -- only recipients which
        # we approved at the validateTo stage are passed here
        for recipient in recipients:
            user, domain = string.split(recipient, '@', 1)
            self.factory.domains[domain].saveMessage(origin, user, message,
                                                     domain)
Listing 6: Implementation of virtual domains using the SMTP protocol class

Copious documentation on writing new protocol abstractions exists, since this is the largest amount of code written -- much like most operating system code is device drivers. Since many different protocols have already been implemented, there are also plenty of examples to draw on. Usually implementing the client-side of a protocol is particularly challenging, since protocol designers tend to assume much more state kept on the client side of a connection than on the server side.

The twisted.tap Package and mktap

Since one of Twisted's configuration formats is pickles, which are tricky to edit by hand, Twisted evolved a framework for creating such pickles. This framework is contained in the twisted.tap package and the mktap script. New servers, or new ways to configure existing servers, can easily participate in the twisted.tap framework by creating a twisted.tap submodule.

All twisted.tap submodules must conform to a rigid interface. The interface defines functions to accept the command line parameters, and functions to take the processed command line parameters and add servers to twisted.internet.main.Application. Existing twisted.tap submodules use twisted.python.usage, so the command line format is consistent between different modules.

The mktap utility gets some generic options, and then the name of the server to build. It imports a same-named twisted.tap submodule, and lets it process the rest of the options and parameters. This makes sure that the process configuring the main.Application is agnostic about where it is used. This allowed mktap to grow the --append option, which appends to an existing pickle rather than creating a new one. This option is frequently used to post-add a telnet server to an application, for net-based on the fly configuration later.

When running mktap under UNIX, it saves the user id and group id inside the tap. Then, when feeding this tap into twistd, it changes to this user/group id after binding the ports. Such a feature is necessary in any production-grade server, since ports below 1024 require root privileges to use on UNIX -- but applications should not run as root. In case changing to the specified user causes difficulty in the build environment, it is also possible to give those arguments to mktap explicitly.

from twisted.internet import tcp, stupidproxy
from twisted.python import usage

usage_message = """
usage: mktap stupid [OPTIONS]

Options are as follows:
        --port <#>, -p:         set the port number to <#>.
        --host <host>, -h:      set the host to <host>
        --dest_port <#>, -d:    set the destination port to <#>
"""

class Options(usage.Options):
    optStrings = [["port", "p", 6666],
                  ["host", "h", "localhost"],
                  ["dest_port", "d", 6665]]

def getPorts(app, config):
    s = stupidproxy.makeStupidFactory(config.host, int(config.dest_port))
    return [(int(config.port), s)]
Listing 7: twisted.tap.stupid

The twisted.tap framework is one of the reasons servers can be set up with little knowledge and time. Simply running mktap with arguments can bring up a web server, a mail server or an integrated chat server -- with hardly any need for maintenance. As a working proof-of-concept, the tap2deb utility exists to wrap up tap files in Debian packages, which include scripts for running and stopping the server and interact with init(8) to make sure servers are automatically run on start-up. Such programs can also be written to interface with the Red Hat Package Manager or the FreeBSD package management systems.

% mktap --uid 33 --gid 33 web --static /var/www --port 80
% tap2deb -t web.tap -m 'Moshe Zadka '
% su
password:
# dpkg -i .build/twisted-web_1.0_all.deb
Listing 8: Bringing up a web server on a Debian system

Multi-thread Support

Sometimes, threads are unavoidable or hard to avoid. Many legacy programs which use threads want to use Twisted, and some vendor APIs have no non-blocking version -- for example, most database systems' APIs. Twisted can work with threads, although it supports only one thread in which the main select loop is running. It can use other threads to simulate a non-blocking API over a blocking API -- it spawns a thread to call the blocking API, and when it returns, the thread calls a callback in the main thread. Threads can call callbacks in the main thread safely by adding those callbacks to a list of pending events. When the main thread is between select calls, it searches through the list of pending events, and executes them. This is used in the twisted.enterprise package to supply an event-driven interface to databases, which uses Python's DB API.
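The pending-events mechanism can be sketched with a thread-safe queue standing in for the list of pending events. The names pending, call_in_thread and run_pending are hypothetical, and error handling in the worker is omitted for brevity.

```python
import queue
import threading

pending = queue.Queue()     # thread-safe list of (callback, result) pairs

def call_in_thread(blocking_func, callback, *args):
    """Run blocking_func(*args) in a worker; hand the result to the main thread."""
    def worker():
        result = blocking_func(*args)
        pending.put((callback, result))   # never call the callback here
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

def run_pending():
    """Called by the main loop between select calls; returns how many ran."""
    ran = 0
    while True:
        try:
            callback, result = pending.get_nowait()
        except queue.Empty:
            return ran
        callback(result)
        ran += 1
```

The callback always executes in the thread that calls run_pending, so code written for the single-threaded event loop needs no locks of its own.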

Twisted tries to optimize for the common case -- no threads. If there is need for threads, a special call must be made to inform the twisted.python.threadable module that threads will be used. This module is implemented differently depending on whether threads will be used or not. The decision must be made before importing any modules which use threadable, and so is usually done in the main application. For example, twistd has a command line option to initialize threads.

Twisted also supplies a module which supports a threadpool, so the common task of implementing non-blocking APIs above blocking APIs will be both easy and efficient. Threads are kept in a pool, and dispatch requests are done by threads which are not working. The pool supports a maximum number of threads, and will throw exceptions when there are more requests than allowable threads.

One of the difficulties about multi-threaded systems is using locks to avoid race conditions. Twisted uses a mechanism similar to Java's synchronized methods. A class can declare a list of methods which cannot safely be called at the same time from two different threads. A function in threadable then uses twisted.python.hook to transparently add lock/unlock around these methods. This allows Twisted classes to be written without thought about threading, except for one localized declaration which does not entail any performance penalty for the single-threaded case.
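A declaration-driven approach of this kind can be sketched with a class decorator that wraps each listed method in a lock. This is an illustration only: the decorator synchronize, the helper _locked and the use of one lock per class are assumptions of the sketch, which does not use twisted.python.hook or threadable.

```python
import functools
import threading

def _locked(method, lock):
    """Return method wrapped so the lock is held for the duration of the call."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        with lock:          # acquire before the call, release after it
            return method(*args, **kwargs)
    return wrapper

def synchronize(cls):
    """Wrap every method named in cls.synchronized with one shared lock."""
    lock = threading.RLock()
    for name in cls.synchronized:
        setattr(cls, name, _locked(getattr(cls, name), lock))
    return cls

@synchronize
class Counter:
    # The one localized declaration: methods unsafe to run concurrently.
    synchronized = ["increment"]

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value
        self.value = current + 1
```

The body of increment contains no locking code; a single-threaded build could skip the synchronize step entirely and pay no cost.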

Twisted Mail Server

Mail servers have a history of security flaws. Sendmail is by now the poster boy of security holes, but no mail servers, bar maybe qmail, are free of them. As Dan Bernstein of qmail fame said, mail cannot be simply turned off -- even the simplest organization needs a mail server. Since Twisted is written in a high-level language, many problems which plague other mail servers, notably buffer overflows, simply do not exist. Other holes are avoidable with correct design. Twisted Mail is a project trying to see if it is possible to write a high-quality, high-performance mail server entirely in Python.

Twisted Mail is built on the SMTP server and client protocol classes. While these present a level of abstraction from the specific SMTP line semantics, they do not contain any message storage code. The SMTP server class does know how to divide responsibility between domains. When a message arrives, it analyzes the recipient's address, tries matching it with one of the registered domains, and then passes validation of the address and saving the message to the correct domain, or refuses to handle the message if it cannot handle the domain. It is possible to specify a catch-all domain, which will usually be responsible for relaying mails outwards.

While correct relaying is planned for the future, at the moment we have only so-called "smarthost" relaying. All e-mail not recognized by a local domain is relayed to a single outside upstream server, which is supposed to relay the mail further. This is the configuration for most home machines, which are Twisted Mail's current target audience.

Since the people involved in Twisted's development were reluctant to run code that runs as a super user, or with any special privileges, it had to be considered how delivery of mail to users is possible. The solution decided upon was to have Twisted deliver to its own directory, which should have very strict permissions, and have users pull the mail using some remote mail access protocol like POP3. This means only a user would write to his own mail box, so no security holes in Twisted would be able to adversely affect a user.

Future plans are to use a Perspective Broker-based service to hand users' mail to a personal server using a UNIX domain socket, as well as to add some more conventional delivery methods, as scary as they may be.

Because the default configuration of Twisted Mail is to be an integrated POP3/SMTP server, it is ideally suited for the so-called POP toaster configuration, where there are a multitude of virtual users and domains, all using the same IP address and computer to send and receive mails. It is fairly easy to configure Twisted as a POP toaster. There are a number of deployment choices: one can append a telnet server to the tap for remote configuration, or simple scripts can add and remove users from the user database. The user database is saved as a directory, where file names are keys and file contents are values, so concurrency is not usually a problem.

% mktap mail -d foobar.com=$HOME/Maildir/ -u postmaster=secret -b \
             -p 110 -s 25
% twistd -f mail.tap

Bringing up a simple mail-server

Twisted's native mail storage format is Maildir, a format that requires no locking and is safe and atomic. Twisted supports a number of standardized extensions to Maildir, commonly known as Maildir++. Most importantly, it implements deletion by simply moving the message to a subfolder named Trash, so mail is recoverable if accessed through a protocol which allows multiple folders, like IMAP. However, Twisted itself does not yet support any such protocol.
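
The reason Maildir needs no locking is that a message is written under tmp/ and only then renamed into new/, so a reader sees either the whole message or nothing. The sketch below illustrates that idea and the move-to-Trash style of deletion. The function names are hypothetical, and the ".Trash" folder name and file-name scheme are assumptions based on common Maildir conventions rather than on Twisted's code.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()


def init_maildir(path):
    """Create the three standard Maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)


def deliver(maildir, data):
    """Write the message under tmp/, then rename it into new/."""
    name = "%d.P%dQ%d.%s" % (time.time(), os.getpid(), next(_counter),
                             socket.gethostname())
    tmp = os.path.join(maildir, "tmp", name)
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    final = os.path.join(maildir, "new", name)
    os.rename(tmp, final)          # atomic when both are on one filesystem
    return final


def trash(maildir, message_path):
    """'Delete' a message by moving it into the Trash subfolder."""
    folder = os.path.join(maildir, ".Trash")   # assumed folder name
    init_maildir(folder)
    target = os.path.join(folder, "cur", os.path.basename(message_path))
    os.rename(message_path, target)
    return target
```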

Introducing Perspective Broker

All the World's a Game

Twisted was originally designed to support multi-player games; a simulated "real world" environment. Experience with game systems of that type is enlightening as to the nature of computing on the whole. Almost all services on a computer are modeled after some simulated real-world activity. For example, e-"mail", or "document publishing" on the web. Even "object-oriented" programming is based around the notion that data structures in a computer simulate some analogous real-world objects.

All such networked simulations have a few things in common. They each represent a service provided by software, and there is usually some object where "global" state is kept. Such a service must provide an authentication mechanism. Often, there is a representation of the authenticated user within the context of the simulation, and there are also objects aside from the user and the simulation itself that can be accessed.

For most existing protocols, Twisted provides these abstractions through twisted.internet.passport. This is so named because the most important common functionality it provides is authentication. A simulation "world" as described above -- such as an e-mail system, document publishing archive, or online video game -- is represented by a subclass of Service, the authentication mechanism by an Authorizer (which is a set of Identities), and the user of the simulation by a Perspective. Other objects in the simulation may be represented by arbitrary Python objects, depending upon the implementation of the given protocol.

New problem domains, however, often require new protocols, and re-implementing these abstractions each time can be tedious, especially when it's not necessary. Many efforts have been made in recent years to create generic "remote object" or "remote procedure call" protocols, but in developing Twisted, these protocols were found to require too much overhead in development, be too inefficient at runtime, or both.

Perspective Broker is a new remote-object protocol designed to be lightweight, to impose minimal constraints upon the development process, and to use Python's dynamic nature to good effect, while remaining relatively efficient in terms of bandwidth and CPU utilization. twisted.spread.pb serves as a reference implementation of the protocol, but implementation of Perspective Broker in other languages is already underway. spread is the Twisted subpackage dealing with remote calls and objects, and has nothing to do with the Spread toolkit.

Perspective Broker extends twisted.internet.passport's abstractions to be concrete objects rather than design patterns. Rather than having a Protocol implementation translate between sequences of bytes and specifically named methods (as in the other Twisted Protocols), Perspective Broker defines a direct mapping between network messages and quasi-arbitrary method calls.

Translucent, not Transparent

In a server application where a large number of clients may be interacting at once, it is not feasible to have an arbitrarily large number of OS threads blocking and waiting for remote method calls to return. Additionally, the ability for any client to call any method of an object would present a significant security risk. Therefore, rather than attempting to provide a transparent interface to remote objects, twisted.spread.pb is "translucent", meaning that while remote method calls have different semantics than local ones, the similarities in semantics are mirrored by similarities in the syntax. Remote method calls impose as little overhead as possible in terms of volume of code, but "as little as possible" is unfortunately not "nothing".

twisted.spread.pb defines a method naming standard for each type of remotely accessible object. For example, if a client requests a method call with an expression such as myPerspective.doThisAction(), the remote version of myPerspective would be sent the message perspective_doThisAction. Depending on the manner in which an object is accessed, other method prefixes may be observe_, view_, or remote_. Any method present on a remotely accessible object, and named appropriately, is considered to be published -- since this is accomplished with getattr, the definition of "present" is not just limited to methods defined on the class, but instances may have arbitrary callable objects associated with them as long as the name is correct -- similarly to normal python objects.
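
The naming convention is easy to picture as a getattr lookup with a fixed prefix. The following is a minimal sketch of that idea only; Publisher and dispatch are hypothetical names, not part of twisted.spread.pb.

```python
class Publisher:
    """Hypothetical base class: only prefixed methods are reachable."""
    prefix = "remote_"

    def dispatch(self, message, *args, **kw):
        # getattr sees class methods and per-instance callables alike.
        method = getattr(self, self.prefix + message, None)
        if not callable(method):
            raise AttributeError("no published method %r" % message)
        return method(*args, **kw)


class Counter(Publisher):
    def __init__(self):
        self.count = 0

    def remote_increment(self, by=1):
        self.count += by
        return self.count

    def reset(self):              # no prefix, so a client cannot reach it
        self.count = 0
```

A client asking for "increment" reaches remote_increment, while a request for "reset" fails because no remote_reset exists. Attaching a callable named remote_peek to a single instance publishes "peek" on that instance only.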

Remote method calls are made on remote reference objects (instances of pb.RemoteReference) by calling a method with an appropriate name. However, that call will not block -- if you need the result from a remote method call, you pass in one of the two special keyword arguments to that method -- pbcallback or pberrback. pbcallback is a callable object which will be called when the result is available, and pberrback is a callable object which will be called if there was an exception thrown either in transmission of the call or on the remote side.

In the case that neither pberrback nor pbcallback is provided, twisted.spread.pb will optimize network usage by not sending confirmations of messages.

# Server Side
class MyObject(pb.Referenceable):
    def remote_doIt(self):
        return "did it"

# Client Side
    ...
    def myCallback(result):
        print result # result will be 'did it'
    def myErrback(stacktrace):
        print 'oh no, mr. bill!'
        print stacktrace
    myRemoteReference.doIt(pbcallback=myCallback,
                           pberrback=myErrback)
Listing 9: A remotely accessible object and accompanying call

Different Behavior for Different Perspectives

Considering remote object access in terms of a simulation shows that certain actions or requests require knowledge of the actor performing them. Often, when processing a message, it is useful to know who sent it, since different results may be required depending on the permissions or state of the caller.

A simple example is a game where a certain object is invisible, but players with the "Heightened Perception" enchantment can see it. When answering the question "What objects are here?" it is important for the room to know who is asking, to determine which objects they can see. Parallels to the differences between "administrators" and "users" on an average multi-user system are obvious.

Perspective Broker is named for the fact that it does not broker only objects, but views of objects. As a user of the twisted.spread.pb module, it is quite easy to determine the caller of a method. All you have to do is subclass Viewable.

# Server Side
class Greeter(pb.Viewable):
    def view_greet(self, actor):
        return "Hello %s!\n" % actor.perspectiveName

# Client Side
    ...
    remoteGreeter.greet(pbcallback=sys.stdout.write)
    ...
Listing 10: An object responding to its calling perspective

Before any arguments sent by the client, the actor (specifically, the Perspective instance through which this object was retrieved) will be passed as the first argument to any view_xxx methods.
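
The invisible-object example from earlier in this section can be written against the same convention. This sketch does not use twisted.spread.pb: Player stands in for a Perspective, and call_view is a hypothetical stand-in for the server-side dispatch that prepends the actor.

```python
class Player:
    """Stands in for a Perspective; the attributes are illustrative."""
    def __init__(self, name, enchantments=()):
        self.perspectiveName = name
        self.enchantments = set(enchantments)


class Room:
    def __init__(self):
        self.contents = []        # (object name, is_invisible) pairs

    def view_look(self, actor):
        # The actor arrives first, ahead of any client-supplied arguments.
        perceptive = "Heightened Perception" in actor.enchantments
        return [name for name, invisible in self.contents
                if perceptive or not invisible]


def call_view(obj, actor, message, *args):
    """Server-side dispatch: prepend the actor to the client's arguments."""
    return getattr(obj, "view_" + message)(actor, *args)
```

The client never supplies the actor itself, so it cannot claim to be a different player; the server fills that argument in from the connection's own perspective.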

Mechanisms for Sharing State

In a simulation of any decent complexity, client and server will wish to share structured data. Perspective Broker provides a mechanism for both transferring (copying) and sharing (caching) that state.

Whenever an object is passed as an argument to or returned from a remote method call, that object is serialized using twisted.spread.jelly; a serializer similar in some ways to Python's native pickle. Originally, pickle itself was going to be used, but there were several security issues with the pickle code as it stands. It is on these issues of security that pickle and twisted.spread.jelly part ways.

While twisted.spread.jelly handles a few basic types such as strings, lists, dictionaries and numbers automatically, all user-defined types must be registered both for serialization and unserialization. This registration process is necessary on the sending side in order to determine if a particular object is shared, and whether it is shared as state or behavior. On the receiving end, it's necessary to prevent arbitrary code from being run when an object is unserialized -- a significant security hole in pickle for networked applications.
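
To make the difference from pickle concrete, here is a small sketch of a serializer that refuses any class not explicitly registered. It is not jelly's wire format or API: the Registry class and the nested-list encoding are assumptions chosen to keep the example short.

```python
BASIC = (str, int, float, bool, type(None))


class Registry:
    """Hypothetical whitelist of classes allowed to cross the wire."""

    def __init__(self):
        self.by_name = {}

    def register(self, cls):
        self.by_name[cls.__name__] = cls
        return cls

    def serialize(self, obj):
        if isinstance(obj, BASIC):
            return obj
        if isinstance(obj, list):
            return ["list"] + [self.serialize(x) for x in obj]
        if isinstance(obj, dict):
            return ["dict"] + [[self.serialize(k), self.serialize(v)]
                               for k, v in obj.items()]
        name = type(obj).__name__
        if self.by_name.get(name) is not type(obj):
            raise TypeError("unregistered type: %s" % name)
        return ["instance", name, self.serialize(vars(obj))]

    def unserialize(self, data):
        if isinstance(data, BASIC):
            return data
        tag = data[0]
        if tag == "list":
            return [self.unserialize(x) for x in data[1:]]
        if tag == "dict":
            return {self.unserialize(k): self.unserialize(v)
                    for k, v in data[1:]}
        if tag == "instance":
            cls = self.by_name.get(data[1])
            if cls is None:
                raise TypeError("refusing unregistered type: %s" % data[1])
            # Skip __init__; only registered classes are ever instantiated.
            obj = cls.__new__(cls)
            obj.__dict__.update(self.unserialize(data[2]))
            return obj
        raise ValueError("unknown tag: %r" % (tag,))
```

The receiving side looks class names up in its own table and never imports or calls anything named by the sender, which is the property the paragraph above asks for.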

On the sending side, the registration is accomplished by making the object you want to serialize a subclass of one of the "flavors" of object that are handled by Perspective Broker. A class may be Referenceable, Viewable, Copyable or Cacheable. These four classes correspond to different ways that the object will be seen remotely. Serialization flavors are mutually exclusive -- these 4 classes may not be mixed in with each other.

Publishing Objects with PB

The previous samples of code have shown how an individual object will interact over a previously-established PB connection. In order to get to that connection, you need to do some set-up work on both the client and server side; PB attempts to minimize this effort.

There are two different approaches for setting up a PB server, depending on your application's needs. In the simplest case, where your application does not deal with the abstractions above -- services, identities, and perspectives -- you can simply publish an object on a particular port.

from twisted.spread import pb
from twisted.internet import main
class Echoer(pb.Root):
    def remote_echo(self, st):
        print 'echoing:', st
        return st
if __name__ == '__main__':
    app = main.Application("pbsimple")
    app.listenOn(8789, pb.BrokerFactory(Echoer()))
    app.run()
Listing 11: Creating a simple PB server

Listing 11 shows how to publish a simple object which responds to a single message, "echo", and returns whatever argument is sent to it. There is very little to explain: the "Echoer" class is a pb.Root, which is a small subclass of Referenceable designed to be used for objects published by a BrokerFactory, so Echoer follows the same rule for remote access that Referenceable does. Connecting to this service is almost equally simple.

from twisted.spread import pb
from twisted.internet import main
def gotObject(object):
    print "got object:",object
    object.echo("hello network", pbcallback=gotEcho)
def gotEcho(echo):
    print 'server echoed:',echo
    main.shutDown()
def gotNoObject(reason):
    print "no object:",reason
    main.shutDown()
pb.getObjectAt("localhost", 8789, gotObject, gotNoObject, 30)
main.run()
Listing 12: A client for Echoer objects.

The utility function pb.getObjectAt retrieves the root object from a hostname/port-number pair and makes a callback (in this case, gotObject) if it can connect and retrieve the object reference successfully, and an error callback (gotNoObject) if it cannot connect or the connection times out.

gotObject receives the remote reference, and sends the echo message to it. This call is visually noticeable as a remote method invocation by the distinctive pbcallback keyword argument. When the result from that call is received, gotEcho will be called, notifying us that in fact, the server echoed our input ("hello network").

While this setup might be useful for certain simple types of applications where there is no notion of a "user", the additional complexity necessary for authentication and service segregation is worth it. In particular, re-use of server code for things like chat (twisted.words) is a lot easier with a unified notion of users and authentication.

from twisted.spread import pb
from twisted.internet import main
class SimplePerspective(pb.Perspective):
    def perspective_echo(self, text):
        print 'echoing',text
        return text
class SimpleService(pb.Service):
    def getPerspectiveNamed(self, name):
        return SimplePerspective(name, self)
if __name__ == '__main__':
    import pbecho
    app = main.Application("pbecho")
    pbecho.SimpleService("pbecho",app).getPerspectiveNamed("guest").makeIdentity("guest")
    app.listenOn(pb.portno, pb.BrokerFactory(pb.AuthRoot(app)))
    app.save("start")
Listing 13: A PB server using twisted's "passport" authentication.

In terms of the "functionality" it offers, this server is identical. It provides a method which will echo some simple object sent to it. However, this server provides it in a manner which will allow it to cooperate with multiple other authenticated services running on the same connection, because it uses the central Authorizer for the application.

On the line that creates the SimpleService, several things happen.

  1. A SimpleService is created and persistently added to the Application instance.
  2. A SimplePerspective is created, via the overridden getPerspectiveNamed method.
  3. That SimplePerspective has an Identity generated for it, and persistently added to the Application's Authorizer. The created identity will have the same name as the perspective ("guest"), and the password supplied (also, "guest"). It will also have a reference to the service "pbecho" and a perspective named "guest", by name. The Perspective.makeIdentity utility method prevents having to deal with the intricacies of the passport Authorizer system when one doesn't require strongly separate Identitys and Perspectives.

Also, this server does not run itself, but instead persists to a file which can be run with twistd, offering all the usual amenities of daemonization, logging, etc. Once the server is run, connecting to it is similar to the previous example.

from twisted.spread import pb
from twisted.internet import main
def success(message):
    print "Message received:",message
    main.shutDown()
def failure(error):
    print "Failure...",error
    main.shutDown()
def connected(perspective):
    perspective.echo("hello world",
                     pbcallback=success,
                     pberrback=failure)
    print "connected."
pb.connect(connected, failure,   "localhost", pb.portno,
           "guest", "guest",     "pbecho", "guest", 30)
main.run()
Listing 14: Connecting to an Authorized Service

This introduces a new utility -- pb.connect. This function takes a long list of arguments and manages the handshaking and challenge/response aspects of connecting to a PB service perspective, eventually calling back to indicate either success or failure. In this particular example, we are connecting to localhost on the default PB port (8787), authenticating to the identity "guest" with the password "guest", requesting the perspective "guest" from the service "pbecho". If this can't be done within 30 seconds, the connection will abort.

In these examples, we have attempted to show how Twisted makes event-based scripting easier; this facilitates the ability to run short scripts as part of a long-running process. However, event-based programming is not natural to procedural scripts; it is more generally accepted that GUI programs will be event-driven whereas scripts will be blocking. An alternative client to our SimpleService using GTK illustrates the seamless meshing of Twisted and GTK.

from twisted.internet import main, ingtkernet
from twisted.spread.ui import gtkutil
import gtk
ingtkernet.install()
class EchoClient:
    def __init__(self, echoer):
        l.hide()
        self.echoer = echoer
        w = gtk.GtkWindow(gtk.WINDOW_TOPLEVEL)
        vb = gtk.GtkVBox(); b = gtk.GtkButton("Echo:")
        self.entry = gtk.GtkEntry(); self.outry = gtk.GtkEntry()
        w.add(vb)
        map(vb.add, [b, self.entry, self.outry])
        b.connect('clicked', self.clicked)
        w.connect('destroy', gtk.mainquit)
        w.show_all()
    def clicked(self, b):
        txt = self.entry.get_text()
        self.entry.set_text("")
        self.echoer.echo(txt, pbcallback=self.outry.set_text)
l = gtkutil.Login(EchoClient, None, initialService="pbecho")
l.show_all()
gtk.mainloop()
Listing 15: A Twisted GUI application

Event-Driven Web Object Publishing with Web.Widgets

Although PB will be interesting to those people who wish to write custom clients for their networked applications, many prefer or require a web-based front end. Twisted's built-in web server has been designed to accommodate this desire, and the presentation framework that one would use to write such an application is twisted.web.widgets. Web.Widgets has been designed to work in an event-based manner, without adding overhead to the designer or the developer's work-flow.

Surprisingly, asynchronous web interfaces fit very well into the normal uses of purpose-built web toolkits such as PHP. Any experienced PHP, Zope, or WebWare developer will tell you that separation of presentation, content, and logic is very important. In practice, this results in a "header" block of code which sets up various functions which are called throughout the page, some of which load blocks of content to display. While PHP does not enforce this, it is certainly idiomatic. Zope enforces it to a limited degree, although it still allows control structures and other programmatic elements in the body of the content.

In Web.Widgets, strict enforcement of this principle coincides very neatly with a "hands-free" event-based integration, where much of the work of declaring callbacks is implicit. A "Presentation" has a very simple structure for evaluating Python expressions and giving them a context to operate in. The "header" block which is common to many templating systems becomes a class, which represents an enumeration of events that the template may generate, each of which may be responded to either immediately or latently.

For the sake of simplicity, as well as maintaining compatibility for potential document formats other than HTML, Presentation widgets do not attempt to parse their template as HTML tags. The structure of the template is "HTML Text %%%%python_expression()%%%% more HTML Text". Every set of 4 percent signs (%%%%) switches back and forth between evaluation and printing.
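
The toggling rule is simple enough to sketch directly. The render function below is a hypothetical illustration, not the Web.Widgets implementation: it splits on the four-percent delimiter and evaluates every second piece in a supplied namespace.

```python
def render(template, namespace):
    """Alternate between literal text and evaluated expressions."""
    pieces = template.split("%%%%")
    if len(pieces) % 2 == 0:
        # An odd number of delimiters leaves an expression unterminated.
        raise ValueError("unbalanced %%%% delimiters")
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            out.append(piece)                        # literal text
        else:
            # eval() is tolerable here only because templates are trusted
            # server-side files, never client input.
            out.append(str(eval(piece, {}, namespace)))
    return "".join(out)
```

For example, rendering "Hello %%%%name%%%%!" with a namespace in which name is "web" produces "Hello web!". Since each piece is a single expression, statements such as loops cannot appear in a template.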

No control structures are allowed in the template. This was originally thought to be a potentially major inconvenience, but with use of the Web.Widgets code to develop a few small sites, it has seemed trivial to encapsulate any table-formatting code within a method; especially since those methods can take string arguments if there's a need to customize the table's appearance.

The namespace for evaluating the template expressions is obtained by scanning the class hierarchy for attributes, and getting each of those attributes from the current instance. This means that all methods will be bound methods, so indicating "self" explicitly is not required. While it is possible to override the method for creating namespaces, using this default has the effect of associating all presentation code for a particular widget in one class, along with its template. If one is working with a non-programmer designer, and the template is in an external file, it is always very clear to the designer what functionality is available to them in any given scope, because there is a list of available methods for any given class.
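
A namespace of this kind can be built by walking the class hierarchy for names and then fetching each name from the instance. The helper below is a sketch under that reading; build_namespace is a hypothetical name, and skipping underscore-prefixed attributes is an assumption of the example.

```python
def build_namespace(instance):
    """Collect public attribute names from the class hierarchy, then look
    each one up on the instance so that methods come back bound."""
    names = set()
    for cls in type(instance).__mro__:
        if cls is object:
            continue
        names.update(n for n in vars(cls) if not n.startswith("_"))
    return {name: getattr(instance, name) for name in names}
```

Because the lookup goes through the instance, a template expression can call greet() rather than self.greet(), and a subclass or instance override of an attribute wins over the base class value.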

A convenient event to register for would be a response from the PB service that we just implemented. We can use the Deferred class in order to indicate to the widgets framework that certain work has to be done later. This is a Twisted convention which one can currently use in PB as well as webwidgets; any framework which needs the ability to defer a return value until later should use this facility. Elements of the page will be rendered from top to bottom as data becomes available, so the page will not be blocked on rendering until all deferred elements have been completed.

from twisted.spread import pb
from twisted.python import defer
from twisted.web import widgets
class EchoDisplay(widgets.Presentation):
    template = """<H1>Welcome to my widget, displaying %%%%echotext%%%%.</h1>
    <p>Here it is: %%%%getEchoPerspective()%%%%</p>"""
    echotext = 'hello web!'
    def getEchoPerspective(self):
        d = defer.Deferred()
        pb.connect(d.callback, d.errback, "localhost", pb.portno,
                   "guest", "guest",      "pbecho", "guest", 1)
        d.addCallbacks(self.makeListOf, self.formatTraceback)
        return ['<b>',d,'</b>']
    def makeListOf(self, echoer):
        d = defer.Deferred()
        echoer.echo(self.echotext, pbcallback=d.callback, pberrback=d.errback)
        d.addCallbacks(widgets.listify, self.formatTraceback)
        return [d]
if __name__ == "__main__":
    from twisted.web import server
    from twisted.internet import main
    a = main.Application("pbweb")
    gdgt = widgets.Gadget()
    gdgt.widgets['index'] = EchoDisplay()
    a.listenOn(8080, server.Site(gdgt))
    a.run()
Listing 16: an event-based web widget.

Each time a Deferred is returned as part of the page, the page will pause rendering until the deferred's callback method is invoked. When that callback is made, its result is inserted at the point in the page where rendering left off.
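
The pause-and-resume behaviour can be sketched without any networking. The Deferred class below is a deliberately minimal stand-in written for this example (it holds one result and has no error handling), and PageRenderer is a hypothetical name; neither is the Twisted implementation.

```python
class Deferred:
    """Minimal stand-in: one result, delivered to every callback."""
    _unset = object()

    def __init__(self):
        self.result = self._unset
        self.callbacks = []

    def addCallback(self, fn):
        if self.result is self._unset:
            self.callbacks.append(fn)
        else:
            fn(self.result)

    def callback(self, result):
        self.result = result
        for fn in self.callbacks:
            fn(result)


class PageRenderer:
    """Writes page elements in order, pausing at unfired Deferreds."""

    def __init__(self, elements, write):
        self.pending = list(elements)
        self.write = write
        self.resume()

    def resume(self, _=None):
        while self.pending:
            head = self.pending[0]
            if isinstance(head, Deferred):
                if head.result is Deferred._unset:
                    head.addCallback(self.resume)   # wait, then continue
                    return
                head = head.result
            self.pending.pop(0)
            self.write(str(head))
```

Elements before the first unfired Deferred are written immediately. A later Deferred that fires early simply keeps its result until rendering reaches it, so output order always matches page order.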

If necessary, there are options within web.widgets to allow a widget to postpone or cease rendering of the entire page -- for example, it is possible to write a FileDownload widget, which will override the rendering of the entire page and replace it with a file download.

The final goal of web.widgets is to provide a framework which encourages the development of usable library code. Too much web-based code is thrown away due to its particular environment requirements or stylistic preconceptions it carries with it. The goal is to combine the fast-and-loose iterative development cycle of PHP with the ease of installation and use of Zope's "Product" plugins.

Things That Twisted Does Not Do

It is unfortunately well beyond the scope of this paper to cover all the functionality that Twisted provides, but it serves as a good overview. It may seem as though Twisted does anything and everything, but there are certain features we never plan to implement because they are simply outside the scope of the project.

Despite the multiple ways to publish and access objects, Twisted does not have or support an interface definition language. Some developers on the Twisted project have experience with remote object interfaces that require explicit specification of all datatypes during the design of an object's interface. We feel that such interfaces are in the spirit of statically-typed languages, and are therefore suited to the domain of problems where statically-typed languages excel. Twisted has no plans to implement a protocol schema or static type-checking mechanism, as the efficiency gained by such an approach would be quickly lost again by requiring the type conversion between Python's dynamic types and the protocol's static ones. Since one of the key advantages of Python is its extremely flexible dynamic type system, we felt that a dynamically typed approach to protocol design would share some of those advantages.

Twisted does not assume that all data is stored in a relational database, or even an efficient object database. Currently, Twisted's configuration state is all stored in memory at run-time, and the persistent parts of it are pickled at one go. There are no plans to move the configuration objects into a "real" database, as we feel it is easier to keep a naive form of persistence for the default case and let application-specific persistence mechanisms handle persistence. Consequently, there is no object-relational mapping in Twisted; twisted.enterprise is an interface to the relational paradigm, not an object-oriented layer over it.
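
Pickling the persistent state "at one go" amounts to very little code, which is part of its appeal. The sketch below shows the general technique with the standard pickle module; the save and load names, and the write-then-rename step, are assumptions for illustration rather than Twisted's own persistence code.

```python
import os
import pickle
import tempfile


def save(state, path):
    """Pickle the whole state object in one go, replacing the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)     # the old snapshot survives until this point


def load(path):
    """Restore a snapshot. Only unpickle files the application wrote itself."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The trade-off is the one described above: there are no partial updates and no queries, only whole snapshots, which is adequate for configuration but not for application data of any size.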

There are other things that Twisted will not do as well, but these have been frequently discussed as possibilities for it. The general rule of thumb is that if something will increase the required installation overhead, then Twisted will probably not do it. Optional additions that enhance integration with external systems are always welcome: for example, database drivers for Twisted or a CORBA IDL for PB objects.

Future Directions

Twisted is still a work in progress. The number of protocols in the world is infinite for all practical purposes, and it would be nice to have a central repository of event-based protocol implementations. Better integration with frameworks and operating systems is also a goal. Examples of integration opportunities are automatic creation of installers for "tap" files (for Red Hat Package Manager-based distributions, FreeBSD's package management system or Microsoft Windows(tm) installers), and integration with other event-dispatch mechanisms, such as win32's native message dispatch.

A still-nascent feature of Twisted, which this paper only touches briefly upon, is twisted.enterprise: it is planned that Twisted will have first-class database support some time in the near future. In particular, integration between twisted.web and twisted.enterprise is planned, to give developers the SQL conveniences that they are used to from other frameworks.

Another direction that we hope Twisted will progress in is standardization and porting of PB as a messaging protocol. Some progress has already been made in that direction, with XEmacs integration nearly ready for release as of this writing.

Tighter integration of protocols is also a future goal, such as an FTP server that can serve the same resources as a web server, or a web server that allows users to change their POP3 password. While Twisted is already a very tightly integrated framework, there is always room for more integration. Of course, all this should be done in a flexible way, so the end-user will choose which components to use -- and have those components work well together.

Conclusions

As shown, Twisted provides a lot of functionality to the Python network programmer, while trying to be in his way as little as possible. Twisted gives good tools both to someone trying to implement a new protocol and to someone trying to use an existing protocol. Twisted allows developers to prototype and develop object communication models with PB, without designing a byte-level protocol. Twisted tries to have an easy way to record useful deployment options, via the twisted.tap and plugin mechanisms, while making it easy to generate new forms of deployment. And last but not least, even though Twisted is written in a high-level language and uses its dynamic facilities to give an easy API, its performance is good enough for most situations -- for example, the web server can easily saturate a T1 line serving dynamic requests on low-end machines.

While still an active project, Twisted can already be used for production programs. Twisted can be downloaded from the main Twisted site (http://www.twistedmatrix.com) where there is also documentation for using and programming Twisted.

Acknowledgements

We wish to thank Sean Riley, Allen Short, Chris Armstrong, Paul Swartz, Jürgen Hermann, Benjamin Bruheim, Travis B. Hartwell, and Itamar Shtull-Trauring for being a part of the Twisted development team with us.

Thanks also to Jason Asbahr, Tommi Virtanen, Gavin Cooper, Erno Kuusela, Nick Moffit, Jeremy Fincher, Jerry Hebert, Keith Zaback, Matthew Walker, and Dan Moniz, for providing insight, commentary, bandwidth, crazy ideas, and bug-fixes (in no particular order) to the Twisted team.

References

  1. The Twisted site, http://www.twistedmatrix.com
  2. Douglas Schmidt, Michael Stal, Hans Rohnert and Frank Buschmann, Pattern-Oriented Software Architecture, Volume 2, Patterns for Concurrent and Networked Objects, John Wiley & Sons
  3. Abhishek Chandra, David Mosberger, Scalability of Linux Event-Dispatch Mechanisms, USENIX 2001, http://lass.cs.umass.edu/~abhishek/papers/usenix01/paper.ps
  4. Protocol specifications, http://www.rfc-editor.com
  5. The Twisted Philosophical FAQ, http://www.twistedmatrix.com/page.epy/twistedphil.html
  6. Twisted Advocacy, http://www.twistedmatrix.com/page.epy/whytwisted.html
  7. Medusa, http://www.nightmare.com/medusa/index.html
  8. Using Spreadable Web Servers, http://www.twistedmatrix.com/users/jh.twistd/python/moin.cgi/TwistedWeb
  9. Twisted.Spread implementations for other languages, http://www.twistedmatrix.com/users/washort/
  10. PHP: Hypertext Preprocessor, http://www.php.net/
  11. The Z Object Publishing Environment, http://www.zope.org/, http://zope.com/