Gregory Trubetskoy
grisha@apache.org

Jan 2003


                Introduction To Mod_Python



Note: In this paper "Apache" refers to the "Apache HTTP Server", as opposed to
the Apache Software Foundation, which is not a piece of software, but the
non-profit corporation that owns, produces and maintains a multitude of open
source software.

1. Introduction

What is mod_python? It is difficult to describe it in a single phrase, so here
is a list of things that mod_python can be described as:

   1. A Python interpreter embedded within an Apache loadable module.

   2. A handler of Apache request phases, providing the ability to implement
   phase processing in Python.

   3. An interface to a subset of the Apache API.

   4. A complete (and evolving) framework for developing web applications.

What is now known as mod_python actually originated as a plug-in for the
Netscape web server called NSAPY. The idea and original code were taken from the
book Internet Programming With Python. In 1998 the code was ported to Apache and
the name was changed to HTTPDAPY (by replacing NS in NSAPY with HTTPD). In 2000,
after significant code and interface changes, the first version of mod_python
was released. In September of 2002, after a long thread of discussions on the
mod_python list regarding the popularity of Python as a web development
language, mod_python was donated to the Apache Software Foundation (ASF). The
donation has been a success: the project is well known within the ASF community,
many ASF contributors make regular contributions to the project, and the user
base continues to grow slowly but steadily.

2. Status

The latest version of mod_python is 3.1. The 3.x versions require Apache 2.0 and
will not work with Apache 1.3. Mod_python 3.1 also requires Python 2.3. For
Apache 1.3, mod_python 2.7 is still available for download, but it has not had
any new features added to it for at least two years and probably never will.

3. Getting and Installing

Mod_python is available from www.apache.org or www.modpython.org. Installing it
is very simple and amounts to the usual "./configure; make; make install" dance
on UNIX, while for Windows there exists a binary installer.

4. Request Phases

At the core of mod_python is the ability to write Apache request phase handling
in Python. When idle, Apache waits for incoming requests. An incoming request
starts a request phase cycle where Apache steps through each phase calling
functions registered to process it. The modular design of Apache allows for any
Apache module to register a function to process any phase, exclusively, or in
conjunction with other modules.

There are approximately a dozen request phases ("approximately" because Apache
is flexible enough to allow modules to introduce additional phases, though this
feature is not supported by mod_python). The request cycle begins with the
PostReadRequest phase which provides an opportunity to do something immediately
after a request has been read and ends with the Log phase which provides the
opportunity to log the request.
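To make the idea of a phase cycle concrete, here is a toy model in plain Python
(no Apache or mod_python required). The phase names loosely follow mod_python's
Python*Handler directives; the OK/DECLINED values and the rule that the first
handler not returning DECLINED ends a phase are simplifying assumptions of this
sketch, not a description of Apache's actual hook machinery.

```python
OK, DECLINED = 0, -1  # toy status values; a handler returns DECLINED to pass

PHASES = ["PostReadRequest", "Trans", "HeaderParser", "Access", "Authen",
          "Authz", "Type", "Fixup", "Handler", "Log"]


class ToyServer:
    """Toy model of a request phase cycle; not Apache's real hook machinery."""

    def __init__(self):
        self.hooks = {phase: [] for phase in PHASES}

    def register(self, phase, func):
        # Any "module" may register any number of functions for any phase
        self.hooks[phase].append(func)

    def process(self, request):
        trace = []  # (phase, handler name) for every handler that accepted
        for phase in PHASES:
            for func in self.hooks[phase]:
                if func(request) != DECLINED:
                    trace.append((phase, func.__name__))
                    break  # in this toy, the first accepting handler ends the phase
        return trace


def only_images(request):
    # Declines anything that is not a PNG, letting the next handler try
    return OK if request["uri"].endswith(".png") else DECLINED


def generate_page(request):
    request["body"] = "Hello"
    return OK


def log_request(request):
    return OK


server = ToyServer()
server.register("Handler", only_images)
server.register("Handler", generate_page)
server.register("Log", log_request)
print(server.process({"uri": "/mp/hello"}))
```

Two functions are registered for the same content phase here; which one ends up
handling the request depends on what each decides to accept.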

In general, using mod_python to write request phase handlers is very powerful,
but because this approach treats everything from the standpoint of request
processing, it isn't well suited to application development. Thankfully,
mod_python includes the necessary layer of abstraction to provide a higher-level
framework geared toward web applications. This paper will focus on web
application development with mod_python rather than on request phase
implementation.
 
To start writing web applications, it is best to use phase handlers that are
included with the mod_python distribution: Publisher and PSP.

5. "Hello world" with Publisher

Let's look at a simple "Hello World" application using the Publisher
handler. Following is the necessary Apache configuration snippet:

  LoadModule python_module modules/mod_python.so

  <Directory /www/htdocs/mp>
	AddHandler mod_python .py
	PythonHandler mod_python.publisher
	PythonDebug On
  </Directory>

The LoadModule directive instructs Apache to load the mod_python module on
startup. The examples here assume that the document root is /www/htdocs, and the
example scripts will be located in /www/htdocs/mp. The AddHandler directive
instructs Apache to delegate processing of all requests for files ending in .py
to mod_python. The next directive, PythonHandler, tells mod_python that the
generic (content-producing) phase of the request should be processed by the
mod_python.publisher module, a handler that ships with mod_python. The
PythonDebug directive instructs mod_python to display tracebacks from Python
errors in the browser as well as in the server log; without it, errors generate
tracebacks in the log only, while the browser gets the "Internal Server Error"
message generated by Apache.

Note that the above configuration makes no mention of CGI. Many new users start
out with the assumption that mod_python requires CGI to be enabled and often try
to add ScriptAlias or ExecCGI to their configuration. This is completely
unnecessary and in fact will produce undefined results. Remember: mod_python
and CGI do not mix.

Here is the Python code, /www/htdocs/mp/hello.py:

def hello(name=None):
    if name:
        return 'Hello, %s!' % name.capitalize()
    else:
        return 'Hello there!'

Given the above setup, a request to this URL:

  http://localhost/mp/hello.py/hello

will result in:

  Hello there!

And a request to this URL:

  http://localhost/mp/hello.py/hello?name=john

will produce:

  Hello, John!

The important thing to note in this example is that the URL contains the Python
module name, followed by the name of a function within the module, with query
arguments mapping to the function's keyword arguments. In essence, this is what
the Publisher handler is all about.
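The mapping just described can be imitated in a few lines of standard-library
Python. This is a toy model, not the Publisher's code: MODULES stands in for the
.py files under /www/htdocs/mp, and toy_publish is a hypothetical name. The real
Publisher does more, for instance passing the request object to functions that
declare a req argument; the sketch only keeps the part the example relies on,
including dropping query arguments that the function does not declare.

```python
import inspect
from urllib.parse import urlsplit, parse_qs


def hello(name=None):
    if name:
        return 'Hello, %s!' % name.capitalize()
    else:
        return 'Hello there!'


# Hypothetical stand-in for the files in /www/htdocs/mp:
# module file name -> {function name -> function}
MODULES = {"hello.py": {"hello": hello}}


def toy_publish(url, prefix="/mp/"):
    """Map <prefix><module>/<function>?args onto a function call (toy Publisher)."""
    parts = urlsplit(url)
    module_name, _, func_name = parts.path[len(prefix):].partition("/")
    func = MODULES[module_name][func_name]
    # Query string -> keyword arguments (last value wins for repeated keys)
    args = {k: v[-1] for k, v in parse_qs(parts.query).items()}
    # Pass only the arguments the function actually declares
    accepted = inspect.signature(func).parameters
    kwargs = {k: v for k, v in args.items() if k in accepted}
    return func(**kwargs)


print(toy_publish("http://localhost/mp/hello.py/hello"))            # Hello there!
print(toy_publish("http://localhost/mp/hello.py/hello?name=john"))  # Hello, John!
```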

Many developers prefer not to have implementation-specific information as part
of the URL. In the example above the ".py" extension is necessary because the
AddHandler configuration directive relies on it. The way to avoid this is to use
the SetHandler directive in place of AddHandler:

  <Directory /www/htdocs/mp>
	SetHandler mod_python
      ...

The SetHandler directive instructs Apache to hand all requests in this directory
to mod_python, regardless of file extension. With SetHandler, the ".py" in the
URL is no longer necessary, resulting in:

  http://localhost/mp/hello/hello

It is also possible to omit the name of the module in the URL. To do this we can
rename the hello.py file to index.py. The Publisher handler automatically looks
for an index.py module when no module is specified, resulting in an even simpler
URL:

  http://localhost/mp/hello

It is also possible to skip the function name by renaming the function to
"index", resulting in:

  http://localhost/mp
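The defaulting rules in this section can be summarized in a small sketch. It
models only the behaviour described above (treat the first path segment as a
module if such a file exists, otherwise fall back to index.py, and default the
function to index); toy_resolve and its existing_modules argument are
hypothetical, and the real Publisher relies on Apache's own URL-to-file mapping
rather than on code like this.

```python
def toy_resolve(path, existing_modules):
    """Return (module file, function name) for a path below /mp.

    existing_modules is a hypothetical set of the .py files in the directory.
    """
    parts = [p for p in path.split("/") if p]
    if parts and (parts[0] in existing_modules or parts[0] + ".py" in existing_modules):
        first = parts.pop(0)
        module = first if first.endswith(".py") else first + ".py"
    else:
        module = "index.py"  # no module named in the URL
    function = parts[0] if parts else "index"  # no function named in the URL
    return module, function


print(toy_resolve("hello.py/hello", {"hello.py"}))  # ('hello.py', 'hello')
print(toy_resolve("hello/hello", {"hello.py"}))     # ('hello.py', 'hello')
print(toy_resolve("hello", {"index.py"}))           # ('index.py', 'hello')
print(toy_resolve("", {"index.py"}))                # ('index.py', 'index')
```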

6. "Hello World" in PSP

Now let's look at PSP.

PSP most likely stands for "Python Server Pages", though the acronym was
probably meant to resemble other technologies for embedding code in HTML, such
as ASP or JSP.

Mod_python PSP is implemented in C, using a flex-generated scanner, which makes
it very fast. In addition to parsing input files efficiently, PSP caches
compiled pages in memory, minimizing the amount of repeated parsing and
compilation.

Mod_python PSP came into existence as a project originally created by Sterling
Hughes, one of the PHP core developers, who is also a Python fan. He created a
proof-of-concept PSP implementation as an Apache module called mod_psp. Mod_psp
was later improved and integrated into mod_python.

PSP is a new feature of mod_python (as of 3.1), and will probably see a lot of
improvement over time. In its current form it appears to be stable and flexible
enough for production use.

7. PSP Goals

The design of PSP aims to satisfy a few important goals, mainly to remain in the
spirit of simplicity and clarity that Python has a reputation for:

       1. No new language to learn. Many other frameworks, mainly as a way to
          overcome the whitespace conflict between Python and HTML, have
          introduced alternative syntax, often complex enough that it appears
          to be a separate language somewhat resembling Python. Mod_python PSP
          aims to avoid this.

       2. Being as fast as possible. A key quality of mod_python is that it
          solves many complex, low-level operating system and networking
          issues for the sake of performance, something that most Python
          developers do not have the skill or time to deal with. A PSP
          implementation has to meet the same standard - there is no value in
          hacking together yet another slow parser implemented in Python. The
          fact that PSP is a flex-generated C scanner adds real value in that
          it is not something that most people "can try at home".

       3. No Python semantics in HTML. Python uses indentation to denote blocks
          of code. Indentation has no significance in HTML. PSP does not require
          HTML to be indented.

       4. No HTML semantics in Python. Since HTML pays no attention to
          indentation, it seems there should be another way to denote blocks
          within PSP. The first temptation is to introduce a code block
          delimiter character into Python. PSP avoids this.

8. PSP Syntax

PSP uses a less-than-percent ('<%') syntax similar to JSP to denote PSP
code. There are four types of inlined code:

      1. Code. Delimited by <% and %>. This is the Python code which controls
         the logic of the page and produces no output.

      2. Expression. Delimited by <%= and %>. This is Python code resulting in a
         value which is included with the output of the page.

      3. Directive. Delimited by <%@ and %>. This is a special instruction to
         PSP. (E.g. <%@ include file="header.psp" %> is a directive to include
         another file).

      4. Comment. Delimited by <%-- and --%>. Unlike HTML comments, these never
         make it past the PSP parser.
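The four delimiter pairs can be told apart with a single regular expression, as
long as the longer openers (<%--, <%= and <%@) are tried before plain <%. The
following is a toy classifier written for illustration; PSP itself uses a
flex-generated C scanner, and PSP_TAG and classify are hypothetical names.

```python
import re

# One named alternative per PSP construct; the longer openers come first
# because all four start with "<%".
PSP_TAG = re.compile(r"<%--(?P<comment>.*?)--%>|<%=(?P<expression>.*?)%>"
                     r"|<%@(?P<directive>.*?)%>|<%(?P<code>.*?)%>", re.S)


def classify(source):
    """Split PSP source into (kind, text) pairs; kind is 'text' for plain HTML."""
    pieces, pos = [], 0
    for m in PSP_TAG.finditer(source):
        if m.start() > pos:
            pieces.append(("text", source[pos:m.start()]))
        kind = m.lastgroup  # name of the alternative that matched
        pieces.append((kind, m.group(kind).strip()))
        pos = m.end()
    if pos < len(source):
        pieces.append(("text", source[pos:]))
    return pieces


page = '<%@ include file="header.psp" %><%-- note --%><h1><%= greet %></h1>'
for kind, text in classify(page):
    print(kind, repr(text))
```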

The main difficulty with using Python together with HTML is the issue of
whitespace, which has special meaning in Python, but is ignored in HTML. PSP
uses the rule that the block denoted by the last indent persists throughout the
HTML that follows the code. (The last indent "sticks").

A subtle difference from Python is that the last line of code that sets the
indent can be a comment. (Normally, Python does not pay attention to whitespace
before comments.)

<%
if a == 1:
    # begin
%>
   ... some html ...
<%
# end
%>

In this example, the block of code is started by the comment "# begin" and ended
by "# end". The words "begin" and "end" have no significance, and can be
anything, though using meaningful words such as "begin" and "end" is probably a
good idea for code clarity.

The above example could also be written like so:

<%
if a == 1:
%>

   ... some html ...
<%
%>

This demonstrates another subtle difference introduced by the PSP parser. The
parser will insert an indent (a tab) when the code ends with a ":" followed by a
newline. This feature exists as a convenience for sloppy coders, and should
probably be avoided because such code is confusing to read.
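Both indentation rules (the last indent "sticks", even when that last line is a
comment, and a trailing ":" adds a tab) can be illustrated with a toy
translator. This is a simplified sketch, not mod_python's parser: it handles
only <% %> code and <%= %> expressions, and its treatment of whitespace around
tags is an assumption. FakeReq is a hypothetical stand-in for the request
object.

```python
import re

# Only expressions (<%= %>) and code blocks (<% %>) are recognised here.
TAG = re.compile(r"<%=(.*?)%>|<%(.*?)%>", re.S)


def toy_translate(source):
    """Translate a small PSP subset into Python source."""
    out = []     # generated Python lines
    indent = ""  # the "sticky" indent applied to HTML and expressions
    pos = 0
    for m in TAG.finditer(source):
        text = source[pos:m.start()]
        if text:
            out.append(indent + "req.write(%r)" % text)
        expr, code = m.group(1), m.group(2)
        if expr is not None:
            out.append(indent + "req.write(str(%s))" % expr.strip())
        else:
            # Code keeps its own indentation; its last line (even a comment)
            # decides the indent of whatever HTML follows.
            lines = [line.rstrip() for line in code.strip("\n").split("\n")]
            out.extend(lines)
            last = lines[-1]
            indent = last[:len(last) - len(last.lstrip())]
            if last.endswith(":"):
                indent += "\t"  # convenience rule: a trailing ":" opens a block
        pos = m.end()
    if pos < len(source):
        out.append(indent + "req.write(%r)" % source[pos:])
    return "\n".join(out)


class FakeReq:
    """Collects req.write() output in place of a real mod_python request."""
    def __init__(self):
        self.sent = []

    def write(self, s):
        self.sent.append(s)


page = "<%\nif a == 1:\n    # begin\n%>\n<b>one</b>\n<%\n# end\n%>\ndone"
print(toy_translate(page))
for a in (1, 2):
    req = FakeReq()
    exec(toy_translate(page), {"req": req, "a": a})
    print(a, repr("".join(req.sent)))
```

Without the "# end" block, the trailing "done" would be emitted at the sticky
indent and would only be written when a == 1.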

9. PSP "Hello world"

The following Apache config tells Apache to handle all files ending in .psp with
the mod_python.psp handler:

<Directory /www/htdocs/mp>
    AddHandler mod_python .psp
    PythonHandler mod_python.psp
    PythonDebug On
</Directory>

Here is the PSP code:

<html>
<%
if form.has_key('name'):
    greet = 'Hello, %s!' % form['name'].capitalize()
else:
    greet = 'Hello there!'
# end
%>
   <h1><%= greet %></h1>
</html>

While the above example should be understandable without further explanation,
one detail needs special emphasis: the "# end" line. Without it, all the HTML
below would be part of the "else" block above it. Forgetting to terminate blocks
is a frequent source of programmer errors in PSP.

The PSP parser will take the above code and convert it to Python that would look
approximately like this:

req.write("""<html>
""")
if form.has_key('name'):
    greet = 'Hello, %s!' % form['name'].capitalize()
else:
    greet = 'Hello there!'
# end
req.write("""   <h1>"""); req.write(str(greet)); req.write("""</h1>
</html>""")

This Python code will be compiled and the resulting code object stored in the
PSP memory cache.
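A minimal sketch of such a cache in plain Python, assuming it is keyed by file
path and invalidated when the file's modification time changes. That policy is
an assumption for illustration; the paper does not say how PSP decides when to
recompile. get_code is a hypothetical name, and translate stands in for the
PSP-to-Python step.

```python
import os
import tempfile

_code_cache = {}  # path -> (modification time, compiled code object)


def get_code(path, translate):
    """Return a compiled code object for path, recompiling only when it changes."""
    mtime = os.path.getmtime(path)
    cached = _code_cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # cache hit: no parsing, no compilation
    with open(path) as f:
        python_source = translate(f.read())
    code = compile(python_source, path, "exec")
    _code_cache[path] = (mtime, code)
    return code


calls = []


def identity_translate(text):
    calls.append(text)  # record how often the (expensive) translation runs
    return text


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.psp")
    with open(path, "w") as f:
        f.write("x = 6 * 7\n")
    first = get_code(path, identity_translate)
    second = get_code(path, identity_translate)  # served from the cache

namespace = {}
exec(first, namespace)
print(first is second, len(calls), namespace["x"])
```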


10. Debugging PSP

Errors in PSP can be tricky to debug. The PSP parser, which converts PSP to
Python code, never produces errors, so syntax errors in PSP result in bad Python
code, which in turn triggers compilation errors from the Python interpreter. The
PSP parser makes its best effort to produce Python code in which line numbers
correspond to the original PSP code (this is why it uses semicolons in the
next-to-last line of the above example). This doesn't always work; for example,
use of the "include" PSP directive throws all the line numbers off.

One tool to aid in debugging is a special feature that allows you to peek at the
intermediate Python code produced from your PSP page. If PythonDebug is On, you
can append an underscore to the URL to see a side-by-side display of the
original PSP and the resulting Python, along with line numbers.
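A rough sketch of such a listing in plain Python. side_by_side and its layout
are hypothetical and say nothing about how mod_python actually renders its debug
page; the point is that when line numbers line up, a line number from a Python
traceback leads straight back to the PSP source.

```python
from itertools import zip_longest


def side_by_side(psp_source, python_source, width=30):
    """Return both sources next to each other, each row prefixed with its line number."""
    rows = []
    pairs = zip_longest(psp_source.splitlines(), python_source.splitlines(), fillvalue="")
    for number, (left, right) in enumerate(pairs, start=1):
        # The left (PSP) column is truncated or padded to width characters
        rows.append("%3d  %-*s | %s" % (number, width, left[:width], right))
    return "\n".join(rows)


psp = "<html>\n   <h1><%= greet %></h1>\n</html>"
python = ('req.write("""<html>\n'
          '""")\n'
          'req.write("""   <h1>"""); req.write(str(greet)); req.write("""</h1>\n'
          '</html>""")')
print(side_by_side(psp, python))
```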

Don't forget that for this feature to work, the extension with the underscore
has to be handled by mod_python as well, so your AddHandler should look like
this:

  AddHandler mod_python .psp .psp_

Future versions of PSP may introduce PSP parser errors and other tools to aid
in the debugging process.

11. Combining Publisher and PSP

This is an example of how to use PSP as a templating system from within the
Publisher handler. While at first this may seem like complicating things, this
approach actually provides a clean separation of the application logic and
presentation.

The Apache config:

<Directory /www/htdocs/mp>
	SetHandler mod_python
	PythonHandler mod_python.publisher
	PythonDebug On
</Directory>

The code for the template (/www/htdocs/mp/hello.tmpl):

<html>
    <h1><%= greet %></h1>
</html>

The code to be called from the publisher (hello2.py):

from mod_python import psp
 
def hello(req, name=''):
    s = 'Hello, there!'
    if name:
        s = 'Hello, %s!' % name.capitalize()
    tmpl = psp.PSP(req, filename='hello.tmpl')
    tmpl.run(vars={'greet': s})
    return

In this example we create an instance of the psp.PSP class, providing it with
the filename of the template. We then call the resulting object's run() method,
passing it a dictionary of arguments (in our case a single variable named
'greet' with a value of s).

On the first request, instantiating the PSP class compiles the template;
subsequent requests use the cached code object. The call to run() is what
produces the resulting HTML and sends it to the client.
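To see how the 'greet' entry of the vars dictionary becomes a name the template
can use, here is a plain-Python sketch. It assumes that run() executes the
template's compiled code with vars merged into its global namespace; toy_run,
FakeReq and the hand-translated template are hypothetical stand-ins, not
mod_python's psp module.

```python
class FakeReq:
    """Stand-in for the mod_python request object: collects written output."""
    def __init__(self):
        self.sent = []

    def write(self, s):
        self.sent.append(s)


# Hand-written Python roughly equivalent to hello.tmpl after PSP translation
TEMPLATE_CODE = compile(
    'req.write("<html>\\n    <h1>"); req.write(str(greet)); '
    'req.write("</h1>\\n</html>")',
    "hello.tmpl", "exec")


def toy_run(req, code, vars):
    """Execute a compiled template with the entries of vars visible as globals."""
    namespace = {"req": req}
    namespace.update(vars)
    exec(code, namespace)


def hello(req, name=''):
    s = 'Hello, there!'
    if name:
        s = 'Hello, %s!' % name.capitalize()
    toy_run(req, TEMPLATE_CODE, {'greet': s})


req = FakeReq()
hello(req, name='john')
print("".join(req.sent))
```

The logic (choosing the greeting) stays in the Python function, while the
template only refers to the name it is given.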

12. Advantages of Publisher and PSP

The advantage of using the Publisher as the interface to the client and the
container of all application logic, while using PSP as a templating mechanism,
is the clean separation of presentation from logic.

The PSP templates can be placed in their own directory, editable by web
designers that can alter the appearance of the application without having to
know a whole lot about mod_python.

13. Other Web Development tools

In addition to these handlers, mod_python also includes a couple of tools that
simplify common web development tasks.

The mod_python.Cookie module provides support for cookie handling. It directly
accesses the request information using the mod_python API and is therefore
faster under mod_python than the Standard Library Cookie module. It is also
semantically adapted to mod_python; the Standard Library Cookie module was
designed with CGI in mind and is a bit awkward to use under mod_python.

The mod_python.Cookie module also provides out-of-the-box support for
cryptographic cookie signing using HMAC (SignedCookie) and data marshalling
(MarshalCookie). Unlike the Standard Library Cookie module, mod_python
implements marshalling securely, because a MarshalCookie is always signed.
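Signing is what makes marshalling safe: the server refuses to unmarshal anything
it did not sign itself. A minimal sketch of that idea using only the standard
library (hmac, hashlib, marshal, base64) follows. The cookie format (hex MAC, a
colon, base64 payload), the SHA-256 digest and the SECRET value are assumptions
of this sketch, not the format used by mod_python's SignedCookie or
MarshalCookie.

```python
import base64
import hashlib
import hmac
import marshal

SECRET = b"change-me"  # hypothetical server-side secret; never sent to clients


def _mac(payload, secret):
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def dumps_signed(obj, secret=SECRET):
    """Marshal obj into a cookie-safe string of the form '<hex MAC>:<base64 data>'."""
    payload = base64.urlsafe_b64encode(marshal.dumps(obj)).decode("ascii")
    return _mac(payload, secret) + ":" + payload


def loads_signed(cookie_value, secret=SECRET):
    """Check the MAC first; only then unmarshal, so forged data is never decoded."""
    mac, sep, payload = cookie_value.partition(":")
    if not sep:
        raise ValueError("malformed cookie value")
    expected = _mac(payload, secret)
    # Constant-time comparison avoids leaking how many characters matched
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("cookie signature does not match")
    return marshal.loads(base64.urlsafe_b64decode(payload))


cookie = dumps_signed({"user": "john", "visits": 3})
print(loads_signed(cookie))
```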

The mod_python.Session module provides server-side session support.
Server-side sessions have always been a difficult problem for developers to
overcome due to Apache's multi-process architecture. Implementing sessions
requires inter-process communication and locking, which is not easy to
implement, especially in a cross-platform fashion. Mod_python.Session uses
parts of the Apache API to provide efficient cross-platform locking, and it can
store session data either in a DBM file or in memory.
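To give a feel for what a DBM-backed session store involves, here is a minimal
sketch that uses the standard library's shelve module for storage and an
exclusive fcntl.flock() lock file to serialize access between processes.
ToySessionStore and its methods are hypothetical. Unlike mod_python.Session,
this sketch is Unix-only (fcntl) and omits expiry and cookie handling; the
Unix-only locking is exactly the cross-platform gap described above.

```python
import fcntl
import os
import secrets
import shelve
import tempfile
from contextlib import contextmanager


class ToySessionStore:
    """Hypothetical DBM-backed session store; Unix-only because of fcntl.flock."""

    def __init__(self, path):
        self.path = path
        self.lock_path = path + ".lock"

    @contextmanager
    def _locked(self):
        with open(self.lock_path, "a") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)  # waits for other processes
            try:
                yield
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)

    def new_id(self):
        return secrets.token_hex(16)  # 32 hex characters, hard to guess

    def load(self, sid):
        with self._locked(), shelve.open(self.path) as db:
            return db.get(sid, {})

    def save(self, sid, data):
        with self._locked(), shelve.open(self.path) as db:
            db[sid] = data


with tempfile.TemporaryDirectory() as tmp:
    store = ToySessionStore(os.path.join(tmp, "sessions"))
    sid = store.new_id()
    session = store.load(sid)  # {} for a brand-new session
    session["count"] = session.get("count", 0) + 1
    store.save(sid, session)
    result = store.load(sid)

print(result)
```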

14. Conclusion

Starting with version 3.1, mod_python is no longer an esoteric tool made for
experienced programmers seeking top performance and scalability for dynamic
content sites. It now includes all the components necessary for simple web
development, which will hopefully lead to more mainstream adoption.





