Gregory Trubetskoy
grisha@apache.org
Jan 2003

Introduction To Mod_Python

Note: In this paper "Apache" refers to the "Apache HTTP Server", as opposed to the Apache Software Foundation, which is not a piece of software but the non-profit corporation that owns, produces and maintains a multitude of open source software.

1. Introduction

What is mod_python? It is difficult to describe it in a single phrase, so here is a list of things that mod_python can be described as:

  1. A Python interpreter embedded within an Apache loadable module.
  2. A handler of Apache request phases, providing the ability to implement phase processing in Python.
  3. An interface to a subset of the Apache API.
  4. A complete (and evolving) framework for developing web applications.

What is now known as mod_python originated as a plug-in for the Netscape web server called NSAPY. The idea and original code were taken from the Internet Programming With Python book. In 1998 the code was ported to Apache and the name was changed to HTTPDAPY (by replacing NS in NSAPY with HTTPD). In 2000, after significant code and interface changes, a first version of mod_python was released.

In September of 2002, after a long thread of discussions on the mod_python list regarding the popularity of Python as a web development language, mod_python was donated to the Apache Software Foundation (ASF). The ASF donation has been a success: the project is well known within the ASF community and many ASF contributors make regular contributions to the project, while the user base continues to grow slowly but steadily.

2. Status

The latest version of mod_python is 3.1. The 3.x versions require Apache 2.0 and will not work with Apache 1.3. Mod_python 3.1 also requires Python 2.3.

For Apache 1.3, version 2.7 of mod_python is still available for download, but it has not had any new features added to it for at least two years now and probably never will.

3. Getting and Installing

Mod_python is available from www.apache.org or www.modpython.org. Installing it is very simple and amounts to the usual "./configure; make; make install" dance on UNIX, while for Windows there exists a binary installer.

4. Request Phases

At the core of mod_python is the ability to write Apache request phase handling in Python. When idle, Apache waits for incoming requests. An incoming request starts a request phase cycle in which Apache steps through each phase, calling the functions registered to process it. The modular design of Apache allows any Apache module to register a function to process any phase, exclusively or in conjunction with other modules.

There are approximately a dozen request phases ("approximately" because Apache is flexible enough to allow modules to introduce additional phases, though this feature is not supported by mod_python). The request cycle begins with the PostReadRequest phase, which provides an opportunity to do something immediately after a request has been read, and ends with the Log phase, which provides the opportunity to log the request.

In general, using mod_python to write request phase handlers is very powerful, but because it approaches everything from the standpoint of request processing, it isn't well suited to application development. Thankfully, mod_python includes the necessary layer of abstraction to provide a higher-level framework geared towards web based applications. This presentation will focus on web application development with mod_python rather than on request phase implementation.

To start writing web applications, it is best to use the phase handlers that are included with the mod_python distribution: Publisher and PSP.

5. "Hello world" with Publisher

Let's look at a simple "Hello World" application using the Publisher handler.
Following is the necessary Apache configuration snippet:

    LoadModule python_module modules/mod_python.so

    <Directory /www/htdocs/mp>
        AddHandler mod_python .py
        PythonHandler mod_python.publisher
        PythonDebug On
    </Directory>

The LoadModule directive instructs Apache to load the mod_python module on startup. The examples here assume that the document root is /www/htdocs, and the example scripts will be located in /www/htdocs/mp.

The AddHandler directive instructs Apache to delegate processing of all requests for files ending in .py to mod_python. The next directive, PythonHandler, tells mod_python that the generic (content-producing) phase of the requests should be processed by the mod_python.publisher module, which is a handler included with mod_python. The PythonDebug directive instructs mod_python to display tracebacks from Python errors in the browser as well as in the server log; without it, errors generate tracebacks in the logs only, while the browser gets the "Internal Server Error" message generated by Apache.

Note that the above configuration makes no mention of CGI. Many new users start out with the assumption that mod_python requires CGI to be enabled and often try to add ScriptAlias or ExecCGI to their configuration. This is completely unnecessary and, in fact, will produce results that are undefined. Remember: mod_python and CGI do not mix.

Here is the Python code, /www/htdocs/mp/hello.py:

    def hello(name=None):
        if name:
            return 'Hello, %s!' % name.capitalize()
        else:
            return 'Hello there!'

Given the above setup, a request to this URL:

    http://localhost/mp/hello.py/hello

will result in:

    Hello there!

And a request to this URL:

    http://localhost/mp/hello.py/hello?name=john

will produce:

    Hello, John!

The important thing to note in this example is that the URL contains the Python module name, followed by the name of a function within the module, with the query argument mapped to the keyword argument of the function.
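
The URL-to-function mapping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Publisher's actual implementation: the names toy_publish and MODULES are hypothetical, the module lookup is replaced by a dictionary, and no request object or Apache configuration is involved.

```python
from urllib.parse import urlsplit, parse_qsl


def hello(name=None):
    # Same function as in hello.py above.
    if name:
        return 'Hello, %s!' % name.capitalize()
    else:
        return 'Hello there!'


# Hypothetical stand-in for the modules found under /www/htdocs/mp.
MODULES = {"hello.py": {"hello": hello}}


def toy_publish(url, prefix="/mp/"):
    """Resolve module/function from the path, keyword arguments from the query."""
    parts = urlsplit(url)
    # "/mp/hello.py/hello" -> module "hello.py", function "hello"
    module_name, func_name = parts.path[len(prefix):].split("/", 1)
    func = MODULES[module_name][func_name]
    # "name=john" -> {"name": "john"}
    kwargs = dict(parse_qsl(parts.query))
    return func(**kwargs)


print(toy_publish("http://localhost/mp/hello.py/hello"))            # Hello there!
print(toy_publish("http://localhost/mp/hello.py/hello?name=john"))  # Hello, John!
```

The sketch deliberately ignores everything else a real handler has to deal with (form posts, access control, error handling); it only shows how a path and a query string can be turned into a function call.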
In essence, this mapping of URLs to Python functions is what the Publisher handler is all about.

Many developers prefer not to have implementation specific information as part of the URL. In the example above the ".py" extension is necessary because the AddHandler configuration directive relies on it. The way to avoid this is to use the SetHandler directive in place of AddHandler:

    <Directory /www/htdocs/mp>
        SetHandler mod_python
        ...

The SetHandler directive instructs Apache to hand all requests in this directory to mod_python, regardless of file extension. With SetHandler, the ".py" in the URL is no longer necessary, resulting in:

    http://localhost/mp/hello/hello

It is also possible to omit the name of the module in the URL. To do this we can rename the hello.py file to index.py. The Publisher handler automatically looks for an index.py module when no module is specified, resulting in an even simpler URL:

    http://localhost/mp/hello

It is also possible to skip the function name, by renaming the function to "index", resulting in:

    http://localhost/mp

6. "Hello World" in PSP

Now let's look at PSP. PSP most likely stands for "Python Server Pages", though the acronym was probably meant to resemble other similar technologies for including code in HTML, such as ASP or JSP.

Mod_python PSP is implemented in C, using a flex-generated scanner, which makes it very fast. In addition to being very efficient at parsing input files, PSP also caches compiled pages in its code cache, thereby minimizing the amount of parsing and compilation.

Mod_python PSP came into existence as a project originally created by Sterling Hughes, one of the PHP core developers, who is also a Python fan. He created a proof-of-concept PSP implementation as an Apache module called mod_psp. Mod_psp was later improved and integrated into mod_python.

PSP is a new feature of mod_python (as of 3.1), and will probably see a lot of improvement over time. In its current form it appears to be stable and flexible enough for production use.

7. PSP Goals

The design of PSP aims to satisfy a few important goals, mainly to remain in the spirit of simplicity and clarity that Python has a reputation for:

  1. No new language to learn. Many other frameworks, mainly as a way to overcome the whitespace conflict between Python and HTML, have introduced alternative syntax, often complex enough that it appeared to be a separate language somewhat resembling Python. Mod_python PSP aims to avoid this.
  2. Being as fast as possible. A key quality of mod_python is that it solves many complex and low-level operating system and networking issues for the sake of performance, something that most Python developers do not have the skill or time to deal with. A PSP implementation would have to be on par with this notion - there is no value in hacking together yet another slow parser implemented in Python. The fact that PSP is a flex-generated C scanner adds real value in that it is not something that most people "can try at home".
  3. No Python semantics in HTML. Python uses indentation to denote blocks of code. Indentation has no significance in HTML. PSP does not require HTML to be indented.
  4. No HTML semantics in Python. Since HTML pays no attention to indentation, it seems there should be another way to denote blocks within PSP. The first temptation is to introduce a code block delimiter character into Python. PSP avoids this.

8. PSP Syntax

PSP uses a less-than-percent ('<%') syntax similar to JSP to denote PSP code. There are four types of inlined code:

  1. Code. Delimited by <% and %>. This is the Python code which controls the logic of the page and produces no output.
  2. Expression. Delimited by <%= and %>. This is Python code resulting in a value which is included in the output of the page.
  3. Directive. Delimited by <%@ and %>. This is a special instruction to PSP. (E.g. <%@ include file="header.psp" %> is a directive to include another file).
  4. Comment. Delimited by <%-- and --%>. Unlike HTML comments, these never make it past the PSP parser.

The main difficulty with using Python together with HTML is the issue of whitespace, which has special meaning in Python but is ignored in HTML. PSP uses the rule that the block denoted by the last indent persists throughout the HTML that follows the code. (The last indent "sticks").

A very subtle difference from Python is that the last line of code that sets the indent can be a comment. (Normally, Python does not pay attention to whitespace before comments).

    <%
    if a == 1:
        # begin
    %>
    ... some html ...
    <%
    # end
    %>

In this example, the block of code is started by the comment "# begin" and ended by "# end". The words "begin" and "end" have no significance and can be anything, though using meaningful words such as "begin" and "end" is probably a good idea for code clarity.

The above example could also be written like so:

    <%
    if a == 1:
    %>
    ... some html ...
    <%
    %>

This demonstrates another subtle difference introduced by the PSP parser. The parser will insert an indent (a tab) when the code ends with a ":" followed by a newline. This feature exists as a convenience for sloppy coders, and should probably be avoided because such code is confusing to read.

9. PSP "Hello world"

The following Apache config tells Apache to handle all files ending in .psp with the mod_python.psp handler:

    <Directory /www/htdocs/mp>
        AddHandler mod_python .psp
        PythonHandler mod_python.psp
        PythonDebug On
    </Directory>

Here is the PSP code:

    <html>
    <%
    if form.has_key('name'):
        greet = 'Hello, %s!' % form['name'].capitalize()
    else:
        greet = 'Hello there!'
    # end
    %>
    <h1><%= greet %></h1>
    </html>

While the above example should be understandable without any additional explanation, one detail that needs special emphasis is the "# end" line. Without it, all the HTML below would be part of the "else" block above it. Forgetting to terminate blocks is a frequent source of programmer errors in PSP.
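
The effect of the closing comment can be illustrated with a toy translator written in plain Python. This is only a sketch of the "last indent sticks" rule and has nothing to do with the real flex-generated C scanner: the names toy_psp_to_python, render, TEMPLATE and write are hypothetical, only <% %> and <%= %> are recognized, and the automatic indent after a trailing ":" is not implemented.

```python
import re

# Matches <% code %> and <%= expression %>; directives and comments are ignored.
TOKEN = re.compile(r"<%(=?)(.*?)%>", re.DOTALL)


def toy_psp_to_python(source):
    """Translate a tiny PSP subset into Python source text."""
    out = []
    indent = ""  # the indent that currently "sticks"
    pos = 0
    for match in TOKEN.finditer(source):
        html = source[pos:match.start()]
        if html:
            out.append(f"{indent}write({html!r})")
        is_expr, code = match.group(1), match.group(2)
        if is_expr:
            out.append(f"{indent}write(str({code.strip()}))")
        else:
            lines = [line for line in code.split("\n") if line.strip()]
            out.extend(lines)
            # The last line of the block, even a comment, sets the new indent.
            last = lines[-1] if lines else ""
            indent = last[:len(last) - len(last.lstrip())]
        pos = match.end()
    tail = source[pos:]
    if tail:
        out.append(f"{indent}write({tail!r})")
    return "\n".join(out)


def render(source, **variables):
    """Run the generated code and collect everything passed to write()."""
    chunks = []
    namespace = dict(variables, write=chunks.append)
    exec(toy_psp_to_python(source), namespace)
    return "".join(chunks)


TEMPLATE = """<%
if a == 1:
    # begin
%>
<p>a is one</p>
<%
# end
%>
<p>always shown</p>
"""

print(toy_psp_to_python(TEMPLATE))
print(render(TEMPLATE, a=2))
```

With a=2 only the "always shown" paragraph is written. If the "# end" block is removed from TEMPLATE, the final paragraph is emitted inside the "if" block as well and disappears from the output, which is the mistake the text warns about.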
The PSP parser will take the PSP "Hello world" page and convert it to Python that would look approximately like this:

    req.write("""<html>
    """)
    if form.has_key('name'):
        greet = 'Hello, %s!' % form['name'].capitalize()
    else:
        greet = 'Hello there!'
    # end
    req.write("""<h1>"""); req.write(str(greet)); req.write("""</h1>
    </html>""")

This Python code will be compiled and the resulting code object stored in the PSP memory cache.

10. Debugging PSP

Errors in PSP can be tricky to debug. The PSP parser, which converts PSP to Python code, never produces errors itself. Syntactical errors in PSP therefore result in bad Python code, which triggers compilation errors produced by the Python interpreter. The PSP parser makes its best effort to produce Python code in which line numbers correspond to the original PSP code (this is why it uses a semi-colon in the line before last in the above example). This doesn't always work; for example, use of the "include" PSP directive throws all the line numbers off.

One tool to aid in debugging is a special feature that allows you to peek at the intermediate Python code produced from your PSP page. If PythonDebug is On, you can append an underscore to the URL to see a side-by-side display of the original PSP and the resulting Python along with line numbers. Don't forget that in order for this feature to work, the extension with the underscore has to be handled by mod_python, therefore your AddHandler should look like this:

    AddHandler mod_python .psp .psp_

Future versions of PSP will possibly introduce PSP parser errors and other tools to aid in the debugging process.

11. Combining Publisher and PSP

This is an example of how to use PSP as a templating system from within the Publisher handler. While at first this may seem like complicating things, this approach actually provides a clean separation of the application logic and presentation.
The Apache config:

    <Directory /www/htdocs/mp>
        SetHandler mod_python
        PythonHandler mod_python.publisher
        PythonDebug On
    </Directory>

The code for the template (/www/htdocs/mp/hello.tmpl):

    <html>
    <h1><%= greet %></h1>
    </html>

The code to be called from the publisher (hello2.py):

    from mod_python import psp

    def hello(req, name=''):
        s = 'Hello, there!'
        if name:
            s = 'Hello, %s!' % name.capitalize()
        tmpl = psp.PSP(req, filename='hello.tmpl')
        tmpl.run(vars = { 'greet': s })
        return

In this example we create an instance of the psp.PSP class, providing it with the filename of the template. We then call the resulting object's run() method, passing it a dictionary of variables (in our case a single variable named 'greet' with a value of s). On the first request, the instantiation of the PSP class will compile the template; subsequent requests will use the cached code object. The call to run() is what produces the resulting HTML and sends it to the client.

12. Advantages of Publisher and PSP

The advantage of using the Publisher as the interface to the client and the container of all application logic, while using PSP as a templating mechanism, is the clean separation of presentation from logic. The PSP templates can be placed in their own directory, editable by web designers who can alter the appearance of the application without having to know a whole lot about mod_python.

13. Other Web Development tools

In addition to the handlers, mod_python also includes a couple of tools that simplify common web development tasks.

The mod_python.Cookie module provides support for cookie handling. It directly accesses the request information using the mod_python API and is therefore faster under mod_python than the Standard Library Cookie module. It is also semantically adapted to mod_python - the Standard Library Cookie module was designed with CGI in mind and is a bit awkward to use under mod_python.
The mod_python.Cookie module also provides out-of-the-box support for cryptographic cookie signing using HMAC (SignedCookie) and data marshalling (MarshalCookie). Unlike in the Standard Library Cookie module, marshalling is implemented in a secure way because it requires the cookie to always be signed.

The mod_python.Session module provides server-side session support. Server-side sessions have always been a very difficult problem for developers to overcome due to Apache's multi-process architecture. Implementing a session requires inter-process communication and locking, which is not an easy thing to implement, especially in a cross-platform fashion. Mod_python.Session uses some of the Apache APIs to provide efficient cross-platform locking and offers a choice of DBM or memory for storing the session data.

14. Conclusion

Starting with version 3.1, mod_python is no longer an esoteric tool made for experienced programmers seeking top performance and scalability for dynamic content sites. It now includes all the necessary components for simple web development, which will hopefully result in more mainstream adoption.