Application Configuration Using ZConfig

Fred L. Drake, Jr., fred@zope.com (Zope Corporation)
Chris McDonough, chrism@zope.com (Zope Corporation)

Introduction

ZConfig is a Python package supporting application configuration. The basic facilities include a simple API for the application, a configuration file format, and support for configuration schema.

ZConfig performs some of the same duties as the ConfigParser module in Python's standard library, but also provides support for hierarchical sections, data conversion, and input validation. The data conversion and input validation are based on a declarative schema language in XML. Using a schema to describe configuration helps avoid some kinds of application and configuration errors.

Comparison with ConfigParser

ConfigParser provides a simple and easy-to-use API that allows an application to handle any configuration file it can parse (essentially .INI files if we ignore some historical accidents). While this can be convenient, it makes writing the application more tedious because the application is responsible for performing every data conversion and every use of a default value each time it consults the configuration data. While this kind of work can easily be pushed off to a utility, that utility needs to be written for every application. Historical circumstances make the ConfigParser API more difficult to use than it should be.

Particular points of difficulty that seem to cause problems for ConfigParser users on a recurring basis include:

The simplest use of ConfigParser (ConfigParser.read()) makes it hard to know whether or not a configuration file was actually provided.
The magical string-substitution bites users more than is acceptable, in large part because it's not entirely predictable. Not using this functionality is difficult with versions of the module prior to the release of Python 2.3.
Different applications parse .INI files differently, so the set of characters allowed in keys needs to be expanded regularly.
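The first two problems are still easy to reproduce with Python 3's configparser, the renamed successor of the ConfigParser module. The sketch below uses only that standard module: read() quietly skips a missing file and reports it only through the (empty) list it returns, and the default interpolation treats a logging-style "%(message)s" value as a reference to a nonexistent option. The file name app.ini is a placeholder.

```python
import configparser
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "app.ini")  # placeholder; this file is never created
    parser = configparser.ConfigParser()
    files_read = parser.read(missing)  # a missing file is silently skipped
    try:
        with open(missing) as f:  # opening it explicitly makes the failure visible
            parser.read_file(f)
        error = None
    except FileNotFoundError as exc:
        error = exc

# %-interpolation: a logging format string is read as a reference to
# another option named "message", which does not exist.
parser = configparser.ConfigParser()
parser.read_string("[log]\nformat = %(message)s\n")
try:
    parser.get("log", "format")
    interp_error = None
except configparser.InterpolationError as exc:
    interp_error = exc
raw_format = parser.get("log", "format", raw=True)  # interpolation skipped

print(files_read)                   # []
print(type(error).__name__)         # FileNotFoundError
print(type(interp_error).__name__)  # InterpolationMissingOptionError
print(raw_format)                   # %(message)s
```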

ZConfig addresses these points in the following ways:

When the application requests that a configuration be loaded, an exception is raised if the file does not exist.
ZConfig has more rational support for string substitution. String substitution is supported and cannot be disabled, but the algorithms used are similar to those used by the ConfigParser.SafeConfigParser class added with Python 2.3. String substitution is not recursive as it is in ConfigParser, and names defined for their value as substitution texts do not pollute the configuration namespace.
Disparities between parser implementations do not affect ZConfig since there's only one implementation.

ZConfig provides the following additional functionality:

Configuration sections can be arranged hierarchically if needed, but they are not required if the application does not need them.
Configuration files are checked against application-defined schema, so typographical and structural errors in a configuration can be caught early, making it easy for an application to report the error to the user before performing any initialization with side effects.
Resources can be loaded from remote locations, since configuration files and schema definitions are addressed by URL.

ZConfig Configuration Files

The syntax for the configuration files used by ZConfig was designed with a target audience of system administrators operating under great pressure. This means the following goals were dominant:

It must be familiar. If the user hasn't seen it before, it won't work.
It must be terse, so making changes is as simple as possible.
It must be easy to read, which is in conflict with terseness.

There are very few redundant characters in the ZConfig syntax, and none that are required for a simple named configuration variable. Those that are present are used to express the hierarchical structure of the configuration file, which changes less often than the specific values for many applications. Since these are very similar to what is found in the syntax for other applications, this level of redundancy and familiarity was chosen to ensure readability while maintaining terseness for simple value settings.

There are four aspects of the syntax which can be described separately. In all cases, leading and trailing whitespace is ignored, as are blank lines.

Comments

Any line which has a "#" character as the first non-blank character is a comment and is ignored.

Individual configuration settings

A single setting, or key-value pair, is expressed on a single line:

        key value

The key and value are separated by whitespace, and the value may be empty. The characters allowed for the key include alpha-numeric characters, the underscore, period, and hyphen. The first character must be a letter.
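The line-level rules above are small enough to sketch directly. This is not ZConfig's parser: parse_setting and its regular expression are hypothetical and encode only the rules stated here, assuming ASCII letters and digits.

```python
import re

# Hypothetical pattern for the stated rules: the key starts with a letter and
# continues with letters, digits, "_", "." or "-"; an optional value follows
# after whitespace and may be empty.
_SETTING = re.compile(r"^(?P<key>[A-Za-z][A-Za-z0-9_.\-]*)(?:\s+(?P<value>.*))?$")


def parse_setting(line):
    """Return (key, value) for one setting line, or None for blanks and comments."""
    line = line.strip()  # leading and trailing whitespace are ignored
    if not line or line.startswith("#"):
        return None
    match = _SETTING.match(line)
    if match is None:
        raise ValueError("malformed setting: %r" % line)
    return match.group("key"), match.group("value") or ""


print(parse_setting("  path /var/log/access.log  "))  # ('path', '/var/log/access.log')
print(parse_setting("empty-value"))                   # ('empty-value', '')
```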

Hierarchical sections

Sections may be nested hierarchically to an arbitrary depth. Sections require both "start" and "end" markers in the syntax. Every section has a type described by the schema, and may have a name which may be constrained by the schema.

This is an example section, with "meta" names filled in for each variable component of the syntax:

        <type name>
        </type>

Configuration variables that are part of the section must be placed between these markers:

        <log access>
          # comments can be here
          path /var/log/access.log
        </log>

There is a shortcut syntax for sections which contain no values:

        <type name/>

Directives

The directives to the ZConfig parser are used to support string substitution and resource inclusion:

%define name replacement
Define named replacement text. The namespace for name is case insensitive. replacement may be empty:
```
          %define name
```
Replacement texts are only processed for the value portion of key-value pairs and the definition of additional replacement texts. References to replacement text do not use the Python %-substitution syntax but a more readable $name or ${name}. (To include a literal dollar sign in a value, use $$.)

For example:
```
          %define mydir  /home/fdrake/appstuff
          %define myvar  $mydir/var
          library  $mydir/lib
          datadir  ${myvar}
```
will cause the value for library to be:
```
          /home/fdrake/appstuff/lib
```
and the value for datadir will be:
```
          /home/fdrake/appstuff/var
```
If a reference is made to a replacement that is not defined, an exception will be raised.
%include url
Include an external resource at the current location in the hierarchy. The URL reference may be absolute or relative. If relative, it is combined with the URL for the resource currently being parsed. Inclusions may be deeply nested.
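The substitution rules for %define can be sketched with a regular expression. The names substitute and define are hypothetical, and the characters allowed in a replacement name are an assumption; the sketch only mirrors the behaviour described above: $name and ${name} references, $$ for a literal dollar sign, case-insensitive names, an error for undefined names, and expansion of a definition's own text once, when it is defined, which keeps substitution non-recursive.

```python
import re

# "$$" is a literal dollar sign; "$name" and "${name}" are references.
# The name syntax below is an assumption made for this sketch.
_REFERENCE = re.compile(
    r"\$(?:(?P<dollar>\$)|\{(?P<braced>[A-Za-z_][A-Za-z0-9_]*)\}"
    r"|(?P<bare>[A-Za-z_][A-Za-z0-9_]*))")


def substitute(text, defines):
    """Expand references in text using defines (keys stored in lower case)."""
    def replace(match):
        if match.group("dollar"):
            return "$"
        name = (match.group("braced") or match.group("bare")).lower()
        if name not in defines:
            raise KeyError("undefined replacement: " + name)
        return defines[name]  # inserted verbatim: re.sub never re-scans it
    return _REFERENCE.sub(replace, text)


def define(defines, name, replacement=""):
    """Record a %define; its text is expanded once, at definition time."""
    defines[name.lower()] = substitute(replacement, defines)


defines = {}
define(defines, "mydir", "/home/fdrake/appstuff")
define(defines, "myvar", "$mydir/var")
print(substitute("$mydir/lib", defines))  # /home/fdrake/appstuff/lib
print(substitute("${myvar}", defines))    # /home/fdrake/appstuff/var
```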

What is a ZConfig Schema?

In general, a "schema" describes what structure and values are allowed for a set of data. For ZConfig, a schema specifies the allowed configuration keys and sections for a given configuration, and what type conversions must be applied to each component value. The behavior of the data binding machinery can be controlled by specifying attribute names and data types.

ZConfig schema are expressed using XML. The schema language is simple, so it should not present a hurdle for developers.

A ZConfig configuration file can be expressed in terms of keys, sections, and section types. Section types may be provided by schema components.

keys, multikeys
Defining a simple key (having one value at most) is done using the `<key>` element:
```
        <key name="key"
             attribute="attribute-name"
             datatype="name-of-datatype-converter"
             default="default-value"
             handler="name-of-handler-function"
             required="yes|no" />
```
The "datatype" attribute describes a conversion routine which both validates and converts the data provided for this key in the configuration file described by the schema. For example, consider this `<key>` element in a schema:
```
        <key name="akey"
             datatype="integer"
             default="1"/>
```
This element describes a key named "akey" with a datatype of "integer", and a default value of "1". "integer" is the name of a built-in datatype conversion routine which validates and converts configuration file input into an integer value. If this key appears in a configuration file described by the schema, the value will be converted to an integer. For example, assume the config file looked like this:
```
        akey 10
```
The key's value ("10") would be converted to the integer value 10. If, however, the value of "akey" in the configuration file was not convertible to an integer:
```
        akey whee!
```
the ZConfig parser would raise an error with the offending configuration file line number, and processing would stop. Since "akey" is not a "required" key, if it is not found in the configuration file, the default ("1") is converted to an integer and it is used as "akey"'s value.
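The lookup-then-convert behaviour of a single key can be modelled in a few lines. resolve_key is a hypothetical helper rather than part of ZConfig's API, and Python's built-in int stands in for the "integer" datatype.

```python
# Hypothetical helper modelling one <key>: convert the configured value if there
# is one (reporting its line number on failure), otherwise convert the default.
def resolve_key(name, datatype, default=None, required=False, found=None):
    """found is None, or a (raw_value, line_number) pair from the config file."""
    if found is None:
        if required:
            raise ValueError("required key %r is missing" % name)
        # Defaults are strings in the schema and go through the same conversion.
        return None if default is None else datatype(default)
    raw_value, lineno = found
    try:
        return datatype(raw_value)
    except ValueError as exc:
        raise ValueError("line %d: invalid value for %r: %s" % (lineno, name, exc)) from None


# Python's int stands in for ZConfig's "integer" datatype here.
print(resolve_key("akey", int, default="1", found=("10", 1)))  # 10
print(resolve_key("akey", int, default="1"))                   # 1
```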

A key which can take multiple values is defined using the `<multikey>` element:
```
        <multikey name="key"
                  attribute="attribute-name"
                  datatype="name-of-datatype-converter"
                  handler="name-of-handler-function"
                  required="yes|no" >
          <default>first-default</default>
          <default>second-default</default>
          <default>third-default</default>
        </multikey>
```
A multikey is exactly like a key element except it can appear more than once in the same section of a configuration file described by the schema. A multikey's defaults are provided via `<default>` subelements. A multikey which should have three values by default should be declared with three `<default>` elements. Each default value is converted using the specified data type. The values for a multikey are stored as a list.
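A similar hypothetical helper captures the multikey rules: occurrences are collected into a list, each one is converted separately, and the defaults apply only when the key is absent. Python's int stands in for ZConfig's "integer" datatype.

```python
# Hypothetical helper modelling one <multikey>: every occurrence is kept in
# order, each value is converted on its own, and the <default> values are used
# (and converted the same way) only when the key never appears.
def resolve_multikey(name, datatype, defaults=(), found=(), required=False):
    if found:
        raw_values = list(found)
    elif required:
        raise ValueError("required multikey %r is missing" % name)
    else:
        raw_values = list(defaults)
    return [datatype(value) for value in raw_values]


print(resolve_multikey("port", int, defaults=["8080", "8081", "8082"]))
# [8080, 8081, 8082]
print(resolve_multikey("port", int, defaults=["8080"], found=["9000", "9001"]))
# [9000, 9001]
```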

sections, multisections

Defining a section that can have at most one value is done using the `<section>` element. Sections are defined much like keys, except sections have a "type" which refers to a section type definition elsewhere in the schema. Sections cannot have defaults:

        <section name="section"
                 type="sectiontype"
                 attribute="attribute-name"
                 handler="name-of-handler-function"
                 required="yes|no">

           ... subkeys and subsections ...

        </section>

A section which can have more than one value is defined using the `<multisection>` element:

        <multisection name="section"
                      type="sectiontype"
                      attribute="attribute-name"
                      handler="name-of-handler-function"
                      required="yes|no">

           ... subkeys and subsections ...

        </multisection>

section types and abstract section types
A "section type" is referred to by the "type" attribute of `<section>` and `<multisection>` elements. A section type describes the contents of sections much as a class describes its instances. A section type must precede any section declaration which names it in a schema, and its format is:
```
        <sectiontype name="sectiontype"
                     datatype="name-of-datatype"
                     extends="name-of-another-sectiontype"
                     implements="name-of-an-abstract-sectiontype">

            ... subkeys and subsections ...

        </sectiontype>
```
A "sectiontype" element describes a "concrete" section type. "Abstract" section types may also be declared. An abstract section type may be declared and referenced by any number of concrete sectiontypes in order to assert that several section types "implement" (i.e., describe) the same general kind of data. The format for an abstract type declaration is:
```
        <abstracttype name="typename"/>
```
In general, an abstracttype declaration is a marker referenced by "concrete" sectiontype declarations which describe similar data. "Concrete" sectiontypes may reference an abstract sectiontype by using the name of the abstracttype as the value of their "implements" attribute.
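The relationship between abstract and concrete types amounts to a simple compatibility check. The sketch below is a hypothetical model, not ZConfig code; it borrows the type names from the logging example later in this article and ignores "extends".

```python
# Hypothetical model: a slot typed with an abstract type accepts any concrete
# sectiontype that implements it; a slot typed with a concrete type accepts
# only that type.
ABSTRACT_TYPES = {"loghandler"}
IMPLEMENTS = {  # concrete sectiontype -> abstract type it implements (or None)
    "file-handler": "loghandler",
    "nteventlog-handler": "loghandler",
    "logger": None,
}


def accepts(declared_type, actual_type):
    """Return True if a section of actual_type may fill a slot of declared_type."""
    if actual_type not in IMPLEMENTS:
        raise KeyError("unknown section type: " + actual_type)
    if declared_type in ABSTRACT_TYPES:
        return IMPLEMENTS[actual_type] == declared_type
    return actual_type == declared_type


print(accepts("loghandler", "file-handler"))  # True
print(accepts("loghandler", "logger"))        # False
```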
schema components
Schema components may be imported into a schema either by the schema itself or by other components. This is done using an `<import>` element:
```
        <import package="zodb"/>
```
The component is defined by an XML file similar to a schema, but which does not allow keys or sections to be defined at the top level.

Schema components are located by searching sys.path for a specially-named file in a corresponding directory. Schema components will themselves be extensible in the future.
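A sys.path search of this kind can be sketched with os.path alone. find_component is hypothetical; the file name component.xml is the one the Zope case study in this article says its packages use, and the search order (first match on the path wins) is an assumption.

```python
import os
import sys


def find_component(package, filename="component.xml", path=None):
    """Return the first <entry>/<package dirs>/<filename> found on the path.

    Hypothetical helper: ZConfig's real lookup may differ in its details.
    An empty sys.path entry means the current directory.
    """
    for entry in (sys.path if path is None else path):
        candidate = os.path.join(entry or os.curdir, *package.split("."), filename)
        if os.path.isfile(candidate):
            return candidate
    raise LookupError("no %s found for package %r" % (filename, package))
```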

The ZConfig API

Schema and configuration files may be loaded from the local filesystem or from any URL scheme supported by Python's "urllib2" module.

Loading a schema:

        schema = ZConfig.loadSchema(schema_url)

Loading configuration data:

        config, handler = ZConfig.loadConfig(schema, config_url)
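Because resources are addressed by URL, a local filename and a relative %include reference both end up as URLs. ZConfig performs this resolution itself; the standard-library sketch below only illustrates the idea, and as_url and resolve_include are hypothetical helpers, not part of ZConfig's API.

```python
import os
import pathlib
from urllib.parse import urljoin, urlparse


def as_url(location):
    """Return location as a URL, turning a local filename into a file: URL.

    Hypothetical helper. A one-letter "scheme" is treated as a Windows drive
    letter, so "C:\\app\\zope.conf" is still handled as a filename.
    """
    if len(urlparse(location).scheme) > 1:
        return location
    return pathlib.Path(os.path.abspath(location)).as_uri()


def resolve_include(current_url, reference):
    """Combine a relative %include reference with the including resource's URL."""
    return urljoin(current_url, reference)


print(as_url("http://example.com/conf/site.conf"))  # returned unchanged
print(resolve_include("http://example.com/conf/site.conf", "logging.conf"))
# http://example.com/conf/logging.conf
```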

Type Conversion Routines

ZConfig uses a type registry to look up datatype conversion routines. A default registry is provided which has conversions for a variety of simple types and can load additional functions if given the Python dotted-name of the conversion function. An alternate registry may be supplied by the application using a lower level of the API.

Type conversion routines may be provided for either simple keys or for sections. For keys, the string value is passed to the conversion function, and the return value is used as the actual value for that key. Default values are converted if they are used.

For sections, an object is passed which makes the values of the contained keys and sections available as attributes; lists of values are supplied for multikeys and multisections.

For multikeys and multisections, each value is converted separately, so the same conversion functions can be used.

The default datatype conversion is no conversion at all (i.e., "string"). For sections, the default is to return the value that would be passed to the conversion routine.
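The section rules can be illustrated with types.SimpleNamespace standing in for the object ZConfig passes to a section's conversion routine. convert_section and server_address are hypothetical, and the hyphen-to-underscore mapping of key names is an assumption of this sketch.

```python
from types import SimpleNamespace


def convert_section(values, datatype=None):
    """Hypothetical sketch: wrap already-converted key values in an object.

    Hyphens in key names become underscores so that every value is reachable
    as an attribute (an assumption made for this sketch). With no datatype,
    the object itself is the section's value.
    """
    section = SimpleNamespace(**{k.replace("-", "_"): v for k, v in values.items()})
    return section if datatype is None else datatype(section)


def server_address(section):
    """Example section datatype (hypothetical): combine two keys into a tuple."""
    return (section.host, section.port)


values = {"host": "localhost", "port": 8080}    # keys already converted
print(convert_section(values))                  # namespace(host='localhost', port=8080)
print(convert_section(values, server_address))  # ('localhost', 8080)
```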

There are a number of simple datatype converters "built-in" to the ZConfig schema machinery. Programmers may extend the datatype conversion routines which are available by creating new datatype conversions and referencing them via a Python dotted-name (e.g. "Zope.Startup.datatypes.null_handler"), or by the "shorthand" enabled by a "prefix" attribute of a containing tag:

    <schema prefix="Zope.Startup.datatypes">
      <key name="foo" datatype=".null_handler"/>
    </schema>
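Resolution of datatype names, including the "." shorthand, can be sketched with importlib. lookup_datatype and the small BUILTINS table are hypothetical: the real registry has many more converters, and the boolean converter here is modelled only on the on/off values used in this article.

```python
import importlib


def _boolean(value):
    # Illustrative converter for on/off style flags (assumed spellings).
    v = value.lower()
    if v in ("on", "yes", "true"):
        return True
    if v in ("off", "no", "false"):
        return False
    raise ValueError("not a boolean: %r" % value)


# Illustrative subset of "built-in" converters, keyed by datatype name.
BUILTINS = {"string": str, "integer": int, "float": float, "boolean": _boolean}


def lookup_datatype(name, prefix=None):
    """Resolve a datatype name to a callable (hypothetical helper).

    Names starting with "." are relative to the prefix of the containing
    element; other dotted names are imported; bare names are built-ins.
    """
    if name.startswith("."):
        if not prefix:
            raise ValueError("relative datatype %r needs a prefix" % name)
        name = prefix + name
    if name in BUILTINS:
        return BUILTINS[name]
    module_name, _, attr = name.rpartition(".")
    if not module_name:
        raise LookupError("unknown datatype: %s" % name)
    return getattr(importlib.import_module(module_name), attr)


print(lookup_datatype("integer")("10"))   # 10
print(lookup_datatype("boolean")("off"))  # False
```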

Configuring a Simple Application

Consider an application which has relatively simple configuration requirements, such as a particular usage of the (new-in-Python-2.3) "logging" package. The logging package offers lots of knobs, and users should be able to turn them. The application's configuration describes a "logger" instance which can be turned on and off, and which can send data to a file or to the Windows event log.

The first step in designing a schema is to determine what an actual configuration file should look like. This is an example of what the file should look like for the simple application:

    logging on

    <logger>
      <file-handler>
        path /home/chrism/var/log.data
      </file-handler>
    </logger>

The top-level "logging" key is a boolean which describes whether logging will be done. The "logger" section which follows describes the "logger" instance. It may have one or more "handler" subsections, each of which describes a particular output channel for log data (limited for purposes of demonstration to a logging.FileHandler or a logging.handlers.NTEventLogHandler instance).

To wire this up to ZConfig using a schema, first create a schema file. The contents of the schema file are as follows:

    <!-- schema outer element, which describes our default datatype
         converter prefix -->

    <schema prefix="ourlogger.datatypes">

      <!-- marker to describe an abstract type -->

      <abstracttype name="loghandler"/>

      <!-- handler sectiontype declarations which reference the
           "loghandler" abstract type  -->

      <sectiontype name="file-handler"
                   datatype=".file_handler"
                   implements="loghandler">
        <key name="path"       required="yes"/>
        <key name="format"     default="------\n%(asctime)s %(message)s"/>
        <key name="dateformat" default="%Y-%m-%dT%H:%M:%S"/>
        <key name="level"      default="info"
                               datatype=".logging_level"/>
      </sectiontype>

      <sectiontype name="nteventlog-handler"
                   datatype=".nteventlog_handler"
                   implements="loghandler">
        <key name="appname"    default="ourapp"/>
        <key name="format"     default="%(message)s"/>
        <key name="dateformat" default="%Y-%m-%dT%H:%M:%S"/>
        <key name="level"      default="info"
                               datatype=".logging_level"/>
      </sectiontype>

      <!-- logger concrete sectiontype declaration -->

      <sectiontype name="logger"
                   datatype=".logger">
         <key name="level"
              datatype=".logging_level"
              default="info"/>
         <multisection name="*"
                       type="loghandler"
                       attribute="handlers"/>
      </sectiontype>

      <!-- our logging key and logger section declaration -->

      <key name="logging"
           datatype="boolean"
           default="on"/>

      <section name="*"
               type="logger"
               attribute="logger"/>

    </schema>

The schema has an outer element which declares a "prefix" attribute. This prefix is used as the default package name in which to look for datatype converters whose names have a "." as their first character. In this case, the converters are expected to be found in the "ourlogger.datatypes" module. An "ourlogger" package can be created and placed on the PYTHONPATH. Inside it, create a "datatypes" module, which has the following content:

      import logging
      import logging.handlers  # needed for NTEventLogHandler

      def logger(section):
          # Configure the root logger; returning it makes it the section's value.
          logger = logging.getLogger('')
          logger.setLevel(section.level)
          logger.handlers = []
          for handler in section.handlers:
              logger.addHandler(handler)
          return logger

      def file_handler(section):
          format = section.format
          dateformat = section.dateformat
          level = section.level
          path = section.path
          formatter = logging.Formatter(format, dateformat)
          handler = logging.FileHandler(path)
          handler.setFormatter(formatter)
          handler.setLevel(level)
          return handler

      def nteventlog_handler(section):
          # Writes to the event log only on Windows with the Win32 extensions.
          appname = section.appname
          format = section.format
          dateformat = section.dateformat
          level = section.level
          formatter = logging.Formatter(format, dateformat)
          handler = logging.handlers.NTEventLogHandler(appname)
          handler.setFormatter(formatter)
          handler.setLevel(level)
          return handler

      class LogLevelConversion:
          # Map level names to the numeric levels used by the logging package.
          _levels = {
              "fatal": 50,
              "error": 40,
              "warn": 30,
              "info": 20,
              "debug": 10,
              "all": 0,
              }

          def __call__(self, value):
              s = str(value).lower()
              if s in self._levels:
                  return self._levels[s]
              # Otherwise accept a numeric level in the range 0-50.
              v = int(s)
              if not (0 <= v <= 50):
                  raise ValueError("log level not in range: %r" % v)
              return v

      logging_level = LogLevelConversion()

This module defines all of the "non-built-in" datatype conversion functions specified within the schema file ("logger", "file_handler", "nteventlog_handler", and "logging_level"). There is another datatype mentioned in the schema (boolean), but this is a ZConfig built-in converter, so there is no need to write a datatype conversion for it.

Creating the "application glue" which uses this schema and its associated datatype converters does not require much additional work. The application could pass the name of the configuration file to a function like this one, which loads the schema and the configuration file, and then finishes setting up the "logging" module accordingly:

        import os

        import ZConfig

        myfile = os.path.abspath(__file__)
        SCHEMA = os.path.join(os.path.dirname(myfile), "schema.xml")

        def configure(config_file_name):
            schema = ZConfig.loadSchema(SCHEMA)
            cfg, nil = ZConfig.loadConfig(schema, config_file_name)
            # The "logger" datatype has already installed handlers on the
            # root logger; remove them if the "logging" key is off.
            if not cfg.logging and cfg.logger is not None:
                from logging import getLogger
                logger = getLogger('')
                logger.handlers = []

That's it. The logging environment is configured, given that schema.xml sits next to the module containing this code and a valid config_file_name is passed in. Since the datatype handlers take care of configuring the logger instance with the proper handlers, one only needs to be sure that the "logging" flag is respected, deleting any handlers from the logger if they've been added as the result of datatype conversion.

As a result of this work, the application's users can specify a config file that looks like this:

        logging off

and logging will be off.

The users can specify a config file that looks like this:

        logging on

        <logger>
          <file-handler>
            path /home/chrism/var/log.data
          </file-handler>
        </logger>

and logging will be on, with a single logfile written to the path /home/chrism/var/log.data.

The users can specify a config file that looks like this:

        logging on

        <logger>
          <file-handler>
            path /home/chrism/var/log.data
            level info
          </file-handler>
          <nteventlog-handler>
            appname MyApp
            level warn
          </nteventlog-handler>
        </logger>

and logging will be on, with a logfile written including messages of level "info" to the path /home/chrism/var/log.data and messages of the level "warn" to the local NT event log.
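The per-handler levels can be checked with the standard logging module alone. In this sketch, StreamHandler objects writing to in-memory buffers stand in for the file and NT event log handlers, so it runs on any platform; the logger name zconfig-demo is arbitrary.

```python
import io
import logging

# In-memory stand-ins for the log file and the NT event log.
file_like, eventlog_like = io.StringIO(), io.StringIO()

info_handler = logging.StreamHandler(file_like)
info_handler.setLevel(logging.INFO)      # "level info"
warn_handler = logging.StreamHandler(eventlog_like)
warn_handler.setLevel(logging.WARNING)   # "level warn" (30)

logger = logging.getLogger("zconfig-demo")  # arbitrary name for the demo
logger.setLevel(logging.INFO)
logger.handlers = [info_handler, warn_handler]
logger.propagate = False  # keep the demo's records away from the root logger

logger.info("cache warmed")
logger.warning("disk almost full")
print(file_like.getvalue())      # both messages, one per line
print(eventlog_like.getvalue())  # only "disk almost full"
```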

The users can specify a config file that looks like this:

        logging on

        <logger>
          <file-handler>
            path /home/chrism/var/log.data
            level info
            format %(message)s
          </file-handler>
        </logger>

and logging will be on, with a logfile written including messages of level "info" to the path /home/chrism/var/log.data with a log format that doesn't include the date or time or any intermediate characters between log records.
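The effect of overriding format can also be seen with the logging module alone. The sketch formats one record with "%(asctime)s %(message)s" (the schema's file-handler default without its separator line) and with the "%(message)s" override; render is a hypothetical helper.

```python
import logging

DATEFORMAT = "%Y-%m-%dT%H:%M:%S"  # the dateformat default from the schema


def render(fmt, message):
    """Format a single INFO record with the given format string (hypothetical helper)."""
    record = logging.makeLogRecord(
        {"msg": message, "levelno": logging.INFO, "levelname": "INFO"})
    return logging.Formatter(fmt, DATEFORMAT).format(record)


print(render("%(asctime)s %(message)s", "started"))  # current time in DATEFORMAT, then "started"
print(render("%(message)s", "started"))              # started
```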

Case Study: Using ZConfig to Configure Zope

The Zope application server is a large application, mostly written in Python. Since Zope is more of a framework than an application, it has many "knobs". Most of these knobs can be turned from within the Zope Management Interface, a web-based UI that provides Zope users and developers a mechanism to interact with the objects that comprise their applications.

Some knobs cannot be turned via this UI, particularly those having to do with server configuration and behavior and other system-global settings. Historically, these configuration parameters were tunable via the use of environment variables. Zope 2.6 has 41 individual environment variables that are used to specify runtime configuration parameters.

For Zope 2.7, we have allowed these configuration parameters to be specified within a configuration file using ZConfig. Zope makes heavy use of ZConfig schemas to perform its configuration duties. Additionally, some of ZConfig's design is influenced by the makeup of Zope; however, nothing in ZConfig depends on Zope. ZConfig can be used anywhere that you run Python 2.2.

Simple ZConfig Uses In The Zope Configuration Schema

In Zope, there is a master schema within the Zope.Startup package named zopeschema.xml.

Within this schema file, Zope makes use of "simple" ZConfig keys as global configuration parameters. For example, the Zope schema allows the specification of an instancehome parameter, which is the place on the filesystem which comprises the directory structures that make up a single Zope "instance", and a clienthome parameter, which is the place on the filesystem in which variable data files are stored.

The relevant schema portions that define these simple keys are:

      <key name="instancehome" datatype="existing-directory"
            required="yes">
        <description>
          The top-level directory which contains the "instance" of the
          application server.
        </description>
      </key>

      <key name="clienthome" datatype="existing-directory">
        <description>
          The directory used to store the file-storage used to back the
          ZODB database by default, as well as other files used by the
          Zope application server to control the run-time behavior.
        </description>
        <metadefault>$INSTANCE_HOME/var</metadefault>
      </key>

The datatype existing-directory is a standard ZConfig datatype. It is backed by the following code, defined within the ZConfig.datatypes module:

      import os

      def existing_directory(v):
          if os.path.isdir(v):
              return v
          raise ValueError('%s is not an existing directory' % v)

As you can see, if the user specifies a directory which doesn't exist at the time of configuration parsing for either the clienthome parameter or the instancehome parameter, the datatype handler will raise a ValueError, preventing configuration from completing.

Zope may run under several "security policy" implementations. One security policy implementation is "Python", the other is "C" (one is implemented in Python, the other in C). The key that this feature relies on is a custom-defined one:

       <key name="security-policy-implementation"
            datatype=".security_policy_implementation"
            default="C"/>

Note, however, that unlike the last example, this key uses a "custom" datatype (as indicated by the dot before the datatype name). In this case, the datatype handler is defined within the Zope.Startup.datatypes module (named by the prefix attribute of the schema definition itself), and it looks like this:

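        def security_policy_implementation(value):
            value = value.upper()
            ok = ('PYTHON', 'C')
            if value not in ok:
                # Join the choices: formatting the tuple directly with "%s"
                # would raise TypeError instead of this ValueError.
                raise ValueError(
                    "security_policy_implementation must be one of %s"
                    % ", ".join(ok))
            return value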

We can see that ZConfig is flexible enough to let us define our own simple datatypes for use within our schemas.

The Use of Schema Components in Zope

Zope consists of about 50 Python packages, some of which are usable outside the Zope framework. It was decided that the configuration parameters for Zope should reflect the "sum of its parts". This meant that the definition of Zope's possible configuration parameters could not be monolithic, but needed to be distributed across its various packages. The "41 environment variable" approach was decentralized, but it was becoming unworkable because it was very ad hoc, and sysadmins (who are very used to interpreting configuration files, but not very used to spelunking through scattered documentation about environment variables) lacked an understanding of how to tell Zope to configure itself in some particular way. This was because the documentation was as decentralized and ad hoc as the code itself. Thus, we decided to make ZConfig schemas extensible enough to allow the inclusion of schema components on a per-package basis from a master schema.

This zopeschema.xml schema imports the type definitions defined within packages included within Zope itself. For example, the master schema XML file includes the type definitions exported by the zLOG package (the legacy Zope logging package), the ZODB package (the Zope Object Database package), and the ZServer package (the package that allows Zope to run network servers):

        <import package="zLOG"/>
        <import package="ZODB"/>
        <import package="ZServer"/>

These three schema statements instruct the Zope master schema to find files named component.xml within the zLOG, ZODB, and ZServer packages respectively and load them into the current schema namespace.

Each schema component defines abstract types and section types for the component which it represents. For example, the ZODB schema component defines abstract types for storage and database sections. Additionally, it defines concrete sectiontypes for different kinds of storages and databases. Here is an elided excerpt from the ZODB schema component.xml which defines an abstract storage type and a concrete sectiontype which can be used to define an actual storage section:

        <component prefix="ZODB.config">
           <abstracttype name="storage"/>
           ... other abstract types defined here ...
           <sectiontype name="filestorage" datatype=".FileStorage"
                        implements="storage">
              <key name="path" required="yes">
                <description>
                  Path name to the main storage file.  The names for
                  supplemental files, including index and lock files, will be
                  computed from this.
                </description>
              </key>
              <key name="create" datatype="boolean" default="false">
                <description>
                  Flag that indicates whether the storage should be truncated
                  if it already exists.
                </description>
              </key>
              <key name="read-only" datatype="boolean" default="false">
                <description>
                  If true, only reads may be executed against the storage. Note
                  that the "pack" operation is not considered a write operation
                  and is still allowed on a read-only filestorage.
                </description>
              </key>
              <key name="quota" datatype="byte-size">
                <description>
                  Maximum allowed size of the storage file.  Operations which
                  would cause the size of the storage to exceed the quota will
                  result in a ZODB.FileStorage.FileStorageQuotaError being
                  raised.
                </description>
              </key>
           </sectiontype>
            .... other concrete types which implement the defined
            abstract types defined here ...
        </component>

The zLOG and ZServer packages provide similar component.xml definitions for the abstract types and section types that comprise their settings.

How Zope Configures Itself

The set of steps that Zope takes to configure itself based on values from a ZConfig configuration file and command-line parameters is straightforward.

A stub shell script named runzope has the path to a ZConfig configuration file embedded within it (as a result of an instance home installation).
It calls a Python script named runzope.py, which is the frontend for starting Zope, passing it the location of the configuration file and any command-line-specified configuration parameters (which will override the configuration-file supplied parameters).
The runzope Python script creates a ZopeOptions instance which holds both the command-line parameters and the config file location.
The ZopeOptions instance parses the configuration file with ZConfig, using the zopeschema.xml schema, to determine the non-overridden (file-supplied) values for the current configuration.
The resulting configuration object (a simple namespace) is passed to a routine which starts Zope. The routine uses the configuration object to perform actions during startup.
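The override step can be pictured as a merge in which command-line values win. apply_overrides and the key=value spelling are hypothetical rather than the ZopeOptions interface; a real implementation should also run overriding values through the same datatype conversion as file-supplied ones.

```python
# Hypothetical sketch: command-line "key=value" pairs replace file-supplied
# values. Values stay as raw strings here; no datatype conversion is done.
def apply_overrides(file_values, overrides):
    merged = dict(file_values)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError("override must look like key=value: %r" % item)
        merged[key.strip()] = value.strip()
    return merged


file_values = {"instancehome": "/var/zope", "security-policy-implementation": "C"}
print(apply_overrides(file_values, ["security-policy-implementation=PYTHON"]))
# {'instancehome': '/var/zope', 'security-policy-implementation': 'PYTHON'}
```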

Summary

ZConfig is a powerful and general tool that may be used to supply type-checked runtime configuration parameters to most Python applications.

Approximate time: 30 minutes.