PEP: 292
Title: Simpler String Substitutions
Version: $Revision: 1965 $
Last-Modified: $Date: 2005-01-29 10:24:59 -0800 (Sat, 29 Jan 2005) $
Author: Barry A. Warsaw <barry at python.org>
Status: Final
Type: Standards Track
Created: 18-Jun-2002
Python-Version: 2.4
Post-History: 18-Jun-2002, 23-Mar-2004, 22-Aug-2004

Abstract

    This PEP describes a simpler string substitution feature, also
    known as string interpolation.  This PEP is "simpler" in two
    respects:

    1. Python's current string substitution feature
       (i.e. %-substitution) is complicated and error prone.  This PEP
       is simpler at the cost of some expressiveness.

    2. PEP 215 proposed an alternative string interpolation feature,
       introducing a new `$' string prefix.  PEP 292 is simpler than
       this because it involves no syntax changes and has much simpler
       rules for what substitutions can occur in the string.


Rationale

    Python currently supports a string substitution syntax based on
    C's printf() '%' formatting character[1].  While quite rich,
    %-formatting codes are also error prone, even for
    experienced Python programmers.  A common mistake is to leave off
    the trailing format character, e.g. the `s' in "%(name)s".

    In addition, the rules for what can follow a % sign are fairly
    complex, while the usual application rarely needs such complexity.
    Most scripts need to do some string interpolation, but most of
    those use simple `stringification' formats, i.e. %s or %(name)s.
    This form should be made simpler and less error prone.
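
    The mistake described above can be reproduced in a few lines.
    This is only an illustrative sketch; the variable names are
    made up for the example.

```python
values = {"name": "Guido"}

# Correct: the mapping key is followed by the 's' conversion character.
ok = "%(name)s was born" % values

# Mistake: the trailing 's' is missing, so the format is incomplete
# and %-substitution raises ValueError instead of producing a string.
try:
    "Hello %(name)" % values
    error = None
except ValueError as exc:
    error = exc
```

    Here `ok' holds the interpolated string, while the second format
    fails at interpolation time and `error' holds the ValueError.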


A Simpler Proposal

    We propose the addition of a new class, called 'Template', which
    will live in the string module.  The Template class supports new
    rules for string substitution; its value contains placeholders,
    introduced with the $ character.  The following rules for
    $-placeholders apply:

    1. $$ is an escape; it is replaced with a single $

    2. $identifier names a substitution placeholder matching a mapping
       key of "identifier".  By default, "identifier" must spell a
       Python identifier as defined in [2].  The first non-identifier
       character after the $ character terminates this placeholder
       specification.

    3. ${identifier} is equivalent to $identifier.  It is required
       when valid identifier characters follow the placeholder but are
       not part of the placeholder, e.g. "${noun}ification".

    If the $ character appears at the end of the line, or is followed
    by any other character than those described above, a ValueError
    will be raised at interpolation time.  Values in the mapping are
    converted automatically to strings.
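
    The three placeholder rules and the error case can be sketched as
    follows.  The helper is_invalid() is a hypothetical name used
    only for this illustration.

```python
from string import Template

# Rule 1 ($$ escape), rule 3 (${identifier}) and rule 2 ($identifier).
t = Template("$$5 for ${noun}ification by $who")
result = t.substitute(noun="Python", who="Guido")

# Non-string values are converted to strings during substitution.
counted = Template("$count items").substitute(count=3)


def is_invalid(text):
    """Return True if substituting into text raises ValueError."""
    template = Template(text)  # creating the Template does not raise
    try:
        template.substitute({})
    except ValueError:
        return True
    return False


dangling = is_invalid("total: $")   # $ at the end of the string
bad_follow = is_invalid("$ 5")      # $ followed by a space
```

    Note that the ValueError appears only when substitute() is
    called, not when the Template is created.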

    No other characters have special meaning; however, it is possible
    to derive from the Template class to define different substitution
    rules.  For example, a derived class could allow for periods in
    the placeholder (e.g. to support a kind of dynamic namespace and
    attribute path lookup), or could define a delimiter character
    other than '$'.
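
    As a minimal sketch of a derived class with a different
    delimiter: the `delimiter' class attribute used here is taken
    from the string module as shipped in the standard library, not
    from this PEP, and PercentTemplate is a hypothetical name.

```python
from string import Template


class PercentTemplate(Template):
    # Assumption: the standard library implementation reads the
    # delimiter from this class attribute when the subclass is created.
    delimiter = "%"


t = PercentTemplate("%who owes %%10 to ${who}")
text = t.substitute(who="tim")
```

    With this subclass the `$' character has no special meaning, and
    a doubled `%%' plays the role of the escape.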

    Once the Template has been created, substitutions can be performed
    by calling one of two methods:

    - substitute().  This method returns a new string which results
      when the values of a mapping are substituted for the
      placeholders in the Template.  If there are placeholders which
      are not present in the mapping, a KeyError will be raised.

    - safe_substitute().  This is similar to the substitute() method,
      except that KeyErrors are never raised (due to placeholders
      missing from the mapping).  When a placeholder is missing, the
      original placeholder will appear in the resulting string.

    Here are some examples:

        >>> from string import Template
        >>> s = Template('${name} was born in ${country}')
        >>> print s.substitute(name='Guido', country='the Netherlands')
        Guido was born in the Netherlands
        >>> print s.substitute(name='Guido')
        Traceback (most recent call last):
        [...]
        KeyError: 'country'
        >>> print s.safe_substitute(name='Guido')
        Guido was born in ${country}

    The signature of substitute() and safe_substitute() allows for
    passing the mapping of placeholders to values, either as a single
    dictionary-like object in the first positional argument, or as
    keyword arguments as shown above.  The exact details and
    signatures of these two methods are reserved for the standard
    library documentation.
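
    A short sketch of the two calling styles follows.  The behaviour
    when both a mapping and keyword arguments are given (keywords
    take precedence) is that of the standard library implementation
    and is not specified by this PEP.

```python
from string import Template

s = Template("${name} was born in ${country}")

# Mapping passed as the single positional argument.
by_mapping = s.substitute({"name": "Guido", "country": "the Netherlands"})

# The same values passed as keyword arguments.
by_keywords = s.substitute(name="Guido", country="the Netherlands")

# Both at once: in the standard library, a keyword argument
# overrides the same key in the mapping.
mixed = s.substitute({"name": "Guido", "country": "Holland"},
                     country="the Netherlands")
```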


Why `$' and Braces?

    The BDFL said it best[4]: "The $ means 'substitution' in so many
    languages besides Perl that I wonder where you've been. [...]
    We're copying this from the shell."

    Thus the substitution rules are chosen because of the similarity
    with so many other languages.  This makes the substitution rules
    easier to teach, learn, and remember.


Comparison to PEP 215

    PEP 215 describes an alternate proposal for string interpolation.
    Unlike that PEP, this one does not propose any new syntax for
    Python.  All the proposed new features are embodied in a new
    class in the string module.  PEP 215 proposes a new string prefix
    representation such as $"", which signals to Python that a new
    type of string is present.  $-strings would have to interact with the
    existing r-prefixes and u-prefixes, essentially doubling the
    number of string prefix combinations.

    PEP 215 also allows for arbitrary Python expressions inside the
    $-strings, so that you could do things like:

        import sys
        print $"sys = $sys, sys = $sys.modules['sys']"

    which would print

        sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>

    It's generally accepted that the rules in PEP 215 are safe in the
    sense that they introduce no new security issues (see PEP 215,
    "Security Issues" for details).  However, the rules are still
    quite complex, and make it more difficult to see the substitution
    placeholder in the original $-string.

    The interesting thing is that the Template class defined in this
    PEP is designed for inheritance and, with a little extra work,
    it's possible to support PEP 215's functionality using existing
    Python syntax.

    For example, one could define subclasses of Template and dict that
    allowed for a more complex placeholder syntax and a mapping that
    evaluated those placeholders.
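
    One possible shape for such a pair of subclasses is sketched
    below.  It supports only dotted attribute paths, not the
    arbitrary expressions of PEP 215.  DottedTemplate and
    AttrPathDict are hypothetical names, and the `idpattern' class
    attribute is assumed from the standard library implementation
    rather than defined by this PEP.

```python
import sys
from string import Template


class DottedTemplate(Template):
    # Assumption: 'idpattern' is the class attribute holding the
    # regular expression for placeholder names; this version also
    # accepts dotted names such as 'sys.byteorder'.
    idpattern = r"[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*"


class AttrPathDict(dict):
    """Mapping that resolves 'a.b.c' as attribute lookups on self['a']."""

    def __getitem__(self, key):
        head, *rest = key.split(".")
        value = dict.__getitem__(self, head)
        for name in rest:
            value = getattr(value, name)
        return value


namespace = AttrPathDict(sys=sys)
text = DottedTemplate("byte order: ${sys.byteorder}").substitute(namespace)
```

    The Template subclass decides what counts as a placeholder, and
    the dict subclass decides how a placeholder is turned into a
    value, so the evaluation rules stay outside the template syntax.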


Internationalization

    The implementation supports internationalization by recording the
    original template string in the Template instance's 'template'
    attribute.  This attribute would serve as the lookup key in a
    gettext-based catalog.  It is up to the application to turn the
    resulting string back into a Template for substitution.

    However, the Template class was designed to work more intuitively
    in an internationalized application, by supporting the mixing-in
    of Template and unicode subclasses.  Thus an internationalized
    application could create an application-specific subclass,
    multiply inheriting from Template and unicode, and using instances
    of that subclass as the gettext catalog key.  Further, the
    subclass could alias the special __mod__() method to either
    .substitute() or .safe_substitute() to provide a more traditional
    string/unicode like %-operator substitution syntax.
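
    A rough sketch of the catalog lookup and the __mod__() alias
    follows.  CATALOG is a hypothetical stand-in for a gettext
    catalog, and ModTemplate and translate() are names invented for
    this example; the mixing-in of a unicode base class is left out
    to keep the sketch short.

```python
from string import Template

# Hypothetical stand-in for a gettext catalog, keyed by the
# original template string.
CATALOG = {
    "$name was born in $country": "$name wurde in $country geboren",
}


class ModTemplate(Template):
    # Alias the % operator to substitute(), as suggested above.
    __mod__ = Template.substitute


def translate(template):
    """Look up template.template and wrap the result in a new Template."""
    translated = CATALOG.get(template.template, template.template)
    return type(template)(translated)


original = ModTemplate("$name was born in $country")
message = translate(original) % {"name": "Guido", "country": "Holland"}
```

    The application, not the Template class, is responsible for
    turning the translated string back into a Template, which is
    what translate() does here.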


Reference Implementation

    The implementation has been committed to the Python 2.4 source tree[5].


References

    [1] String Formatting Operations
        http://www.python.org/doc/current/lib/typesseq-strings.html

    [2] Identifiers and Keywords
        http://www.python.org/doc/current/ref/identifiers.html

    [3] Guido's python-dev posting from 21-Jul-2002
        http://mail.python.org/pipermail/python-dev/2002-July/026397.html

    [4] http://mail.python.org/pipermail/python-dev/2002-June/025652.html

    [5] Reference Implementation
        http://sourceforge.net/tracker/index.php?func=detail&aid=1014055&group_id=5470&atid=305470

Copyright

    This document has been placed in the public domain.