PEP: 269
Title: Pgen Module for Python
Version: $Revision: 1865 $
Last-Modified: $Date: 2004-08-18 04:56:16 -0700 (Wed, 18 Aug 2004) $
Author: Jonathan Riehl <jriehl at spaceship.com>
Status: Deferred
Type: Standards Track
Created: 24-Aug-2001
Python-Version: 2.2
Post-History: 

Abstract

    Much like the parser module exposes the Python parser, this PEP
    proposes that the parser generator used to create the Python
    parser, pgen, be exposed as a module in Python.


Rationale

    Through the course of Pythonic history, there have been numerous
    discussions about the creation of a Python compiler [1].  These
    have resulted in several implementations of Python parsers, most
    notably the parser module currently provided in the Python
    standard library [2] and Jeremy Hylton's compiler module [3].
    However, while multiple language changes have been proposed
    [4][5], experimentation with the Python syntax has lacked the
    benefit of a Python binding to the actual parser generator used to
    build Python.

    Providing a Python wrapper analogous to Fred Drake Jr.'s parser
    wrapper, but targeted at the pgen library, supports the following
    assertions:

    1. Reference implementations of syntax changes will be easier to
       develop.  Currently, a reference implementation of a syntax
       change would require the developer to use the pgen tool from
       the command line.  The resulting parser data structure would
       then either have to be reworked to interface with a custom
       CPython implementation, or wrapped as a C extension module.

    2. Reference implementations of syntax changes will be easier to
       distribute.  Since the parser generator will be available in
       Python, it should follow that the resulting parser will be
       accessible from Python.  Therefore, reference implementations
       should be available as pure Python code, rather than as custom
       versions of the existing CPython distribution or as compilable
       extension modules.

    3. Reference implementations of syntax changes will be easier to
       discuss with a larger audience.  This somewhat falls out of the
       second assertion, since the community of Python users is most
       likely larger than the community of CPython developers.

    4. Development of small languages in Python will be further
       enhanced, since the additional module will be a fully
       functional LL(1) parser generator.


Specification

    The proposed module will be called pgen.  The pgen module will
    contain the following functions:

    parseGrammarFile (fileName) -> AST
        The parseGrammarFile() function will read the file pointed to
        by fileName and create an AST object.  The AST nodes will
        contain the numeric nonterminal values of the parser
        generator's meta-grammar.  The output AST will be an instance of
        the AST extension class as provided by the parser module.
        Syntax errors in the input file will cause the SyntaxError
        exception to be raised.

    parseGrammarString (text) -> AST
        The parseGrammarString() function will follow the semantics of
        parseGrammarFile(), but will accept the grammar text as a
        string, as opposed to a file name.
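
        As a rough sketch of the intended usage (the pgen module
        proposed here does not yet exist, and the small grammar below
        is purely illustrative, written in the meta-grammar notation
        used by CPython's Grammar/Grammar file), the grammar parsing
        functions might be invoked as follows:

            import pgen

            # A small expression grammar in pgen's meta-grammar
            # notation.  Each rule names a nonterminal, followed by a
            # colon and a regular expression over tokens and
            # nonterminals.
            GRAMMAR_TEXT = (
                "eval_input: expr NEWLINE* ENDMARKER\n"
                "expr: term (('+'|'-') term)*\n"
                "term: NUMBER | '(' expr ')'\n"
            )

            # Build an AST (of the kind provided by the parser module)
            # describing the grammar itself; a SyntaxError is raised
            # if the grammar text is malformed.
            grammar_ast = pgen.parseGrammarString(GRAMMAR_TEXT)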

    buildParser (grammarAst) -> DFA
        The buildParser() function will accept an AST object for input
        and return a DFA (deterministic finite automaton) data
        structure.  The DFA data structure will be a C extension
        class, much like the AST structure is provided in the parser
        module.  If the input AST does not conform to the nonterminal
        codes defined for the pgen meta-grammar, buildParser() will
        throw a ValueError exception.
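
        Continuing the sketch above, and still assuming the proposed
        API, the grammar AST would then be handed to buildParser():

            # Convert the meta-grammar AST into a DFA extension
            # object; a ValueError is raised if the AST does not use
            # the nonterminal codes of the pgen meta-grammar.
            dfa = pgen.buildParser(grammar_ast)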

    parseFile (fileName, dfa, start) -> AST
        The parseFile() function will essentially be a wrapper for the
        PyParser_ParseFile() C API function.  The wrapper code will
        accept the file name, the DFA C extension class, and the start
        symbol.  It will output an AST instance that conforms to the
        lexical values in the token module and to the nonterminal
        values contained in the DFA.

    parseString (text, dfa, start) -> AST
        The parseString() function will operate in a similar fashion
        to the parseFile() function, but accept the parse text as an
        argument.  Much like parseFile() will wrap the
        PyParser_ParseFile() C API function, parseString() will wrap
        the PyParser_ParseString() function.
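
        A hedged sketch of both parsing entry points follows; the file
        name and the numeric start symbol value are placeholders,
        since the actual nonterminal codes would only be known once
        the DFA has been built:

            # Hypothetical numeric code for the 'eval_input' rule.
            EVAL_INPUT = 256

            # Parse a source file and a source string against the
            # DFA, producing AST objects that use the token module's
            # lexical values and the DFA's nonterminal values.
            file_tree = pgen.parseFile("input.txt", dfa, EVAL_INPUT)
            text_tree = pgen.parseString("1 + (2 + 3)\n", dfa, EVAL_INPUT)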

    symbolToStringMap (dfa) -> dict
        The symbolToStringMap() function will accept a DFA instance
        and return a dictionary object that maps from the DFA's
        numeric values for its nonterminals to the string names of the
        nonterminals as found in the original grammar specification
        for the DFA.

    stringToSymbolMap (dfa) -> dict
        The stringToSymbolMap() function will output a dictionary mapping
        the nonterminal names of the input DFA to their corresponding
        numeric values.
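
        The two map functions would remove the need to hard-code
        numeric nonterminal values; for example (again a sketch
        against the proposed API):

            # Map numeric nonterminal codes to rule names and back.
            code_to_name = pgen.symbolToStringMap(dfa)
            name_to_code = pgen.stringToSymbolMap(dfa)

            # Look up the start symbol by name instead of by number.
            start = name_to_code['eval_input']
            tree = pgen.parseString("4 - 5\n", dfa, start)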

    Extra credit will be awarded if the map generation functions and
    parsing functions are also methods of the DFA extension class.
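
    Were that extra credit earned, the sketches above might be written
    in terms of DFA methods instead (a hypothetical spelling, not part
    of the specification proper):

        # Method forms of the map and parse functions shown earlier.
        start = dfa.stringToSymbolMap()['eval_input']
        tree = dfa.parseString("4 - 5\n", start)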


Implementation Plan

    A cunning plan has been devised to accomplish this enhancement:

    1. Rename the pgen functions to conform to the CPython naming
       standards.  This action may involve adding some header files to
       the Include subdirectory.

    2. Move the pgen C modules in Makefile.pre.in from the
       pgen-specific build targets to the Python C library.

    3. Make any needed changes to the parser module so the AST
       extension class can accommodate AST types it does not
       recognize.  Cursory examination of the AST extension class
       shows that it keeps track of whether the tree is a suite or an
       expression.

    4. Code an additional C module in the Modules directory.  The C
       extension module will implement the DFA extension class and the
       functions outlined in the previous section.

    5. Add the new module to the build process.  Black magic, indeed.


Limitations

    Under this proposal, would-be designers of Python 3000 will still
    be constrained to Python's lexical conventions.  The addition,
    subtraction, or modification of the Python lexer is outside the
    scope of this PEP.


Reference Implementation

    No reference implementation is currently provided.  A patch was
    at one point provided at
    http://sourceforge.net/tracker/index.php?func=detail&aid=599331&group_id=5470&atid=305470
    but that patch is no longer maintained.


References

    [1] The (defunct) Python Compiler-SIG
        http://www.python.org/sigs/compiler-sig/

    [2] Parser Module Documentation
        http://www.python.org/doc/current/lib/module-parser.html

    [3] Hylton, Jeremy.  Compiler Package Documentation
        http://www.python.org/doc/current/lib/compiler.html

    [4] Pelletier, Michel. "Python Interface Syntax", PEP-245.
        http://www.python.org/peps/pep-0245.html

    [5] The Python Types-SIG
        http://www.python.org/sigs/types-sig/


Copyright

    This document has been placed in the public domain.