PEP: 227
Title: Statically Nested Scopes
Version: $Revision: 2150 $
Author: Jeremy Hylton <jeremy at zope.com>
Status: Draft
Type: Standards Track
Python-Version: 2.1
Created: 01-Nov-2000
Post-History: 

Abstract

    This PEP describes the addition of statically nested scoping
    (lexical scoping) for Python 2.2, and as an source level option
    for python 2.1.  In addition, Python 2.1 will issue warnings about
    constructs whose meaning may change when this feature is enabled.

    The old language definition (2.0 and before) defines exactly three
    namespaces that are used to resolve names -- the local, global,
    and built-in namespaces.  The addition of nested scopes allows
    resolution of unbound local names in enclosing functions'
    namespaces.

    The most visible consequence of this change is that lambdas (and
    other nested functions) can reference variables defined in the
    surrounding namespace.  Currently, lambdas must often use default
    arguments to explicitly creating bindings in the lambda's
    namespace.

Introduction

    This proposal changes the rules for resolving free variables in
    Python functions.  The new name resolution semantics will take
    effect with Python 2.2.  These semantics will also be available in
    Python 2.1 by adding "from __future__ import nested_scopes" to the
    top of a module.  (See PEP 236.)

    The Python 2.0 definition specifies exactly three namespaces to
    check for each name -- the local namespace, the global namespace,
    and the builtin namespace.  According to this definition, if a
    function A is defined within a function B, the names bound in B
    are not visible in A.  The proposal changes the rules so that
    names bound in B are visible in A (unless A contains a name
    binding that hides the binding in B).

    This specification introduces rules for lexical scoping that are
    common in Algol-like languages.  The combination of lexical
    scoping and existing support for first-class functions is
    reminiscent of Scheme.

    The changed scoping rules address two problems -- the limited
    utility of lambda expressions (and nested functions in general),
    and the frequent confusion of new users familiar with other
    languages that support nested lexical scopes, e.g. the inability
    to define recursive functions except at the module level.

    The lambda expression yields an unnamed function that evaluates a
    single expression.  It is often used for callback functions.  In
    the example below (written using the Python 2.0 rules), any name
    used in the body of the lambda must be explicitly passed as a
    default argument to the lambda.

      from Tkinter import *
      root = Tk()
      Button(root, text="Click here",
             command=lambda root=root: root.test.configure(text="..."))

    This approach is cumbersome, particularly when there are several
    names used in the body of the lambda.  The long list of default
    arguments obscures the purpose of the code.  The proposed
    solution, in crude terms, implements the default argument approach
    automatically.  The "root=root" argument can be omitted.

    The new name resolution semantics will cause some programs to
    behave differently than they did under Python 2.0.  In some cases,
    programs will fail to compile.  In other cases, names that were
    previously resolved using the global namespace will be resolved
    using the local namespace of an enclosing function.  In Python
    2.1, warnings will be issued for all statements that will behave
    differently.

Specification

    Python is a statically scoped language with block structure, in
    the traditional of Algol.  A code block or region, such as a
    module, class definition, or function body, is the basic unit of a
    program.

    Names refer to objects.  Names are introduced by name binding
    operations.  Each occurrence of a name in the program text refers
    to the binding of that name established in the innermost function
    block containing the use.

    The name binding operations are argument declaration, assignment,
    class and function definition, import statements, for statements,
    and except clauses.  Each name binding occurs within a block
    defined by a class or function definition or at the module level
    (the top-level code block).

    If a name is bound anywhere within a code block, all uses of the
    name within the block are treated as references to the current
    block.  (Note: This can lead to errors when a name is used within
    a block before it is bound.)

    If the global statement occurs within a block, all uses of the
    name specified in the statement refer to the binding of that name
    in the top-level namespace.  Names are resolved in the top-level
    namespace by searching the global namespace, i.e. the namespace of
    the module containing the code block, and in the builtin
    namespace, i.e. the namespace of the __builtin__ module.  The
    global namespace is searched first.  If the name is not found
    there, the builtin namespace is searched.  The global statement
    must precede all uses of the name.

    If a name is used within a code block, but it is not bound there
    and is not declared global, the use is treated as a reference to
    the nearest enclosing function region.  (Note: If a region is
    contained within a class definition, the name bindings that occur
    in the class block are not visible to enclosed functions.)

    A class definition is an executable statement that may contain
    uses and definitions of names.  These references follow the normal
    rules for name resolution.  The namespace of the class definition
    becomes the attribute dictionary of the class.

    The following operations are name binding operations.  If they
    occur within a block, they introduce new local names in the
    current block unless there is also a global declaration.

    Function definition: def name ...
    Argument declaration: def f(...name...), lambda ...name...
    Class definition: class name ...
    Assignment statement: name = ...    
    Import statement: import name, import module as name,
        from module import name
    Implicit assignment: names are bound by for statements and except
        clauses

    There are several cases where Python statements are illegal when
    used in conjunction with nested scopes that contain free
    variables.

    If a variable is referenced in an enclosed scope, it is an error
    to delete the name.  The compiler will raise a SyntaxError for
    'del name'.

    If the wild card form of import (import *) is used in a function
    and the function contains a nested block with free variables, the
    compiler will raise a SyntaxError.

    If exec is used in a function and the function contains a nested
    block with free variables, the compiler will raise a SyntaxError
    unless the exec explicitly specifies the local namespace for the
    exec.  (In other words, "exec obj" would be illegal, but 
    "exec obj in ns" would be legal.)

    If a name bound in a function scope is also the name of a module
    global name or a standard builtin name, and the function contains
    a nested function scope that references the name, the compiler
    will issue a warning.  The name resolution rules will result in
    different bindings under Python 2.0 than under Python 2.2.  The
    warning indicates that the program may not run correctly with all
    versions of Python.

Discussion

    The specified rules allow names defined in a function to be
    referenced in any nested function defined with that function.  The
    name resolution rules are typical for statically scoped languages,
    with three primary exceptions:

        - Names in class scope are not accessible.
        - The global statement short-circuits the normal rules.
        - Variables are not declared.

    Names in class scope are not accessible.  Names are resolved in
    the innermost enclosing function scope.  If a class definition
    occurs in a chain of nested scopes, the resolution process skips
    class definitions.  This rule prevents odd interactions between
    class attributes and local variable access.  If a name binding
    operation occurs in a class definition, it creates an attribute on
    the resulting class object.  To access this variable in a method,
    or in a function nested within a method, an attribute reference
    must be used, either via self or via the class name.

    An alternative would have been to allow name binding in class
    scope to behave exactly like name binding in function scope.  This
    rule would allow class attributes to be referenced either via
    attribute reference or simple name.  This option was ruled out
    because it would have been inconsistent with all other forms of
    class and instance attribute access, which always use attribute
    references.  Code that used simple names would have been obscure.

    The global statement short-circuits the normal rules.  Under the
    proposal, the global statement has exactly the same effect that it
    does for Python 2.0.  It is also noteworthy because it allows name
    binding operations performed in one block to change bindings in
    another block (the module).

    Variables are not declared.  If a name binding operation occurs
    anywhere in a function, then that name is treated as local to the
    function and all references refer to the local binding.  If a
    reference occurs before the name is bound, a NameError is raised.
    The only kind of declaration is the global statement, which allows
    programs to be written using mutable global variables.  As a
    consequence, it is not possible to rebind a name defined in an
    enclosing scope.  An assignment operation can only bind a name in
    the current scope or in the global scope.  The lack of
    declarations and the inability to rebind names in enclosing scopes
    are unusual for lexically scoped languages; there is typically a
    mechanism to create name bindings (e.g. lambda and let in Scheme)
    and a mechanism to change the bindings (set! in Scheme).

    XXX Alex Martelli suggests comparison with Java, which does not
    allow name bindings to hide earlier bindings.  

Examples

    A few examples are included to illustrate the way the rules work.

    XXX Explain the examples

    >>> def make_adder(base):
    ...     def adder(x):
    ...         return base + x
    ...     return adder
    >>> add5 = make_adder(5)
    >>> add5(6)
    11

    >>> def make_fact():
    ...     def fact(n):
    ...         if n == 1:
    ...             return 1L
    ...         else:
    ...             return n * fact(n - 1)
    ...     return fact
    >>> fact = make_fact()
    >>> fact(7)    
    5040L

    >>> def make_wrapper(obj):
    ...     class Wrapper:
    ...         def __getattr__(self, attr):
    ...             if attr[0] != '_':
    ...                 return getattr(obj, attr)
    ...             else:
    ...                 raise AttributeError, attr
    ...     return Wrapper()
    >>> class Test:
    ...     public = 2
    ...     _private = 3
    >>> w = make_wrapper(Test())
    >>> w.public
    2
    >>> w._private
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: _private

    An example from Tim Peters demonstrates the potential pitfalls of
    nested scopes in the absence of declarations:

    i = 6
    def f(x):
        def g():
            print i
        # ...
        # skip to the next page
        # ...
        for i in x:  # ah, i *is* local to f, so this is what g sees
            pass
        g()

    The call to g() will refer to the variable i bound in f() by the for
    loop.  If g() is called before the loop is executed, a NameError will
    be raised.

    XXX need some counterexamples

Backwards compatibility

    There are two kinds of compatibility problems caused by nested
    scopes.  In one case, code that behaved one way in earlier
    versions behaves differently because of nested scopes.  In the
    other cases, certain constructs interact badly with nested scopes
    and will trigger SyntaxErrors at compile time.

    The following example from Skip Montanaro illustrates the first
    kind of problem:

    x = 1
    def f1():
        x = 2
        def inner():
            print x
        inner()

    Under the Python 2.0 rules, the print statement inside inner()
    refers to the global variable x and will print 1 if f1() is
    called.  Under the new rules, it refers to the f1()'s namespace,
    the nearest enclosing scope with a binding.

    The problem occurs only when a global variable and a local
    variable share the same name and a nested function uses that name
    to refer to the global variable.  This is poor programming
    practice, because readers will easily confuse the two different
    variables.  One example of this problem was found in the Python
    standard library during the implementation of nested scopes.

    To address this problem, which is unlikely to occur often, the
    Python 2.1 compiler (when nested scopes are not enabled) issues a
    warning.

    The other compatibility problem is caused by the use of 'import *'
    and 'exec' in a function body, when that function contains a
    nested scope and the contained scope has free variables.  For
    example:

    y = 1
    def f():
        exec "y = 'gotcha'" # or from module import *
        def g():
            return y
        ...

    At compile-time, the compiler cannot tell whether an exec that
    operates on the local namespace or an import * will introduce
    name bindings that shadow the global y.  Thus, it is not possible
    to tell whether the reference to y in g() should refer to the
    global or to a local name in f().

    In discussion of the python-list, people argued for both possible
    interpretations.  On the one hand, some thought that the reference
    in g() should be bound to a local y if one exists.  One problem
    with this interpretation is that it is impossible for a human
    reader of the code to determine the binding of y by local
    inspection.  It seems likely to introduce subtle bugs.  The other
    interpretation is to treat exec and import * as dynamic features
    that do not effect static scoping.  Under this interpretation, the
    exec and import * would introduce local names, but those names
    would never be visible to nested scopes.  In the specific example
    above, the code would behave exactly as it did in earlier versions
    of Python.

    Since each interpretation is problematic and the exact meaning
    ambiguous, the compiler raises an exception.  The Python 2.1
    compiler issues a warning when nested scopes are not enabled.

    A brief review of three Python projects (the standard library,
    Zope, and a beta version of PyXPCOM) found four backwards
    compatibility issues in approximately 200,000 lines of code.
    There was one example of case #1 (subtle behavior change) and two
    examples of import * problems in the standard library.

    (The interpretation of the import * and exec restriction that was
    implemented in Python 2.1a2 was much more restrictive, based on
    language that in the reference manual that had never been
    enforced.  These restrictions were relaxed following the release.)

Compatibility of C API

    The implementation causes several Python C API functions to
    change, including PyCode_New().  As a result, C extensions may
    need to be updated to work correctly with Python 2.1.  

locals() / vars()

    These functions return a dictionary containing the current scope's
    local variables.  Modifications to the dictionary do not affect
    the values of variables.  Under the current rules, the use of
    locals() and globals() allows the program to gain access to all
    the namespaces in which names are resolved.

    An analogous function will not be provided for nested scopes.
    Under this proposal, it will not be possible to gain
    dictionary-style access to all visible scopes.

Warnings and Errors

    The compiler will issue warnings in Python 2.1 to help identify
    programs that may not compile or run correctly under future
    versions of Python.  Under Python 2.2 or Python 2.1 if the
    nested_scopes future statement is used, which are collectively
    referred to as "future semantics" in this section, the compiler
    will issue SyntaxErrors in some cases.

    The warnings typically apply when a function that contains a
    nested function that has free variables.  For example, if function
    F contains a function G and G uses the builtin len(), then F is a
    function that contains a nested function (G) with a free variable
    (len).  The label "free-in-nested" will be used to describe these
    functions. 

    import * used in function scope

        The language reference specifies that import * may only occur
        in a module scope.  (Sec. 6.11)  The implementation of C
        Python has supported import * at the function scope.

        If import * is used in the body of a free-in-nested function,
        the compiler will issue a warning.  Under future semantics,
        the compiler will raise a SyntaxError.

    bare exec in function scope

        The exec statement allows two optional expressions following
        the keyword "in" that specify the namespaces used for locals
        and globals.  An exec statement that omits both of these
        namespaces is a bare exec.

        If a bare exec is used in the body of a free-in-nested
        function, the compiler will issue a warning.  Under future
        semantics, the compiler will raise a SyntaxError.

    local shadows global

        If a free-in-nested function has a binding for a local
        variable that (1) is used in a nested function and (2) is the
        same as a global variable, the compiler will issue a warning.

Rebinding names in enclosing scopes

    There are technical issues that make it difficult to support
    rebinding of names in enclosing scopes, but the primary reason
    that it is not allowed in the current proposal is that Guido is
    opposed to it.  His motivation: it is difficult to support,
    because it would require a new mechanism that would allow the
    programmer to specify that an assignment in a block is supposed to
    rebind the name in an enclosing block; presumably a keyword or
    special syntax (x := 3) would make this possible.  Given that this
    would encourage the use of local variables to hold state that is
    better stored in a class instance, it's not worth adding new
    syntax to make this possible (in Guido's opinion).

    The proposed rules allow programmers to achieve the effect of
    rebinding, albeit awkwardly.  The name that will be effectively
    rebound by enclosed functions is bound to a container object.  In
    place of assignment, the program uses modification of the
    container to achieve the desired effect:

    def bank_account(initial_balance):
        balance = [initial_balance]
        def deposit(amount):
            balance[0] = balance[0] + amount
            return balance
        def withdraw(amount):
            balance[0] = balance[0] - amount
            return balance
        return deposit, withdraw

    Support for rebinding in nested scopes would make this code
    clearer.  A class that defines deposit() and withdraw() methods
    and the balance as an instance variable would be clearer still.
    Since classes seem to achieve the same effect in a more
    straightforward manner, they are preferred.

Implementation

    XXX Jeremy, is this still the case?

    The implementation for C Python uses flat closures [1].  Each def
    or lambda expression that is executed will create a closure if the
    body of the function or any contained function has free
    variables.  Using flat closures, the creation of closures is
    somewhat expensive but lookup is cheap.

    The implementation adds several new opcodes and two new kinds of
    names in code objects.  A variable can be either a cell variable
    or a free variable for a particular code object.  A cell variable
    is referenced by containing scopes; as a result, the function
    where it is defined must allocate separate storage for it on each
    invocation.  A free variable is referenced via a function's
    closure. 

    The choice of free closures was made based on three factors.
    First, nested functions are presumed to be used infrequently,
    deeply nested (several levels of nesting) still less frequently.
    Second, lookup of names in a nested scope should be fast.
    Third, the use of nested scopes, particularly where a function
    that access an enclosing scope is returned, should not prevent
    unreferenced objects from being reclaimed by the garbage
    collector. 

    XXX Much more to say here

References

    [1] Luca Cardelli.  Compiling a functional language.  In Proc. of
    the 1984 ACM Conference on Lisp and Functional Programming,
    pp. 208-217, Aug. 1984
        http://citeseer.ist.psu.edu/cardelli84compiling.html

Copyright

    XXX