PEP 213 -- Attribute Access Handlers

PEP:	213
Title:	Attribute Access Handlers
Version:	$Revision: 928 $
Author:	Paul Prescod <paul at prescod.net>
Status:	Draft
Type:	Standards Track
Python-Version:	2.1
Created:	21-Jul-2000
Post-History:

Introduction

     It is possible (and even relatively common) in Python code and 
     in extension modules to "trap" when an instance's client code 
     attempts to set an attribute and execute code instead. In other 
     words it is possible to allow users to use attribute assignment/
     retrieval/deletion syntax even though the underlying implementation 
     is doing some computation rather than directly modifying a
     binding.

     This PEP describes a feature that makes it easier, more efficient
     and safer to implement these handlers for Python instances.

Justification

    cenario 1:

        You have a deployed class that works on an attribute named
        "stdout". After a while, you think it would be better to
        check that stdout is really an object with a "write" method
        at the moment of assignment. Rather than change to a
        setstdout method (which would be incompatible with deployed
        code) you would rather trap the assignment and check the
        object's type.

    Scenario 2:

        You want to be as compatible as possible with an object 
        model that has a concept of attribute assignment. It could
        be the W3C Document Object Model or a particular COM 
        interface (e.g. the PowerPoint interface). In that case
        you may well want attributes in the model to show up as
        attributes in the Python interface, even though the 
        underlying implementation may not use attributes at all.

    Scenario 3:

        A user wants to make an attribute read-only.

    In short, this feature allows programmers to separate the 
    interface of their module from the underlying implementation
    for whatever purpose. Again, this is not a new feature but
    merely a new syntax for an existing convention.

Current Solution

    To make some attributes read-only:

    class foo:
       def __setattr__( self, name, val ):
          if name=="readonlyattr":
             raise TypeError
          elif name=="readonlyattr2":
             raise TypeError
          ...
          else:
             self.__dict__["name"]=val

     This has the following problems:

     1. The creator of the method must be intimately aware of whether
        somewhere else in the class hiearchy __setattr__ has also been
        trapped for any particular purpose. If so, she must specifically
        call that method rather than assigning to the dictionary. There
        are many different reasons to overload __setattr__ so there is a
        decent potential for clashes. For instance object database
        implementations often overload setattr for an entirely unrelated
        purpose.

     2. The string-based switch statement forces all attribute handlers 
        to be specified in one place in the code. They may then dispatch
        to task-specific methods (for modularity) but this could cause
        performance problems.

     3. Logic for the setting, getting and deleting must live in 
        __getattr__, __setattr__ and __delattr__. Once again, this can be
        mitigated through an extra level of method call but this is 
        inefficient.

Proposed Syntax

    Special methods should declare themselves with declarations of the
    following form:

    class x:
        def __attr_XXX__(self, op, val ):
            if op=="get":
                return someComputedValue(self.internal)
            elif op=="set":
                self.internal=someComputedValue(val)
            elif op=="del":
                del self.internal

    Client code looks like this:

    fooval=x.foo
    x.foo=fooval+5
    del x.foo

Semantics

     Attribute references of all three kinds should call the method.
     The op parameter can be "get"/"set"/"del". Of course this string
     will be interned so the actual checks for the string will be
     very fast.
 
     It is disallowed to actually have an attribute named XXX in the
     same instance as a method named __attr_XXX__.

     An implementation of __attr_XXX__ takes precedence over an
     implementation of __getattr__ based on the principle that
     __getattr__ is supposed to be invoked only after finding an
     appropriate attribute has failed.

     An implementation of __attr_XXX__ takes precedence over an
     implementation of __setattr__ in order to be consistent. The
     opposite choice seems fairly feasible also, however. The same
     goes for __del_y__.

Proposed Implementation

    There is a new object type called an attribute access handler. 
    Objects of this type have the following attributes:

       name (e.g. XXX, not __attr__XXX__
       method (pointer to a method object
   
    In PyClass_New, methods of the appropriate form will be detected and
    converted into objects (just like unbound method objects). These are
    stored in the class __dict__ under the name XXX. The original method
    is stored as an unbound method under its original name.

    If there are any attribute access handlers in an instance at all,
    a flag is set. Let's call it "I_have_computed_attributes" for
    now. Derived classes inherit the flag from base classes. Instances
    inherit the flag from classes.
 
    A get proceeds as usual until just before the object is returned.
    In addition to the current check whether the returned object is a
    method it would also check whether a returned object is an access
    handler. If so, it would invoke the getter method and return
    the value. To remove an attribute access handler you could directly
    fiddle with the dictionary.
 
    A set proceeds by checking the "I_have_computed_attributes" flag. If
    it is not set, everything proceeds as it does today. If it is set
    then we must do a dictionary get on the requested object name. If it
    returns an attribute access handler then we call the setter function
    with the value. If it returns any other object then we discard the
    result and continue as we do today. Note that having an attribute
    access handler will mildly affect attribute "setting" performance for
    all sets on a particular instance, but no more so than today, using
    __setattr__. Gets are more efficient than they are today with
    __getattr__.
 
    The I_have_computed_attributes flag is intended to eliminate the
    performance degradation of an extra "get" per "set" for objects not
    using this feature. Checking this flag should have miniscule
    performance implications for all objects.

    The implementation of delete is analogous to the implementation
    of set.

Caveats

    1. You might note that I have not proposed any logic to keep
       the I_have_computed_attributes flag up to date as attributes
       are added and removed from the instance's dictionary. This is
       consistent with current Python. If you add a __setattr__ method
       to an object after it is in use, that method will not behave as
       it would if it were available at "compile" time. The dynamism is
       arguably not worth the extra implementation effort. This snippet
       demonstrates the current behavior:

	>>> def prn(*args):print args
	>>> class a:
	...    __setattr__=prn
        >>> a().foo=5
	(<__main__.a instance at 882890>, 'foo', 5)

	>>> class b: pass 
	>>> bi=b()
	>>> bi.__setattr__=prn
	>>> b.foo=5

     2. Assignment to __dict__["XXX"] can overwrite the attribute
	access handler for __attr_XXX__. Typically the access handlers will
        store information away in private __XXX variables

     3. An attribute access handler that attempts to call setattr or getattr
        on the object itself can cause an infinite loop (as with __getattr__)
        Once again, the solution is to use a special (typically private) 
        variable such as __XXX.

Note

    The descriptor mechanism described in PEP 252 is powerful enough
    to support this more directly.  A 'getset' constructor may be
    added to the language making this possible:

      class C:
          def get_x(self):
              return self.__x
          def set_x(self, v):
              self.__x = v
          x = getset(get_x, set_x)

    Additional syntactic sugar might be added, or a naming convention
    could be recognized.