PEP 0275 -- Switching on Multiple Values

PEP:	0275
Title:	Switching on Multiple Values
Version:	$Revision: 1140 $
Author:	Marc-André Lemburg <mal at lemburg.com>
Status:	Draft
Type:	Standards Track
Python-Version:	2.4
Created:	10-Nov-2001
Post-History:

Abstract

    This PEP proposes strategies to improve Python's performance
    when switching on a single variable that can take one of
    multiple possible values.

Problem

    Up to Python 2.3, the typical way of writing multi-value switches
    has been to use long if-elif-else chains of the following type:

    if x == 'first state':
        ...
    elif x == 'second state':
        ...
    elif x == 'third state':
        ...
    elif x == 'fourth state':
        ...
    else:
        # default handling
        ...

    This works fine for short switch constructs, since the overhead of
    repeatedly loading a local (the variable x in this case) and
    comparing it to some constant is low, even though the number of
    comparisons grows linearly with the number of cases (O(n) on
    average). However, when such a construct is used to write a state
    machine, as is needed for writing parsers, the number of possible
    states can easily reach 10 or more, and the linear search starts
    to matter.

    The current solution to this problem lies in using a dispatch
    table to find the method implementing the case to execute,
    depending on the value of the switch variable (this can be tuned
    to have a complexity of O(1) on average, e.g. by using perfect
    hash tables). This works well for state machines which require
    complex and lengthy processing in the different case methods. It
    does not perform well for ones which only process one or two
    instructions per case, since the cost of the method call then
    dominates, e.g.

    def handle_data(self, data):
        self.stack.append(data)
 
    A nice example of this is the state machine implemented in
    pickle.py which is used to serialize Python objects. Other
    prominent cases include XML SAX parsers and Internet protocol
    handlers.
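
    The dispatch table approach described above can be sketched as
    follows. This is only an illustration: the Machine class, its
    opcode names and the feed() method are hypothetical and not taken
    from pickle.py or any other module.

```python
class Machine(object):
    """Toy state machine that dispatches on an opcode string."""

    def __init__(self):
        self.stack = []
        # Dispatch table: maps each switch value to its handler method.
        self.dispatch = {
            'data': self.handle_data,
            'pop': self.handle_pop,
        }

    def handle_data(self, data):
        self.stack.append(data)

    def handle_pop(self, data):
        self.stack.pop()

    def handle_default(self, data):
        raise ValueError('unknown opcode')

    def feed(self, opcode, data=None):
        # One dictionary lookup replaces the if-elif-else chain, but
        # every case now pays for an additional method call.
        handler = self.dispatch.get(opcode, self.handle_default)
        handler(data)


m = Machine()
m.feed('data', 1)
m.feed('data', 2)
m.feed('pop')
```

    Note that handle_data() does a single append, so most of the time
    spent in feed() goes into the lookup and the call rather than the
    actual work.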

Proposed Solutions

    This PEP proposes two different but not necessarily conflicting
    solutions:

    1. Adding an optimization to the Python compiler and VM
       which detects the above if-elif-else construct and
       generates special opcodes for it which use a read-only
       dictionary for storing jump offsets.

    2. Adding new syntax to Python which mimics the C style
       switch statement.

    The first solution has the benefit of not relying on adding new
    keywords to the language, while the second looks cleaner. Both
    involve some run-time overhead to ensure that the switching
    variable is immutable and hashable.

    Both solutions use a dictionary lookup to find the right jump
    location, so they share the same problem space: the switch
    variable and the constants need to be compatible with the
    dictionary implementation (hashable, comparable,
    a==b => hash(a)==hash(b)).
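
    The hash/equality requirement can be observed with an ordinary
    dictionary. The following lines are a small illustration only;
    the table contents are made up for the example.

```python
# Hypothetical jump table keyed by constants of different types.
table = {1: 'one', 'a': 'letter a'}

# 1 == 1.0 and both hash to the same value, so a float switch value
# finds the entry stored under the int constant.
same_hash = (1 == 1.0) and (hash(1) == hash(1.0))
found = table.get(1.0, 'default')

# A value equal to none of the constants falls through to the default.
missing = table.get(2, 'default')
```

    A type whose __hash__ disagrees with its notion of equality would
    break this lookup even though the if-elif-else chain still works.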

Solution 1: Optimizing if-elif-else

     Implementation:

         It should be possible for the compiler to detect an
         if-elif-else construct which has the following signature:

                      if x == 'first':...
                      elif x == 'second':...
                      else:...

         i.e. the left hand side always references the same variable,
         the right hand side a hashable immutable builtin type.  The
         right hand sides need not be all of the same type, but they
         should be comparable to the type of the left hand switch
         variable.

         The compiler could then set up a read-only (perfect) hash
         table, store it in the constants and add an opcode SWITCH in
         front of the standard if-elif-else byte code stream which
         triggers the following run-time behaviour:

         At runtime, SWITCH would check x for being one of the
         well-known immutable types (strings, unicode, numbers) and
         use the hash table for finding the right opcode snippet. If
         this condition is not met, the interpreter should revert to
         the standard if-elif-else processing by simply skipping the
         SWITCH opcode and proceeding with the usual if-elif-else byte
         code stream.
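
         The run-time behaviour described above might be modelled in
         pure Python roughly as follows. This is a sketch under
         stated assumptions, not the proposed implementation: the
         names FAST_TYPES, JUMP_TABLE, slow_path and switch are
         hypothetical, labels stand in for jump offsets, and the set
         of "well-known" types is reduced to int, float and str.

```python
# Exact types for which the fast path is assumed to be safe.
FAST_TYPES = (int, float, str)

# Jump table: constant -> label of the code snippet to execute.
JUMP_TABLE = {'first': 'A', 'second': 'B'}


def slow_path(x):
    """The unmodified if-elif-else chain."""
    if x == 'first':
        return 'A'
    elif x == 'second':
        return 'B'
    else:
        return 'default'


def switch(x):
    """Return (label, path), where path records the route taken."""
    if type(x) in FAST_TYPES:
        # Fast path: a single hash lookup selects the target.
        return JUMP_TABLE.get(x, 'default'), 'fast'
    # Any other type (including subtypes): skip SWITCH and run the
    # ordinary comparison chain.
    return slow_path(x), 'slow'
```

         The exact type test (type(x) in FAST_TYPES rather than
         isinstance) is what keeps subtypes on the slow path.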

     Issues:

         The new optimization should not change the current Python
         semantics (by reducing the number of __cmp__ calls and adding
         __hash__ calls in if-elif-else constructs which are affected
         by the optimization). To ensure this, switching can only
         safely be implemented either if a "from __future__" style
         flag is used, or the switching variable is one of the builtin
         immutable types: int, float, string, unicode, etc. (not
         subtypes, since it's not clear whether these are still
         immutable or not).
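
         The change in observable behaviour can be made visible with
         a switch value that records which special methods are
         invoked. The Probe class below is hypothetical, and the
         sketch uses __eq__, which takes the role of __cmp__ in
         current Python versions. Integer constants are used so that
         the hash values are deterministic.

```python
class Probe(object):
    """Hypothetical switch value that counts special method calls."""

    def __init__(self, value):
        self.value = value
        self.eq_calls = 0
        self.hash_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1
        return self.value == other

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.value)


def chain(x):
    if x == 1:
        return 'one'
    elif x == 2:
        return 'two'
    elif x == 3:
        return 'three'
    else:
        return 'other'


TABLE = {1: 'one', 2: 'two', 3: 'three'}

a = Probe(3)
chain_result = chain(a)                # comparisons only, no hashing

b = Probe(3)
table_result = TABLE.get(b, 'other')   # hashing plus one comparison
```

         Both routes select the same case, but user code in __eq__
         and __hash__ runs a different number of times, which is why
         the optimization has to be restricted to builtin types.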

         To prevent post-modifications of the jump-table dictionary
         (which could be used to reach protected code), the jump-table
         will have to be a read-only type (e.g. a read-only
         dictionary).
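
         As an illustration of what a read-only dictionary could look
         like at the Python level, the sketch below uses
         types.MappingProxyType. This assumes Python 3.3 or later and
         is not something this PEP specifies; the table contents are
         made up.

```python
from types import MappingProxyType

# The mutable dict stays private; only the proxy is handed out.
_offsets = {'first': 0, 'second': 1}
JUMP_TABLE = MappingProxyType(_offsets)

try:
    JUMP_TABLE['first'] = 99   # attempt to redirect a case
    modified = True
except TypeError:
    modified = False
```

         The proxy is only a view, so the underlying dict must not be
         reachable from user code for the protection to hold.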

         The optimization should only be used for if-elif-else
         constructs which have at least n cases (where n is a number
         which has yet to be determined by performance tests).

Solution 2: Adding a switch statement to Python

     New Syntax:

         switch EXPR:
             case CONSTANT:
                 SUITE  
             case CONSTANT:
                 SUITE  
             ...
             else:
                 SUITE  

         (modulo indentation variations)

         The "else" part is optional. If no else part is given and
         none of the defined cases matches, no action is taken and 
         the switch statement is ignored. This is in line with the
         current if-behaviour. A user who wants to signal this
         situation using an exception can define an else-branch
         which then implements the intended action.

         Note that the constants need not be all of the same type, but 
         they should be comparable to the type of the switch variable.

     Implementation:

         The compiler would have to compile a function such as this:

          def whatis(x):
              switch(x):
                  case 'one':
                      print '1'
                  case 'two':
                      print '2'
                  case 'three':
                      print '3'
                  else:
                      print "D'oh!"

         into byte code similar to the following (omitting POP_TOP's
         and SET_LINENO's):

           6  LOAD_FAST         0 (x)
           9  LOAD_CONST        1 (switch-table-1)
          12  SWITCH            26 (to 38)

          14  LOAD_CONST        2 ('1')
          17  PRINT_ITEM
          18  PRINT_NEWLINE
          19  JUMP 43

          22  LOAD_CONST        3 ('2')
          25  PRINT_ITEM
          26  PRINT_NEWLINE
          27  JUMP 43

          30  LOAD_CONST        4 ('3')
          33  PRINT_ITEM
          34  PRINT_NEWLINE
          35  JUMP 43

          38  LOAD_CONST        5 ("D'oh!")
          41  PRINT_ITEM
          42  PRINT_NEWLINE

        >>43  LOAD_CONST        0 (None)
          46  RETURN_VALUE
        
        Where the 'SWITCH' opcode would jump to 14, 22, 30 or 38
        depending on 'x'.

        Thomas Wouters has written a patch which demonstrates the
        above. You can download it from [1].
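
        The effect of the SWITCH opcode in the listing above can be
        approximated in plain Python. The sketch below is not the
        patch referred to above; SWITCH_TABLE, ELSE_OFFSET, SNIPPETS
        and switch_target are hypothetical names, and the offsets are
        copied from the listing.

```python
# Constant -> byte code offset of the matching case suite.
SWITCH_TABLE = {'one': 14, 'two': 22, 'three': 30}

# Offset of the else suite, used when no constant matches.
ELSE_OFFSET = 38

# Text printed by the suite starting at each offset.
SNIPPETS = {14: '1', 22: '2', 30: '3', 38: "D'oh!"}


def switch_target(x):
    """Return the offset SWITCH would jump to for the value x."""
    return SWITCH_TABLE.get(x, ELSE_OFFSET)


def whatis(x):
    """Emulate whatis(), returning the text instead of printing it."""
    return SNIPPETS[switch_target(x)]
```

        A single dictionary lookup decides the jump target, no matter
        how many cases the switch statement has.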

    Issues:

        The switch statement should not implement fall-through
        behaviour (as does the switch statement in C). Each case
        defines a complete and independent suite, much like in an
        if-elif-else statement. This also enables using break in
        switch statements inside loops.

        If the interpreter finds that the switch variable x is
        not hashable, it should raise a TypeError at run-time
        pointing out the problem.
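
        A dictionary-based emulation shows where such a TypeError
        would come from. The function name switch and the wording of
        the error message are assumptions made for this sketch.

```python
# Hypothetical table mapping case constants to results.
TABLE = {'one': 1, 'two': 2}


def switch(x):
    """Look up x, reporting unhashable switch values explicitly."""
    try:
        return TABLE.get(x, 'default')
    except TypeError:
        # Hashing x failed, e.g. because x is a list or a dict.
        raise TypeError('switch value of type %s is not hashable'
                        % type(x).__name__)
```

        For example, switch(['one']) raises TypeError instead of
        silently falling through to the default case.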

        There have been other proposals for the syntax which reuse
        existing keywords and avoid adding two new ones ("switch" and
        "case"). Others have argued that the keywords should use new
        terms to avoid confusion with the C keywords of the same name
        but slightly different semantics (e.g. fall-through without
        break). Some of the proposed variants:

            case EXPR:
                of CONSTANT:
                    SUITE  
                of CONSTANT:
                    SUITE  
                else:
                    SUITE  

            case EXPR:
                if CONSTANT:
                    SUITE
                if CONSTANT:
                    SUITE  
                else:
                    SUITE  

            when EXPR:
                in CONSTANT_TUPLE:
                    SUITE  
                in CONSTANT_TUPLE:
                    SUITE  
                ...
            else:
                SUITE
        
        The switch statement could be extended to allow multiple
        values for one section (e.g. case 'a', 'b', 'c': ...). Another
        proposed extension would allow ranges of values (e.g. case
        10..14: ...). These should probably be postponed, but should
        be kept in mind when designing and implementing a first
        version.
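
        With a dictionary as the underlying jump table, both
        extensions amount to storing several keys for the same
        target. The sketch below assumes that a range such as 10..14
        includes its upper bound; build_table and the labels are
        hypothetical.

```python
def build_table(cases):
    """Expand (values, label) pairs into a flat constant -> label dict."""
    table = {}
    for values, label in cases:
        for value in values:
            if value in table:
                raise ValueError('duplicate case constant: %r' % (value,))
            table[value] = label
    return table


TABLE = build_table([
    (('a', 'b', 'c'), 'letters'),   # case 'a', 'b', 'c': ...
    (range(10, 15), 'numbers'),     # case 10..14: ... (14 included)
])
```

        Large ranges would make the table grow accordingly, which is
        one reason to treat this extension separately.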

Examples:

    The following examples all use a new syntax as proposed by
    solution 2. However, all of these examples would work with
    solution 1 as well.

         switch EXPR:                   switch x:
             case CONSTANT:                 case "first":
                 SUITE                          print x
             case CONSTANT:                 case "second":
                 SUITE                          x = x**2
             ...                                print x
             else:                          else:
                 SUITE                          print "whoops!"


         case EXPR:                     case x:
             of CONSTANT:                   of "first":
                 SUITE                          print x
             of CONSTANT:                   of "second":
                 SUITE                          print x**2
             else:                          else:
                 SUITE                          print "whoops!"


         case EXPR:                     case state:
             if CONSTANT:                   if "first":
                 SUITE                          state = "second"
             if CONSTANT:                   if "second":
                 SUITE                          state = "third"
             else:                          else:
                 SUITE                          state = "first"


         when EXPR:                     when state:
             in CONSTANT_TUPLE:             in ("first", "second"):
                 SUITE                          print state
             in CONSTANT_TUPLE:                 state = next_state(state)
                 SUITE                      in ("seventh",):
             ...                                print "done"
         else:                                  break    # out of loop!
             SUITE                      else:
                                            print "middle state"
                                            state = next_state(state)

    Here's another nice application found by Jack Jansen (switching
    on argument types):

         switch type(x).__name__:
             case 'int':
                 SUITE
             case 'str':
                 SUITE
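
    Without new syntax, the same type switch can be emulated with a
    dictionary keyed on the type name. The describe function and its
    handlers are hypothetical; note that type(x).__name__ is 'str'
    for strings in current Python versions.

```python
def describe(x):
    """Pick a handler based on the name of the type of x."""
    handlers = {
        'int': lambda v: 'integer %d' % v,
        'str': lambda v: 'string of length %d' % len(v),
    }
    handler = handlers.get(type(x).__name__)
    if handler is None:
        # No matching case and no else branch: take no action.
        return 'unsupported'
    return handler(x)
```

    Switching on the name rather than on the type object means that
    subclasses with a different name do not match their base class
    case.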

Scope

     XXX Explain "from __future__ import switch"

Credits

    Martin von Löwis (issues with the optimization idea)
    Thomas Wouters (switch statement + byte code compiler example)
    Skip Montanaro (dispatching ideas, examples)
    Donald Beaudry (switch syntax)
    Greg Ewing (switch syntax)
    Jack Jansen (type switching examples)

References

    [1] https://sourceforge.net/tracker/index.php?func=detail&aid=481118&group_id=5470&atid=305470

Copyright

    This document has been placed in the public domain.