PEP 285 -- Adding a bool type

PEP:	285
Title:	Adding a bool type
Version:	$Revision: 1227 $
Last-Modified:	$Date: 2002-04-11 08:34:19 -0700 (Thu, 11 Apr 2002) $
Author:	Guido van Rossum <guido at python.org>
Status:	Final
Type:	Standards Track
Created:	8-Mar-2002
Python-Version:	2.3
Post-History:	8-Mar-2002, 30-Mar-2002, 3-Apr-2002

Abstract

    This PEP proposes the introduction of a new built-in type, bool,
    with two constants, False and True.  The bool type would be a
    straightforward subtype (in C) of the int type, and the values
    False and True would behave like 0 and 1 in most respects (for
    example, False==0 and True==1 would be true) except repr() and
    str().  All built-in operations that conceptually return a Boolean
    result will be changed to return False or True instead of 0 or 1;
    for example, comparisons, the "not" operator, and predicates like
    isinstance().

Review

    I've collected enough feedback to last me a lifetime, so I declare
    the review period officially OVER.  I had Chinese food today; my
    fortune cookie said "Strong and bitter words indicate a weak
    cause."  It reminded me of some of the posts against this
    PEP... :-)

    Anyway, here are my BDFL pronouncements.  (Executive summary: I'm
    not changing a thing; all variants are rejected.)

    1) Should this PEP be accepted?

    => Yes.

       There have been many arguments against the PEP.  Many of them
       were based on misunderstandings.  I've tried to clarify some of
       the most common misunderstandings below in the main text of the
       PEP.  The only issue that weighs at all for me is the tendency
       of newbies to write "if x == True" where "if x" would suffice.
       More about that below too.  I think this is not a sufficient
       reason to reject the PEP.

    2) Should str(True) return "True" or "1"?  "1" might reduce
       backwards compatibility problems, but looks strange.
       (repr(True) would always return "True".)

    => "True".

       Almost all reviewers agree with this.

    3) Should the constants be called 'True' and 'False' (similar to
       None) or 'true' and 'false' (as in C++, Java and C99)?

    => True and False.

       Most reviewers agree that consistency within Python is more
       important than consistency with other languages.

    4) Should we strive to eliminate non-Boolean operations on bools
       in the future, through suitable warnings, so that for example
       True+1 would eventually (in Python 3000) be illegal?

    => No.

       There's a small but vocal minority that would prefer to see
       "textbook" bools that don't support arithmetic operations at
       all, but most reviewers agree with me that bools should always
       allow arithmetic operations.

    5) Should operator.truth(x) return an int or a bool?

    => bool.

       Tim Peters believes it should return an int, but almost all
       other reviewers agree that it should return a bool.  My
       rationale: operator.truth() exists to force a Boolean context
       on its argument (it calls the C API PyObject_IsTrue()).
       Whether the outcome is reported as int or bool is secondary; if
       bool exists there's no reason not to use it.  (Under the PEP,
       operator.truth() now becomes an alias for bool(); that's fine.)

    6) Should bool inherit from int?

    => Yes.

       In an ideal world, bool might be better implemented as a
       separate integer type that knows how to perform mixed-mode
       arithmetic.  However, inheriting bool from int eases the
       implementation enormously (in part since all C code that calls
       PyInt_Check() will continue to work -- this returns true for
       subclasses of int).  Also, I believe this is right in terms of
       substitutability: code that requires an int can be fed a bool
       and it will behave the same as 0 or 1.  Code that requires a
       bool may not work when it is given an int; for example, 3 & 4
       is 0, but both 3 and 4 are true when considered as truth
       values.

    7) Should the name 'bool' be changed?

    => No.

       Some reviewers have argued for boolean instead of bool, because
       this would be easier to understand (novices may have heard of
       Boolean algebra but may not make the connection with bool) or
       because they hate abbreviations.  My take: Python uses
       abbreviations judiciously (like 'def', 'int', 'dict') and I
       don't think these are a burden to understanding.  To a newbie,
       it doesn't matter whether it's called a waffle or a bool; it's
       a new word, and they learn quickly what it means.

       One reviewer has argued to make the name 'truth'.  I find this
       an unattractive name, and would actually prefer to reserve this
       term (in documentation) for the more abstract concept of truth
       values that already exists in Python.  For example: "when a
       container is interpreted as a truth value, an empty container
       is considered false and a non-empty one is considered true."

    8) Should we strive to require that Boolean operations (like "if",
       "and", "not") have a bool as an argument in the future, so that
       for example "if []:" would become illegal and would have to be
       writen as "if bool([]):" ???

    => No!!!

       Some people believe that this is how a language with a textbook
       Boolean type should behave.  Because it was brought up, others
       have worried that I might agree with this position.  Let me
       make my position on this quite clear.  This is not part of the
       PEP's motivation and I don't intend to make this change.  (See
       also the section "Clarification" below.)

Rationale

    Most languages eventually grow a Boolean type; even C99 (the new
    and improved C standard, not yet widely adopted) has one.

    Many programmers apparently feel the need for a Boolean type; most
    Python documentation contains a bit of an apology for the absence
    of a Boolean type.  I've seen lots of modules that defined
    constants "False=0" and "True=1" (or similar) at the top and used
    those.  The problem with this is that everybody does it
    differently.  For example, should you use "FALSE", "false",
    "False", "F" or even "f"?  And should false be the value zero or
    None, or perhaps a truth value of a different type that will print
    as "true" or "false"?  Adding a standard bool type to the language
    resolves those issues.

    Some external libraries (like databases and RPC packages) need to
    be able to distinguish between Boolean and integral values, and
    while it's usually possible to craft a solution, it would be
    easier if the language offered a standard Boolean type.  This also
    applies to Jython: some Java classes have separately overloaded
    methods or constructors for int and boolean arguments.  The bool
    type can be used to select the boolean variant.  (The same is
    apparently the case for some COM interfaces.)

    The standard bool type can also serve as a way to force a value to
    be interpreted as a Boolean, which can be used to normalize
    Boolean values.  When a Boolean value needs to be normalized to
    one of two values, bool(x) is much clearer than "not not x" and
    much more concise than

        if x:
            return 1
        else:
            return 0

    Here are some arguments derived from teaching Python.  When
    showing people comparison operators etc. in the interactive shell,
    I think this is a bit ugly:

        >>> a = 13
        >>> b = 12
        >>> a > b
        1
        >>>

    If this was:

        >>> a > b
        True
        >>>

    it would require a millisecond less thinking each time a 0 or 1
    was printed.

    There's also the issue (which I've seen baffling even experienced
    Pythonistas who had been away from the language for a while) that
    if you see:

        >>> cmp(a, b)
        1
        >>> cmp(a, a)
        0
        >>> 

    you might be tempted to believe that cmp() also returned a truth
    value, whereas in reality it can return three different values
    (-1, 0, 1).  If ints were not (normally) used to represent
    Booleans results, this would stand out much more clearly as
    something completely different.

Specification

    The following Python code specifies most of the properties of the
    new type:

        class bool(int):

            def __new__(cls, val=0):
                # This constructor always returns an existing instance
                if val:
                    return True
                else:
                    return False

            def __repr__(self):
                if self:
                    return "True"
                else:
                    return "False"

            __str__ = __repr__

            def __and__(self, other):
                if isinstance(other, bool):
                    return bool(int(self) & int(other))
                else:
                    return int.__and__(self, other)

            __rand__ = __and__

            def __or__(self, other):
                if isinstance(other, bool):
                    return bool(int(self) | int(other))
                else:
                    return int.__or__(self, other)

            __ror__ = __or__

            def __xor__(self, other):
                if isinstance(other, bool):
                    return bool(int(self) ^ int(other))
                else:
                    return int.__xor__(self, other)

            __rxor__ = __xor__

        # Bootstrap truth values through sheer willpower
        False = int.__new__(bool, 0)
        True = int.__new__(bool, 1)

    The values False and True will be singletons, like None.  Because
    the type has two values, perhaps these should be called
    "doubletons"?  The real implementation will not allow other
    instances of bool to be created.

    True and False will properly round-trip through pickling and
    marshalling; for example pickle.loads(pickle.dumps(True)) will
    return True, and so will marshal.loads(marshal.dumps(True)).

    All built-in operations that are defined to return a Boolean
    result will be changed to return False or True instead of 0 or 1.
    In particular, this affects comparisons (<, <=, ==, !=, >, >=, is,
    is not, in, not in), the unary operator 'not', the built-in
    functions callable(), hasattr(), isinstance() and issubclass(),
    the dict method has_key(), the string and unicode methods
    endswith(), isalnum(), isalpha(), isdigit(), islower(), isspace(),
    istitle(), isupper(), and startswith(), the unicode methods
    isdecimal() and isnumeric(), and the 'closed' attribute of file
    objects.  The predicates in the operator module are also changed
    to return a bool, including operator.truth().

    Because bool inherits from int, True+1 is valid and equals 2, and
    so on.  This is important for backwards compatibility: because
    comparisons and so on currently return integer values, there's no
    way of telling what uses existing applications make of these
    values.

    It is expected that over time, the standard library will be
    updated to use False and True when appropriate (but not to require
    a bool argument type where previous an int was allowed).  This
    change should not pose additional problems and is not specified in
    detail by this PEP.

C API

    The header file "boolobject.h" defines the C API for the bool
    type.  It is included by "Python.h" so there is no need to include
    it directly.

    The existing names Py_False and Py_True reference the unique bool
    objects False and True (previously these referenced static int
    objects with values 0 and 1, which were not unique amongst int
    values).

    A new API, PyObject *PyBool_FromLong(long), takes a C long int
    argument and returns a new reference to either Py_False (when the
    argument is zero) or Py_True (when it is nonzero).

    To check whether an object is a bool, the macro PyBool_Check() can
    be used.

    The type of bool instances is PyBoolObject *.

    The bool type object is available as PyBool_Type.

Clarification

    This PEP does *not* change the fact that almost all object types
    can be used as truth values.  For example, when used in an if
    statement, an empty list is false and a non-empty one is true;
    this does not change and there is no plan to ever change this.

    The only thing that changes is the preferred values to represent
    truth values when returned or assigned explicitly.  Previously,
    these preferred truth values were 0 and 1; the PEP changes the
    preferred values to False and True, and changes built-in
    operations to return these preferred values.

Compatibility

    Because of backwards compatibility, the bool type lacks many
    properties that some would like to see.  For example, arithmetic
    operations with one or two bool arguments is allowed, treating
    False as 0 and True as 1.  Also, a bool may be used as a sequence
    index.

    I don't see this as a problem, and I don't want evolve the
    language in this direction either.  I don't believe that a
    stricter interpretation of "Booleanness" makes the language any
    clearer.

    Another consequence of the compatibility requirement is that the
    expression "True and 6" has the value 6, and similarly the
    expression "False or None" has the value None.  The "and" and "or"
    operators are usefully defined to return the first argument that
    determines the outcome, and this won't change; in particular, they
    don't force the outcome to be a bool.  Of course, if both
    arguments are bools, the outcome is always a bool.  It can also
    easily be coerced into being a bool by writing for example "bool(x
    and y)".

Resolved Issues

    (See also the Review section above.)

    - Because the repr() or str() of a bool value is different from an
      int value, some code (for example doctest-based unit tests, and
      possibly database code that relies on things like "%s" % truth)
      may fail.  It is easy to work around this (without explicitly
      referencing the bool type), and it is expected that this only
      affects a very small amount of code that can easily be fixed.

    - Other languages (C99, C++, Java) name the constants "false" and
      "true", in all lowercase.  For Python, I prefer to stick with
      the example set by the existing built-in constants, which all
      use CapitalizedWords: None, Ellipsis, NotImplemented (as well as
      all built-in exceptions).  Python's built-in namespace uses all
      lowercase for functions and types only.

    - It has been suggested that, in order to satisfy user
      expectations, for every x that is considered true in a Boolean
      context, the expression x == True should be true, and likewise
      if x is considered false, x == False should be true.  In
      particular newbies who have only just learned about Boolean
      variables are likely to write

          if x == True: ...

      instead of the correct form,

          if x: ...

      There seem to be strong psychological and linguistic reasons why
      many people are at first uncomfortable with the latter form, but
      I believe that the solution should be in education rather than
      in crippling the language.  After all, == is general seen as a
      transitive operator, meaning that from a==b and b==c we can
      deduce a==c.  But if any comparison to True were to report
      equality when the other operand was a true value of any type,
      atrocities like 6==True==7 would hold true, from which one could
      infer the falsehood 6==7.  That's unacceptable.  (In addition,
      it would break backwards compatibility.  But even if it didn't,
      I'd still be against this, for the stated reasons.)

      Newbies should also be reminded that there's never a reason to
      write

          if bool(x): ...

      since the bool is implicit in the "if".  Explicit is *not*
      better than implicit here, since the added verbiage impairs
      redability and there's no other interpretation possible.  There
      is, however, sometimes a reason to write

          b = bool(x)

      This is useful when it is unattractive to keep a reference to an
      arbitrary object x, or when normalization is required for some
      other reason.  It is also sometimes appropriate to write

          i = int(bool(x))

      which converts the bool to an int with the value 0 or 1.  This
      conveys the intention to henceforth use the value as an int.

Implementation

    A complete implementation in C has been uploaded to the
    SourceForge patch manager:

        http://python.org/sf/528022

    This will soon be checked into CVS for python 2.3a0.

Copyright

    This document has been placed in the public domain.