PEP: 247
Title: API for Cryptographic Hash Functions
Version: $Revision: 1395 $
Author: A.M. Kuchling <amk at amk.ca>
Status: Final
Type: Informational
Created: 23-Mar-2001
Post-History: 20-Sep-2001

Abstract

    There are several different modules available that implement
    cryptographic hashing algorithms such as MD5 or SHA.  This
    document specifies a standard API for such algorithms, to make it
    easier to switch between different implementations.


Specification

    All hashing modules should present the same interface.  Additional
    methods or variables can be added, but those described in this
    document should always be present.

    Hash function modules define one function:

    new([string])            (unkeyed hashes)
    new([key] , [string])    (keyed hashes)

        Create a new hashing object and return it.  The first form is
        for hashes that are unkeyed, such as MD5 or SHA.  For keyed
        hashes such as HMAC, 'key' is a required parameter containing
        a string giving the key to use.  In both cases, the optional
        'string' parameter, if supplied, will be immediately hashed
        into the object's starting state, as if obj.update(string) was
        called.
        
        After creating a hashing object, arbitrary strings can be fed
        into the object using its update() method, and the hash value
        can be obtained at any time by calling the object's digest()
        method.

        Arbitrary additional keyword arguments can be added to this
        function, but if they're not supplied, sensible default values
        should be used.  For example, 'rounds' and 'digest_size'
        keywords could be added for a hash function which supports a
        variable number of rounds and several different output sizes,
        and they should default to values believed to be secure.

    Hash function modules define one variable:

    digest_size

        An integer value; the size of the digest produced by the
        hashing objects created by this module, measured in bytes.
        You could also obtain this value by creating a sample object
        and accessing its 'digest_size' attribute, but it can be
        convenient to have this value available from the module.
        Hashes with a variable output size will set this variable to
        None.

    Hashing objects require a single attribute:

    digest_size

        This attribute is identical to the module-level digest_size
        variable, measuring the size of the digest produced by the
        hashing object, measured in bytes.  If the hash has a variable
        output size, this output size must be chosen when the hashing
        object is created, and this attribute must contain the
        selected size.  Therefore None is *not* a legal value for this
        attribute.
                

    Hashing objects require the following methods:

    copy()

        Return a separate copy of this hashing object.  An update to
        this copy won't affect the original object.

    digest()

        Return the hash value of this hashing object as a string
        containing 8-bit data.  The object is not altered in any way
        by this function; you can continue updating the object after
        calling this function.

    hexdigest()

        Return the hash value of this hashing object as a string
        containing hexadecimal digits.  Lowercase letters should be used 
        for the digits 'a' through 'f'.  Like the .digest() method, this
        method mustn't alter the object.
        
    update(string)

        Hash 'string' into the current state of the hashing object.
        update() can be called any number of times during a hashing
        object's lifetime.

    Hashing modules can define additional module-level functions or 
    object methods and still be compliant with this specification.
    
    Here's an example, using a module named 'MD5':

        >>> from Crypto.Hash import MD5
        >>> m = MD5.new()
        >>> m.digest_size
        16
        >>> m.update('abc')
        >>> m.digest()
        '\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'    
        >>> m.hexdigest()
        '900150983cd24fb0d6963f7d28e17f72' 
        >>> MD5.new('abc').digest()
        '\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'    


Rationale

    The digest size is measured in bytes, not bits, even though hash
    algorithm sizes are usually quoted in bits; MD5 is a 128-bit
    algorithm and not a 16-byte one, for example.  This is because, in
    the sample code I looked at, the length in bytes is often needed
    (to seek ahead or behind in a file; to compute the length of an
    output string) while the length in bits is rarely used.
    Therefore, the burden will fall on the few people actually needing
    the size in bits, who will have to multiply digest_size by 8.

    It's been suggested that the update() method would be better named
    append().  However, that method is really causing the current
    state of the hashing object to be updated, and update() is already
    used by the md5 and sha modules included with Python, so it seems
    simplest to leave the name update() alone.

    The order of the constructor's arguments for keyed hashes was a
    sticky issue.  It wasn't clear whether the key should come first
    or second.  It's a required parameter, and the usual convention is
    to place required parameters first, but that also means that the
    'string' parameter moves from the first position to the second.
    It would be possible to get confused and pass a single argument to
    a keyed hash, thinking that you're passing an initial string to an 
    unkeyed hash, but it doesn't seem worth making the interface 
    for keyed hashes more obscure to avoid this potential error.


Changes

    2001-09-17: Renamed clear() to reset(); added digest_size attribute
    to objects; added .hexdigest() method.
    2001-09-20: Removed reset() method completely.
    2001-09-28: Set digest_size to None for variable-size hashes.


Acknowledgements

    Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar
    Shtull-Trauring, and the readers of the python-crypto list for
    their comments on this PEP.


Copyright

    This document has been placed in the public domain.