The codecs module defines a set of base classes that define the interface and can also be used to easily write your own codecs for use in Python.
Each codec has to define four interfaces to make it usable as a codec in Python: stateless encoder, stateless decoder, stream reader and stream writer. The stream readers and writers typically reuse the stateless encoder/decoder to implement the file protocols.
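As a rough illustration of the four interfaces, the sketch below looks up an existing codec and exercises each one. It assumes Python 3, where `codecs.lookup()` returns a `CodecInfo` object with `encode`, `decode`, `streamreader` and `streamwriter` attributes; the sample text and the in-memory buffers are chosen only for the example.

```python
import codecs
import io

# Look up a registered codec; the result bundles its interfaces.
info = codecs.lookup("utf-8")

# Stateless encoder and decoder: each returns (output, length consumed).
encoded, consumed = info.encode("h\u00e9llo")
decoded, consumed_bytes = info.decode(encoded)

# Stream writer: wraps a binary file-like object and encodes on write.
buffer = io.BytesIO()
writer = info.streamwriter(buffer)
writer.write("h\u00e9llo")

# Stream reader: wraps a binary file-like object and decodes on read.
reader = info.streamreader(io.BytesIO(buffer.getvalue()))
text = reader.read()
```

Here the stream classes produce the same bytes and text as the stateless functions, which matches the description that they typically reuse the stateless encoder/decoder.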
The Codec class defines the interface for stateless encoders/decoders.
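A minimal sketch of a stateless codec built on `codecs.Codec` might look like the following. The class `InvertedLatin1Codec` and its byte-inverting scheme are hypothetical and exist only to show the shape of `encode()` and `decode()`; the example assumes Python 3, where encoding produces `bytes`.

```python
import codecs

# Existing stateless functions that the toy codec delegates to.
_latin1_encode = codecs.getencoder("latin-1")
_latin1_decode = codecs.getdecoder("latin-1")


class InvertedLatin1Codec(codecs.Codec):
    """Hypothetical toy codec: Latin-1 with every byte bit-inverted."""

    def encode(self, input, errors="strict"):
        # Encode to Latin-1 first, passing the errors argument through.
        raw, consumed = _latin1_encode(input, errors)
        return bytes(b ^ 0xFF for b in raw), consumed

    def decode(self, input, errors="strict"):
        # Undo the inversion, then decode as Latin-1.
        raw = bytes(b ^ 0xFF for b in bytes(input))
        return _latin1_decode(raw, errors)


codec = InvertedLatin1Codec()
data, chars_consumed = codec.encode("abc")
roundtrip, bytes_consumed = codec.decode(data)
```

Both methods return a tuple of the output object and the amount of input consumed, and neither keeps any state between calls.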
To simplify and standardize error handling, the encode() and decode() methods may implement different error handling schemes by providing the errors string argument. The following string values are defined and implemented by all standard Python codecs:
Value | Meaning |
---|---|
'strict' | Raise UnicodeError (or a subclass); this is the default. |
'ignore' | Ignore the character and continue with the next. |
'replace' | Replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT CHARACTER for the built-in Unicode codecs on decoding and '?' on encoding. |
'xmlcharrefreplace' | Replace with the appropriate XML character reference (only for encoding). |
'backslashreplace' | Replace with backslashed escape sequences (only for encoding). |
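The effect of each value can be seen by encoding a string that ASCII cannot represent. This is a small sketch assuming Python 3 and the built-in `ascii` codec; the sample text is arbitrary.

```python
text = "caf\u00e9 \u20ac"  # contains two non-ASCII characters

# 'strict' is the default and raises a UnicodeError subclass.
strict_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc

ignored = text.encode("ascii", "ignore")              # b'caf '
replaced = text.encode("ascii", "replace")            # b'caf? ?'
xml_refs = text.encode("ascii", "xmlcharrefreplace")  # b'caf&#233; &#8364;'
escaped = text.encode("ascii", "backslashreplace")    # b'caf\\xe9 \\u20ac'

# On decoding, 'replace' substitutes U+FFFD for undecodable bytes.
decoded = b"caf\xe9".decode("ascii", "replace")       # 'caf\ufffd'
```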
The set of allowed values can be extended via register_error().
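As a sketch of such an extension, the handler below is registered under the hypothetical name `'underscore'` and substitutes an underscore for each character that cannot be encoded. The handler name and function are illustrative assumptions; the example assumes Python 3, where a handler receives the exception instance and returns a tuple of the replacement and the position at which to resume.

```python
import codecs


def replace_with_underscore(exc):
    """Hypothetical handler: one '_' per unencodable character."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.start and exc.end delimit the offending slice of the input.
        return ("_" * (exc.end - exc.start), exc.end)
    # This sketch only handles encoding errors; re-raise anything else.
    raise exc


codecs.register_error("underscore", replace_with_underscore)

result = "caf\u00e9 \u20ac".encode("ascii", "underscore")  # b'caf_ _'
```

Once registered, the name can be passed as the errors argument wherever the predefined values are accepted.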