This module defines base classes for standard Python codecs (encoders and decoders) and provides access to the internal Python codec registry which manages the codec and error handling lookup process.
It defines the following functions:
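register(search_function)
Registers a codec search function. A search function takes an encoding name and returns a tuple of functions (encoder, decoder, stream_reader, stream_writer) taking the following arguments: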
encoder and decoder: These must be functions or methods which have the same interface as the encode()/decode() methods of Codec instances (see Codec Interface). The functions/methods are expected to work in a stateless mode.
stream_reader and stream_writer: These have to be factory functions providing the following interface:
factory(stream, errors='strict')
The factory functions must return objects providing the interfaces defined by the base classes StreamWriter and StreamReader, respectively. Stream codecs can maintain state.
Possible values for errors are 'strict' (raise an exception in case of an encoding error), 'replace' (replace malformed data with a suitable replacement marker, such as "?"), 'ignore' (ignore malformed data and continue without further notice), 'xmlcharrefreplace' (replace with the appropriate XML character reference; for encoding only) and 'backslashreplace' (replace with backslashed escape sequences; for encoding only), as well as any other error handling name defined via register_error().
In case a search function cannot find a given encoding, it should return None.
lookup(encoding)
Looks up the codecs tuple for the given encoding in the Python codec registry and returns it.
Encodings are first looked up in the registry's cache. If not found, the list of registered search functions is scanned. If no codecs tuple is found, a LookupError is raised. Otherwise, the codecs tuple is stored in the cache and returned to the caller.
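A short sketch of both outcomes of a lookup, assuming a current Python 3 interpreter; the name nosuchencodingdemo is a hypothetical, unregistered encoding used only to trigger the error path.

```python
import codecs

# A known encoding yields a tuple with four entries.
entry = codecs.lookup("utf-8")
encoder, decoder, stream_reader, stream_writer = entry

# An unknown encoding raises LookupError once every search function
# has returned None.
try:
    codecs.lookup("nosuchencodingdemo")  # hypothetical, unregistered name
    found = True
except LookupError:
    found = False
```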
To simplify access to the various codecs, the module provides these additional functions which use lookup() for the codec lookup:
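getencoder(encoding)
Looks up the encoder function for the given encoding and returns it.
Raises a LookupError in case the encoding cannot be found.
getdecoder(encoding)
Looks up the decoder function for the given encoding and returns it.
Raises a LookupError in case the encoding cannot be found.
getreader(encoding)
Looks up the StreamReader class or factory function for the given encoding and returns it.
Raises a LookupError in case the encoding cannot be found.
getwriter(encoding)
Looks up the StreamWriter class or factory function for the given encoding and returns it.
Raises a LookupError in case the encoding cannot be found.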
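The codecs module exposes these lookup helpers as getencoder(), getdecoder(), getreader() and getwriter(). The following sketch, assuming a current Python 3 interpreter and in-memory io.BytesIO streams in place of real files, shows one call to each.

```python
import codecs
import io

# Stateless encoder and decoder: each returns (result, length consumed).
encode = codecs.getencoder("utf-8")
decode = codecs.getdecoder("utf-8")
encoded, chars_consumed = encode("caf\u00e9")
decoded, bytes_consumed = decode(encoded)

# Stream writer: a factory called with the underlying byte stream.
buffer = io.BytesIO()
writer = codecs.getwriter("utf-8")(buffer)
writer.write("caf\u00e9")

# Stream reader: decodes bytes read from the underlying stream.
reader = codecs.getreader("utf-8")(io.BytesIO(buffer.getvalue()))
text = reader.read()
```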
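register_error(name, error_handler)
Registers the error handling function error_handler under the name name. error_handler is called during encoding and decoding in case of an error, when name is specified as the errors parameter.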
For encoding, error_handler will be called with a UnicodeEncodeError instance, which contains information about the location of the error. The error handler must either raise this or a different exception, or return a tuple with a replacement for the unencodable part of the input and a position where encoding should continue. The encoder will encode the replacement and continue encoding the original input at the specified position. Negative position values will be treated as being relative to the end of the input string. If the resulting position is out of bounds, an IndexError will be raised.
Decoding and translating work similarly, except that a UnicodeDecodeError or UnicodeTranslateError is passed to the handler and the replacement returned by the error handler is put into the output directly.
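As an illustration of the handler contract for encoding, here is a minimal sketch assuming a current Python 3 interpreter. The handler name demo_codepoint and the function codepoint_errors are hypothetical; the handler returns an ASCII-only replacement together with the position at which encoding should resume.

```python
import codecs


def codepoint_errors(exc):
    """Replace each unencodable character with a <U+XXXX> marker."""
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
        # Resume encoding right after the unencodable part.
        return (replacement, exc.end)
    # This sketch only covers encoding; re-raise anything else.
    raise exc


# Hypothetical handler name, chosen only for this example.
codecs.register_error("demo_codepoint", codepoint_errors)

result = "a\u00e9\u20acb".encode("ascii", "demo_codepoint")
```

The replacement must itself be encodable by the target codec, which is why the markers use only ASCII characters.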
lookup_error(name)
Returns the error handler previously registered under the name name.
Raises a LookupError in case the handler cannot be found.
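A brief sketch of looking up a handler by name with codecs.lookup_error(), assuming a current Python 3 interpreter; nosuchhandlerdemo is a hypothetical, unregistered handler name.

```python
import codecs

# A built-in handler name resolves to a callable.
strict = codecs.lookup_error("strict")

# An unknown handler name raises LookupError.
try:
    codecs.lookup_error("nosuchhandlerdemo")  # hypothetical name
    missing = False
except LookupError:
    missing = True
```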
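strict_errors(exception)
Implements the strict error handling.
replace_errors(exception)
Implements the replace error handling.
ignore_errors(exception)
Implements the ignore error handling.
xmlcharrefreplace_errors(exception)
Implements the xmlcharrefreplace error handling.
backslashreplace_errors(exception)
Implements the backslashreplace error handling.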
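The effect of the five built-in error handlers can be compared side by side. The sketch below assumes a current Python 3 interpreter and selects each handler by name through the errors argument of str.encode() rather than calling the handler functions directly.

```python
import codecs

text = "a\u00e9b"  # the middle character cannot be encoded as ASCII

# Encode the same text once per non-strict handler.
results = {
    name: text.encode("ascii", name)
    for name in ("replace", "ignore", "xmlcharrefreplace", "backslashreplace")
}

# 'strict' raises instead of producing output.
try:
    text.encode("ascii", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
```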
To simplify working with encoded files or streams, the module also defines these utility functions:
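open(filename, mode[, encoding[, errors[, buffering]]])
Opens an encoded file using the given mode and returns a wrapped version providing transparent encoding/decoding.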
Note: The wrapped version will only accept the object format defined by the codecs, i.e. Unicode objects for most built-in codecs. Output is also codec-dependent and will usually be Unicode as well.
encoding specifies the encoding which is to be used for the file.
errors may be given to define the error handling. It defaults to 'strict', which causes a ValueError to be raised in case an encoding error occurs.
buffering has the same meaning as for the built-in open() function. It defaults to line buffered.
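A minimal round trip through codecs.open(), assuming a current Python 3 interpreter; the temporary directory and the file name demo.txt are placeholders for the example.

```python
import codecs
import os
import tempfile

# Placeholder path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text written through the wrapper is encoded as UTF-8 on disk.
with codecs.open(path, "w", "utf-8") as out:
    out.write("caf\u00e9\n")

# Reading through the wrapper decodes the bytes back into text.
with codecs.open(path, "r", "utf-8") as inp:
    text = inp.read()

# The raw file contents, read without any decoding.
with open(path, "rb") as raw:
    raw_bytes = raw.read()
```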
EncodedFile(file, input[, output[, errors]])
Returns a wrapped version of file which provides transparent encoding translation.
Strings written to the wrapped file are interpreted according to the given input encoding and then written to the original file as strings using the output encoding. The intermediate encoding will usually be Unicode but depends on the specified codecs.
If output is not given, it defaults to input.
errors may be given to define the error handling. It defaults to 'strict', which causes ValueError to be raised in case an encoding error occurs.
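The translation performed by the wrapper can be sketched with an in-memory stream. This example assumes a current Python 3 interpreter, passes the input and output encodings positionally, and uses io.BytesIO in place of a real file.

```python
import codecs
import io

raw = io.BytesIO()

# Data handed to the wrapper is UTF-8; the underlying file stores Latin-1.
wrapped = codecs.EncodedFile(raw, "utf-8", "latin-1")
wrapped.write("caf\u00e9".encode("utf-8"))
stored = raw.getvalue()

# Reading through a fresh wrapper translates Latin-1 back to UTF-8.
reader = codecs.EncodedFile(io.BytesIO(stored), "utf-8", "latin-1")
read_back = reader.read()
```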
The module also provides the following constants which are useful for reading and writing to platform dependent files: