PEP: 332
Title: Byte vectors and String/Unicode Unification
Version: $Revision: 1888 $
Last-Modified: $Date: 2004-08-27 06:44:37 -0700 (Fri, 27 Aug 2004) $
Author: Skip Montanaro
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2004
Python-Version: 2.5
Post-History:


Abstract
========

This PEP outlines the introduction of a raw ``bytes`` sequence object
and the unification of the current ``str`` and ``unicode`` objects.


Rationale
=========

Python's current string objects are overloaded.  They hold both
character data (ASCII and non-ASCII) and sequences of raw bytes which
have no reasonable interpretation as displayable character sequences.
This overlap hasn't been a big problem in the past, but as Python
moves closer to requiring source code to be properly encoded, the use
of strings to represent raw byte sequences will become more
problematic.  In addition, as Python's Unicode support has improved,
it has become easier to consider strings as ASCII-encoded Unicode
objects.


Proposed Implementation
=======================

The number in parentheses after each item indicates the Python
version in which the feature will be introduced.

- Add a ``bytes`` builtin which is just a synonym for ``str``.  (2.5)

- Add a ``b"..."`` string literal which is equivalent to raw string
  literals, with the exception that values which conflict with the
  source encoding of the containing file do not generate warnings.
  (2.5)

- Warn about the use of variables named "bytes".  (2.5 or 2.6)

- Introduce a ``bytes`` builtin which refers to a sequence distinct
  from the ``str`` type.  (2.6)

- Make ``str`` a synonym for ``unicode``.  (3.0)


Bytes Object API
================

TBD.


Issues
======

- Can this be accomplished before Python 3.0?

- Should ``bytes`` objects be mutable or immutable?  (Guido seems to
  like them to be mutable.)


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: