12.2.2 Parsing email messages

Message object structures can be created in one of two ways: they can be created from whole cloth by instantiating Message objects and stringing them together via attach() and set_payload() calls, or they can be created by parsing a flat text representation of the email message.
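As a rough illustration of the first approach, the sketch below builds a small two-level structure by hand. It assumes a Python version in which the Message class is importable from email.message (older releases spell the module email.Message); the header values and body text are made up for the example.

```python
from email.message import Message

# A leaf part: its payload is a plain string.
part = Message()
part["Content-Type"] = "text/plain"
part.set_payload("Hello from the first part.\n")

# A container: attach() turns its payload into a list of sub-messages.
root = Message()
root["Subject"] = "Built by hand"
root["Content-Type"] = "multipart/mixed"
root.attach(part)

subparts = root.get_payload()  # list containing the attached part
```

Here set_payload() is used for the leaf and attach() for the container, so root.is_multipart() becomes true only for the object that holds sub-messages.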

The email package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a string or a file object, and it returns the root Message instance of the object structure. For simple, non-MIME messages, the payload of this root object will likely be a string containing the text of the message. For MIME multipart messages, the root object returns True from its is_multipart() method, and the subparts can be accessed via the get_payload() and walk() methods.
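The following sketch shows both cases. It assumes the parser is importable as email.parser.Parser (older releases spell the module email.Parser), and the two sample messages are invented for illustration.

```python
from email.parser import Parser

# A simple, non-MIME message: headers, a blank line, then the body.
simple_text = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Just a plain body.\n"
)

# A MIME multipart message with two subparts.
multipart_text = (
    "From: alice@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "First part.\n"
    "--BOUNDARY\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Second part.</p>\n"
    "--BOUNDARY--\n"
)

parser = Parser()

simple = parser.parsestr(simple_text)
body = simple.get_payload()           # a string for the non-MIME message

root = parser.parsestr(multipart_text)
subparts = root.get_payload()         # a list of Message objects

# walk() visits the root first, then each subpart in order.
content_types = [part.get_content_type() for part in root.walk()]
```

To parse from an open file object instead of a string, call parser.parse(fp) in place of parsestr().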

Note that the parser can be extended in limited ways, and of course you can implement your own parser completely from scratch. There is no magical connection between the email package's bundled parser and the Message class, so your custom parser can create message object trees any way it finds necessary.
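To make the point about custom parsers concrete, here is a deliberately naive sketch of one written from scratch. The function name parse_simple is hypothetical, and the sketch assumes single-line "Name: value" headers with no folding, no MIME handling, and "\n" line endings; it only shows that any code can populate a Message object.

```python
from email.message import Message

def parse_simple(text):
    """Toy parser: header lines, a blank line, then the body (hypothetical helper)."""
    # Everything before the first blank line is treated as the header block.
    head, _, body = text.partition("\n\n")
    msg = Message()
    for line in head.splitlines():
        # Split each header at the first colon; folded headers are not handled.
        name, _, value = line.partition(":")
        msg[name.strip()] = value.strip()
    msg.set_payload(body)
    return msg

parsed = parse_simple("Subject: Hi\nTo: bob@example.com\n\nBody text\n")
```

The resulting object behaves like one produced by the bundled parser as far as header lookup and get_payload() are concerned.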

The primary parser class is Parser, which parses both the headers and the payload of the message. In the case of multipart messages, it recursively parses the body of the container message. Two modes of parsing are supported: strict parsing, which will usually reject any non-RFC-compliant message, and lax parsing, which attempts to adjust for common MIME formatting problems.

The email.Parser module also provides a second class, HeaderParser, which can be used if you're only interested in the headers of the message. HeaderParser can be much faster in these situations, since it does not attempt to parse the message body; instead, it sets the payload to the raw body as a string. HeaderParser has the same API as the Parser class.
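A short sketch of the difference, assuming HeaderParser is importable from email.parser (email.Parser in older releases) and using an invented multipart message:

```python
from email.parser import HeaderParser

text = (
    "From: alice@example.com\n"
    "Subject: Two parts\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "First part.\n"
    "--BOUNDARY--\n"
)

headers_only = HeaderParser().parsestr(text)

subject = headers_only["Subject"]       # headers are parsed as usual
raw_body = headers_only.get_payload()   # body is left as one unparsed string
```

Although the Content-Type header says multipart/mixed, the body is not split into subparts here, so is_multipart() returns False and the boundary lines are still present in raw_body.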


