PyCon 2004
March 25, 2004
A.M. Kuchling
www.amk.ca
amk @ amk.ca
Exim (www.exim.org) is a mail transfer agent.
A mail transfer agent, or MTA, is the program responsible for sending outgoing mail, receiving incoming messages, queueing messages when connections are down, etc. Exim is an MTA; comparable programs include Sendmail, qmail, and Postfix.
Mail user agents, or MUAs, are the programs that users run to read and send mail, such as mutt, Eudora, Entourage, and Outlook. MUAs usually hand off messages to an MTA for transport.
Some of Exim's noteworthy features are:
elspy (elspy.sf.net, by Greg Ward) embeds Python inside Exim for filtering.
Exim supports an API hook (local_scan()) for filtering messages. elspy provides a local_scan() that initializes the Python interpreter and invokes Python code.
On receiving a message, elspy will import the exim_local_scan module and run the local_scan() function in the module.
The Python code is given the headers and body of the message, and various information about the remote connection. Messages can be permanently or temporarily rejected by raising an exception. Another option is to add headers to messages before passing them on to Exim's usual delivery process. (Exim's API permits modifying existing headers, but no one has bothered to wrap this for elspy.)
Installing a new local_scan() requires recompiling Exim.
Caution: the 0.1.1 release of elspy has a bug...
On receiving a message, Exim will import the exim_local_scan module and run the local_scan() function in it.
An example:

    from elspy import RejectMessage

    def local_scan(info, headers, fd):
        subject = headers.get('subject')
        if subject is not None:
            subject = subject.strip().lower()
            if subject.startswith('spam'):
                raise RejectMessage("obvious spam rejected")
RejectMessage([message]) -- rejects the message outright
TempRejectMessage([message]) -- temporarily rejects the message
AcceptMessage([message]) -- message is delivered
If local_scan returns normally, the message is also delivered.
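The control flow around these exceptions can be sketched as a small dispatcher. This is not elspy's actual glue code: the exception classes are local stand-ins for the ones elspy exports, and `dispatch`, the two sample filters, and the outcome strings are hypothetical names chosen for illustration. It assumes that a normal return accepts the message, and it treats any unexpected exception as a temporary failure, which is the behaviour these slides describe for Exim.

```python
class RejectMessage(Exception):
    """Stand-in for elspy.RejectMessage (assumption, not the real class)."""


class TempRejectMessage(Exception):
    """Stand-in for elspy.TempRejectMessage (assumption)."""


class AcceptMessage(Exception):
    """Stand-in for elspy.AcceptMessage (assumption)."""


def dispatch(local_scan, info, headers, fd):
    """Call a local_scan function and translate the result to a string."""
    try:
        local_scan(info, headers, fd)
    except AcceptMessage:
        return 'accept'
    except RejectMessage as exc:
        return 'reject: %s' % exc
    except TempRejectMessage as exc:
        return 'temp-reject: %s' % exc
    except Exception:
        # Any other exception is a bug in the filter; fail temporarily
        # so that the sending host can retry later.
        return 'temp-reject: internal error'
    # No exception raised: assumed to mean the message is accepted.
    return 'accept'


def scan_spam_subject(info, headers, fd):
    """Sample filter: reject subjects that start with 'spam'."""
    subject = headers.get('subject')
    if subject is not None and subject.strip().lower().startswith('spam'):
        raise RejectMessage("obvious spam rejected")


def scan_buggy(info, headers, fd):
    """Sample filter that fails with an unexpected exception."""
    raise KeyError('oops')


# A plain dict stands in for the header object here.
print(dispatch(scan_spam_subject, None, {'subject': 'SPAM for you'}, None))
print(dispatch(scan_spam_subject, None, {'subject': 'lunch?'}, None))
print(dispatch(scan_buggy, None, {}, None))
```

Catching the three known exceptions before the generic `Exception` clause keeps a deliberate rejection distinct from a crash in the filter.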
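The info parameter contains information about the SMTP transaction and connection, including info.sender_address, info.sender_host_address, info.sender_host_name, and info.recipients_list.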
Headers are provided as a sequence-like instance.

    subject = headers.get('subject')
    received = headers.get_all('received')
    headers.add('X-Spam-Ranking', 'DEFINITELY')
    log = open('/var/log/exim/spam', 'a')
    headers.write(log)    # Writes header list to a file
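The header calls above can be combined into a filter that tags a message instead of rejecting it. The sketch below runs without Exim: `FakeHeaders` is a hypothetical stand-in that models only `get`, `get_all`, and `add`, and its case-insensitive lookup is an assumption about how the real object behaves. The 'adv:' test is only an illustrative rule.

```python
class FakeHeaders(object):
    """Hypothetical stand-in for the header object elspy passes in.

    Only the get/get_all/add calls are modelled; lookups are assumed
    to be case-insensitive.
    """

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def get(self, name, default=None):
        for key, value in self._pairs:
            if key.lower() == name.lower():
                return value
        return default

    def get_all(self, name):
        return [v for k, v in self._pairs if k.lower() == name.lower()]

    def add(self, name, value):
        self._pairs.append((name, value))


def local_scan(info, headers, fd):
    # Tag rather than reject: the message is still delivered, and a
    # later stage (or the user's mail client) can filter on the header.
    subject = headers.get('subject', '')
    if subject.strip().lower().startswith('adv:'):
        headers.add('X-Spam-Ranking', 'DEFINITELY')


tagged = FakeHeaders([('Subject', 'ADV: cheap stuff'),
                      ('Received', 'from a'),
                      ('Received', 'from b')])
local_scan(None, tagged, None)
print(tagged.get('X-Spam-Ranking'))      # header added by the filter
print(len(tagged.get_all('received')))   # both Received headers kept
```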
The fd parameter is a file descriptor positioned at the start of the message body.

    import os

    spam_score = 0
    msg = os.fdopen(fd)
    for line in msg.readlines():
        if '419' in line:
            spam_score += 1
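Body scanning of this kind can be tried outside Exim by writing a sample body to a temporary file and handing its descriptor to the scanner. This is a sketch for current Python 3: `count_matching_lines` is a hypothetical helper, the body is read as bytes so that 8-bit text cannot cause a decoding error, and leaving the descriptor open afterwards is an assumption that the caller still needs it.

```python
import os
import tempfile


def count_matching_lines(fd, needle):
    """Count body lines containing needle (bytes), reading from fd."""
    count = 0
    # closefd=False: closing the file object leaves fd itself open.
    with os.fdopen(fd, 'rb', closefd=False) as body:
        for line in body:
            if needle in line:
                count += 1
    return count


with tempfile.TemporaryFile() as tmp:
    tmp.write(b"Dear friend,\n"
              b"I refer you to section 419 of the criminal code.\n"
              b"Regards\n")
    tmp.flush()
    # Rewind the descriptor so it sits at the start of the body.
    os.lseek(tmp.fileno(), 0, os.SEEK_SET)
    spam_score = count_matching_lines(tmp.fileno(), b'419')

print(spam_score)
```

Iterating over the file object reads one line at a time, so a large body is never held in memory all at once.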
On to some examples...

    from email.Header import Header    # Need to decode quoted-printable
    from elspy import AcceptMessage, RejectMessage

    def local_scan(info, headers, fd):
        subject = headers.get('subject', '')
        subject = unicode(Header(subject))
        lsubject = subject.lower()
        if lsubject.startswith('adv:'):
            raise RejectMessage("Spam not wanted here "
                                "(subject line includes 'ADV')")
There's actually a bug in this code, but the bug will likely only affect spam messages. 8-bit characters are forbidden in RFC 2822 e-mail headers, but some spam messages contain 8-bit text (often ones written in Chinese or Korean). If such a message is received, the unicode() call will raise an unexpected exception. Exim will then return a temporary failure, but most such spam messages aren't tried again. MUAs in use by actual users usually get the subject-line quoting correct.
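One way to keep malformed subjects from raising is to decode defensively. The sketch below is not the author's fix, and it targets current Python 3 (`email.header`) rather than the Python 2 `email.Header` module used in the slides. `decode_subject` is a hypothetical helper that takes the raw header value as bytes.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def decode_subject(raw):
    """Return the subject as text without raising on malformed input.

    raw is the header value as bytes, as it arrived on the wire.
    """
    # Raw 8-bit bytes are not legal in a header; replace each one with
    # U+FFFD instead of letting a decoding error escape.
    text = raw.decode('ascii', 'replace')
    try:
        # Decode any RFC 2047 encoded-words (=?charset?q?...?=).
        return str(make_header(decode_header(text)))
    except (HeaderParseError, LookupError, UnicodeError):
        # Bad base64, unknown charset, or undecodable bytes:
        # fall back to the undecoded text.
        return text


print(decode_subject(b'=?iso-8859-1?q?ADV:_buy_now?='))
print(decode_subject(b'ADV: \xc4\xe3\xba\xc3').lower().startswith('adv:'))
```

With this in place, a subject containing illegal 8-bit bytes still reaches the 'adv:' test instead of triggering a temporary failure.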
We can look at info.recipients_list and check if any spamtrap addresses are present.
A subtlety of the following:
info.recipients_list to change the recipients.

    import re
    from elspy import RejectMessage

    def match_recipients(recipient_list, local_parts, domains):
        """([str], str | [str], str | [str]) : [str]

        Checks whether the given recipient list contains any addresses
        that combine one of the local_parts with one of the domains.
        """
        if isinstance(local_parts, str):
            local_parts = [local_parts]
        if isinstance(domains, str):
            domains = [domains]

        # Construct a regex pattern: (local1|local2)@(domain1|domain2)
        # re.escape keeps characters such as '.' from acting as wildcards.
        pattern = ('^(' + '|'.join(map(re.escape, local_parts)) +
                   ')@(' + '|'.join(map(re.escape, domains)) + ')$')
        pattern = re.compile(pattern)

        matches = []
        for addr in recipient_list:
            if pattern.match(addr):
                matches.append(addr)
        return matches

    def local_scan(info, headers, fd):
        if match_recipients(info.recipients_list,
                            'spamtrap', 'example.com'):
            raise RejectMessage("Spam not wanted here")
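An alternative sketch of the same check avoids building a regex: split each address at its last @ and look the two halves up in sets. This is a different technique from the one in the slides, and `match_recipients_exact` is a hypothetical name. Comparing domains case-insensitively is a choice made here, not something the original function does.

```python
def match_recipients_exact(recipient_list, local_parts, domains):
    """Return the addresses whose local part and domain are both listed."""
    if isinstance(local_parts, str):
        local_parts = [local_parts]
    if isinstance(domains, str):
        domains = [domains]
    wanted_locals = set(local_parts)
    # Domain names are case-insensitive, so compare them lowercased.
    wanted_domains = set(d.lower() for d in domains)

    matches = []
    for addr in recipient_list:
        # rpartition splits at the last '@'; sep is '' if there is none.
        local, sep, domain = addr.rpartition('@')
        if sep and local in wanted_locals and domain.lower() in wanted_domains:
            matches.append(addr)
    return matches


recipients = ['spamtrap@example.com', 'alice@example.com',
              'spamtrap@exampleXcom', 'spamtrap@EXAMPLE.com']
print(match_recipients_exact(recipients, 'spamtrap', 'example.com'))
```

Because the comparison is exact string matching, characters that are special in regexes (such as the dot in a domain name) need no escaping.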
elspy includes a content filter that looks for executable attachments (using regexes, not full MIME).

    from elspy import AcceptMessage, RejectMessage
    from elspy import execontent_simple

    def local_scan(info, headers, fd):
        # Reject messages with executable attachments
        # -- will raise a RejectMessage
        execontent_simple.local_scan(info, headers, fd)
There's also support for using SpamAssassin running as a separate daemon, rejecting certain messages and marking others.
X-Spam-Status: {Yes,No}, hits=7.8
X-Spam-Level: ******* (regex-friendly)
X-Spam-Flag: YES if it's spammy, but not enough to be rejected.

    from elspy import AcceptMessage, RejectMessage
    from elspy import spamassassin

    # Messages that score this or more will be rejected outright.
    spamassassin.REJECT_THRESHOLD = 12.0

    def local_scan(info, headers, fd):
        spamassassin.local_scan(info, headers, fd)
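The reject-or-mark decision can be illustrated by parsing the header values shown above. This is a sketch of the idea only, not elspy's spamassassin module: `parse_status`, `decide`, and `level_at_least` are hypothetical helpers, and the header format is assumed to be exactly the `Yes, hits=7.8` form from the slide.

```python
import re

REJECT_THRESHOLD = 12.0   # same value as in the configuration example

# Matches values such as "Yes, hits=7.8" or "No, hits=-1.2".
STATUS_RE = re.compile(r'^(Yes|No), hits=(-?\d+(?:\.\d+)?)')


def parse_status(value):
    """Return (is_spam, score) from an X-Spam-Status header value."""
    m = STATUS_RE.match(value.strip())
    if m is None:
        raise ValueError('unrecognized X-Spam-Status: %r' % value)
    return m.group(1) == 'Yes', float(m.group(2))


def decide(status_value):
    """Map a status value to 'reject', 'flag', or 'accept'."""
    is_spam, score = parse_status(status_value)
    if score >= REJECT_THRESHOLD:
        return 'reject'
    if is_spam:
        return 'flag'      # delivered, marked with X-Spam-Flag: YES
    return 'accept'


def level_at_least(level_value, stars):
    """True if an X-Spam-Level value has at least `stars` asterisks."""
    return re.match(r'\*{%d,}' % stars, level_value.strip()) is not None


print(decide('Yes, hits=7.8'))
print(decide('Yes, hits=14.0'))
print(level_at_least('*******', 5))
```

The `level_at_least` helper shows why the row of asterisks is regex-friendly: a threshold test becomes a single repetition count in a pattern.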
Sender-Permitted-From is a recent proposal to make it more difficult to forge e-mails from arbitrary addresses.
The example below uses Terence Way's PySPF (http://www.wayforward.net/spf/) to perform the SPF checking.

    import spf
    from elspy import RejectMessage

    def local_scan(info, headers, fd):
        response, smtp_code, explanation = spf.check(
            info.sender_host_address,
            info.sender_address,
            info.sender_host_name)
        # Response is one of 'pass', 'deny', 'unknown', 'error'
        if response == 'deny':
            raise RejectMessage(explanation)
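The SPF decision can be exercised without PySPF or Exim by passing the checker in as a parameter. In this sketch the exception classes are stand-ins for elspy's, while `enforce_spf` and the two fake checkers (including their SMTP codes and explanation strings) are hypothetical. Turning an 'error' response into a temporary rejection is an assumption that goes beyond the example above, which only acts on 'deny'.

```python
class RejectMessage(Exception):
    """Stand-in for elspy.RejectMessage (assumption)."""


class TempRejectMessage(Exception):
    """Stand-in for elspy.TempRejectMessage (assumption)."""


def enforce_spf(check, ip, sender, helo):
    """Run an SPF checker and raise if the message should not pass.

    check is any callable returning (response, smtp_code, explanation),
    the same tuple shape that the example above unpacks.
    """
    response, smtp_code, explanation = check(ip, sender, helo)
    if response == 'deny':
        raise RejectMessage(explanation)
    if response == 'error':
        # Assumed policy: a lookup error may be transient, so ask the
        # sending host to retry instead of accepting or rejecting.
        raise TempRejectMessage(explanation)
    return response


def fake_check_deny(ip, sender, helo):
    return ('deny', 550, 'SPF check failed')       # illustrative values


def fake_check_pass(ip, sender, helo):
    return ('pass', 250, 'sender SPF verified')    # illustrative values


print(enforce_spf(fake_check_pass, '192.0.2.1',
                  'alice@example.com', 'mail.example.com'))
```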
Short summary: elspy is simpler, but this means it's less powerful and likely less scalable at higher workloads.
This table compares elspy with the Python Milter (http://www.bmsi.com/python/milter.html).
Sendmail + milter | elspy |
---|---|
Runs under different user ID | Runs under Exim's user ID (probably mail:mail) |
Runs in separate process | Runs in the Exim process. |
Imports done once, on process startup | Imports done once per Exim process (~ once per message) |
Can modify recipients, headers and message body | More limited: can add headers, but modifying existing headers is not wrapped |
Can insert processing at various points in SMTP operation: after HELO, after MAIL FROM, after DATA, etc. | Processing is only done after DATA |
To download elspy: elspy.sf.net
These slides: www.amk.ca/talks/elspy