The main lesson from Safe-Tcl seems to be: *everything* is suspect
unless proven harmless.
There were roughly three kinds of responses on the list:
1. Don't even think about it, Python's implementation can never be made
safe because it wasn't designed from the start for safety.
There's some truth in this. But neither was Tcl, and yet Safe-Tcl
exists -- and it even provides access to almost all of Tk! So don't
despair.
2. Discussion of particular loopholes.
There are quite a few of these, though most can probably be closed
(two I haven't seen mentioned: __del__ and __dict__).
One problem is: how do you ever know you've closed them all?
Answer: you can't. It's the same as with all system security -- every
release fixes known holes and opens new ones. You have to reckon with
occasional successful break-ins no matter how secure your system is,
so you must log everything and make back-ups of important material.
3. Alternative solutions.
Two alternatives were proposed:
- modify import
- multiple interpreter instances
The latter is indeed a solution for some other problems too (though I
have a feeling that one of the reasons why people seem to ask for it
is that it has been used as a solution for many things in Tcl that can
be solved differently in Python, e.g. using fork() or new modules).
However, as Donald Beaudry pointed out, its implementation in the
current Python source is by no means trivial. (If it had been, I
would already have done it -- enough people have asked for it to make
me start thinking about what it would need...)
Modifying import is probably one of the better ideas. In fact I have
already been thinking about breaking up part of the import
implementation so it's easier to do your own things here.
In fact if we control import and the contents of __builtin__ (and plug
a few specific holes like __dict__) we won't need a guard function
since we can implement it in import instead.
Here's what I think would be enough.

    guarded_exec(code, globals, locals, builtins)

(with NO defaults!) turns access to assorted interpreter internals
(like __dict__) off and executes the code in the specified environment
-- where builtins replaces __builtins__ (and this will be inherited by
all functions called).
IMPORTANT: When an import statement is executed, builtins.__import__
is called instead -- this is the guard, in a different form. If it
doesn't exist, the code cannot call import.
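The lookup described above can be tried out with the exec() built-in,
which takes its built-in namespace from the "__builtins__" key of the
globals it is given. What follows is only a rough sketch under that
assumption: guarded_exec is the proposed name, not an existing function,
deny_all_imports is a made-up guard, and nothing here turns off access
to interpreter internals, so this is not a safe sandbox by itself.

```python
def guarded_exec(code, globals, locals, builtins):
    """Run `code` with `builtins` as its only built-in namespace.

    Hypothetical helper named after the proposal; it does NOT switch off
    access to interpreter internals such as __dict__.
    """
    env = dict(globals)              # copy, so the caller's dict is untouched
    env["__builtins__"] = builtins   # the import statement looks up __import__ here
    exec(code, env, locals)


def deny_all_imports(name, globals=None, locals=None, fromlist=(), level=0):
    # Simplest possible guard (illustrative): every module counts as unsafe.
    raise ImportError("import of %r is not allowed" % name)


# The untrusted code sees len() and nothing else.
results = {}
guarded_exec("x = len('abc')", {}, results, {"len": len})

# With no __import__ in builtins, the import statement itself fails.
try:
    guarded_exec("import os", {}, {}, {})
    import_worked = True
except ImportError:
    import_worked = False
```

Functions defined by the executed code keep `env` as their globals, so
they see the same restricted builtins when called later.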
The __import__ guard (which is to return the object to return for the
module -- not necessarily a module object) can do three things with an
import request.
a. If the module is entirely unsafe, raise ImportError.
b. If the module is safe except for some functions, delete those
functions from the module before returning it. Note that it's best to
create a copy of the module's dictionary, so if the untrusted code
changes the module it can't crash the host code if it uses the same
module. The returned object might even be a class instance whose
attributes are the trusted functions from the module.
c. If some of the module's functions must be modified to allow only
certain carefully checked arguments (e.g. open()), do the same as for
(b) but replace those functions by host-trusted wrappers.
Note that the guard itself is executed as untrusted code, so it cannot
perform certain operations, but it can be a method carrying selected
dangerous functions with it in a way that the untrusted code can't
access them. This should give you complete flexibility.
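The three cases (a), (b) and (c) might look roughly like the sketch
below. Everything in it is an assumption made for illustration: the
names ALLOWED, WRAPPED, read_only_open and guard_import are invented,
the policy tables are placeholders rather than a vetted list, and the
guard is an ordinary host function. Instead of deleting names from a
copy of the module, it builds a fresh object holding only the trusted
names, the variant mentioned at the end of (b). Dotted and relative
imports are simply refused.

```python
import importlib
import types

# Placeholder policy: module name -> names the untrusted code may see.
ALLOWED = {
    "math": {"sqrt", "floor", "pi"},
    "io": {"StringIO"},
}


def read_only_open(path, mode="r"):
    # Case (c): host-trusted wrapper that lets only checked arguments through.
    if mode not in ("r", "rb"):
        raise PermissionError("only read access is allowed")
    return open(path, mode)   # the host's own open(), out of the guest's reach


# Placeholder policy: module name -> replacement functions.
WRAPPED = {"io": {"open": read_only_open}}


def guard_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Case (a): anything not explicitly listed is treated as unsafe.
    if level != 0 or name not in ALLOWED:
        raise ImportError("module %r is not available" % name)
    real = importlib.import_module(name)
    # Case (b): expose only the trusted names, on a fresh object, so the
    # untrusted code cannot modify the module the host itself uses.
    exposed = {attr: getattr(real, attr) for attr in ALLOWED[name]}
    # Case (c): add the wrapped versions of the sensitive functions.
    exposed.update(WRAPPED.get(name, {}))
    return types.SimpleNamespace(**exposed)


env = {"__builtins__": {"__import__": guard_import}}
exec("from math import sqrt\nroot = sqrt(16)", env)
```

After this runs, env["root"] holds 4.0, while "import marshal" in the
same environment raises ImportError.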
Other points:
Some people suggested executing all untrusted code in a subprocess.
This is always possible, and solves some problems with programs
allocating too many resources or simply crashing the interpreter, but
a disadvantage is that communication with the trusted code becomes
painful. E.g. when implementing a MUD in Python, you'd like to allow
people to design new objects and share them with other users -- surely
there has to be some shared collection of objects. So I suggest we
leave this as an option to the trusted host, but don't require it.
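For a host that does choose the subprocess route, a minimal sketch could
look like this. It assumes a reasonably recent Python 3 with the
standard subprocess module; run_untrusted is an invented name, and JSON
on standard output is just one possible convention for getting results
back. It also shows where the pain comes from: only serialized data
crosses the process boundary, never live objects.

```python
import json
import subprocess
import sys


def run_untrusted(source, timeout=5):
    """Run `source` in a separate interpreter; return the JSON it printed.

    Hypothetical helper: a crash or runaway loop in the child cannot take
    the host down, but nothing here limits what the child may do to files
    or the network.
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", source],   # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=timeout,                        # raises TimeoutExpired if exceeded
    )
    if proc.returncode != 0:
        raise RuntimeError("untrusted code failed: " + proc.stderr.strip())
    return json.loads(proc.stdout)


sword = run_untrusted("import json; print(json.dumps({'name': 'sword', 'hp': 10}))")
```

Here `sword` comes back as a plain dictionary; a shared collection of
live objects, as a MUD would want, cannot be passed this way.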
John Redford worries that every trusted host will define its own set
of commands that can be used from untrusted code. I see no harm in
this -- in general untrusted code is submitted by people who want to
interact with a particular hosting application -- e.g. a database, or
an active mail system, or a WWW server, or a WWW client, or a MUD.
Each of these hosts will define the operating environment in which it
executes untrusted code. Safe-Tcl does exactly that: it minutely spells
out which commands are available and what they do. (I think its name
is too generic -- given the large number of mail-related commands it
would be better to have called it Mail-Tcl or so...)
We are far from having a complete list of untrusted operations or
modules, but almost everything that interacts with internals of the
interpreter is suspect: module marshal, code, stack frame and stack
traceback objects, lots of things in sys, __dict__, assigning to
module attributes, and who knows what else. (There's a whole different category
of operations that can allocate all resources, but these require a
totally different solution, like a quota system. I'll leave this for
another time.)
The danger of __del__ is particularly nasty: if the untrusted code can
return arbitrary objects to the host, those objects may be deleted by
the host. If a class has a __del__ attribute, garbage collection may
call this __del__ at any point during execution of host code. This
serves as an illustration of the kind of loopholes you have to
consider...
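A small demonstration of the __del__ problem, assuming CPython, where
dropping the last reference to an object runs its __del__ immediately.
The class and function names are invented for the example.

```python
class Innocent:
    """Hypothetical object that untrusted code hands back to the host."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        # Written by the untrusted author, but runs in the middle of host code.
        self.log.append("untrusted: __del__ ran")


def host_discards_object():
    log = []
    obj = Innocent(log)
    log.append("host: about to drop the object")
    del obj   # last reference gone; CPython calls __del__ right here
    log.append("host: carrying on")
    return log


trace = host_discards_object()
```

Under CPython the trace shows the untrusted entry between the two host
entries, although the host never called a method on the object.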
Comments, please...
--Guido van Rossum, CWI, Amsterdam <mailto:Guido.van.Rossum@cwi.nl>
<http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>