So far this has never been a priority for Python, which shows in some
places, but I wonder how difficult it would be to add this capability.
The following is basically me thinking aloud, so excuse me if I don't
make sense...
Let me first state what the goal is here. Suppose I want to run a new
kind of information server, where people can submit queries in the
form of small scripts. Or I may want to run a MUD (Multi-User
Dungeon) type of game where players can create their own objects
(usable by others) by writing a small program or function that
implements the object. Obviously, a language combining power and
elegance, like Python, would be a good candidate for the language of
such scripts, but I don't want them to remove my files. Therefore my
server (which may also be written in Python) needs to limit the damage
that user-supplied code can do.
Some people would reply "the only safe way is to run it in a chroot'ed
environment". That may be true, but chroot itself is restricted to
root (and rightly so!) so only root can use this solution.
An alternative, which may not be perfect but is at least available
without special privileges, is to exploit the fact that you're using an
interpreter, where all system calls are made through a wrapper.  This
gives you a lot more control over what you allow and what you don't. "Safe"
versions exist of Tcl, Scheme and Perl (in a sense -- I don't know if
taintperl uses the same approach). Let's consider what would be
necessary to create a safe version of Python. (I haven't read any of
the docs on Safe-Tcl, so maybe I'm missing some good ideas -- if so,
please tell me!)
There are a number of possible attacks on a system, and these can be
grouped into a number of categories.  The first category is denial of
service -- e.g. make the system crash or hang.
- One way to do this is exhaustion of resources. This could be a simple
recursive loop, either in Python or in the interpreter (e.g. trying to
get the repr() of a list containing a pointer to itself). Or it could
be a while loop allocating all available memory -- most systems with
VM will make your machine thrash long before malloc() actually returns
NULL...
This kind of attack can be made more difficult by putting checks on
resource use in the code.  The damage that can be done is limited --
at most data in the current process can be lost.
- Another possible attack that results in denial of service is abuse
of known bugs in the code.  Any system of this size has bugs -- if a
hacker finds an unchecked strcpy() they can easily clobber the stack
(and as we know since the Internet worm, it is even possible to
clobber it with code that takes over control, rather than causing an
uncontrolled crash).
The only possible protection against attacks using bugs in the code is
coding carefully.
A second category of possible attacks can permanently damage the
attacked system -- by deleting or modifying files or directories.
- This is usually done by abusing functions that can create files or run
arbitrary system commands.
Currently, Python has no protection against this kind of attack
whatsoever -- the built-in function open() is only one of many ways to
overwrite files.
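To show how low the barrier currently is: one or two lines of
user-supplied code suffice (the file name is of course just an example):

    # overwrite any file the server process can write to
    f = open('/home/someone/.profile', 'w')
    f.write('anything\n')

    # or simply hand the problem to the shell
    import os
    os.system('rm -rf $HOME')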
I'd like to design a general architecture for restricting the damage
that built-in functions can do.
The first thing to do is to make an inventory of potentially dangerous
built-in functions.  (Functions written in Python can only be
dangerous by using built-in functions, so we only need to consider
built-in functions.)  Of course, for our purpose, methods of built-in
objects are to be considered too.  It may even be necessary to check
built-in operations like a[i] or x+y.
The next thing is to decide what to do. Since we aren't (yet) after a
system that prevents smuggling information to the outside world,
open(file, 'r') is safe, but open(file, 'w') is not(*). Even a limited
number of fork() calls may be alright. This means that we can't
simply remove the unsafe routines from the system tables (though this
would surely be the simplest solution), but we may have to add security
checks to some routines.
(*) Actually, open(file, 'r') is probably only OK if the file has
world read permission and is not called /etc/passwd.
One approach would be to have a global variable, which can be set or
cleared by the "host" code only, that affects whether dangerous
functions are restricted or not. If a dangerous function is called,
it checks the flag, and if it's set, it does additional security
checking.
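In Python terms the idea is roughly as follows (the flag and the check
would really sit inside the C implementations of the dangerous
built-ins; the names below are made up, and open_is_allowed() is the
check sketched above):

    restricted = 0          # set to 1 by the host code before running
                            # untrusted code

    _real_open = open       # the unrestricted built-in

    def checked_open(filename, mode='r'):
        if restricted and not open_is_allowed(filename, mode):
            raise IOError('open() not allowed in restricted mode')
        return _real_open(filename, mode)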
A large improvement over this might be to let the host code provide a
"guard" function implemented in Python -- when a potentially dangerous
built-in function is about to be called (this can be checked by
call_builtin) the guard is called with the function name and the
argument list. The guard can then check whatever it wants and return
true or false (or perhaps raise an exception instead of returning
false).  It may even be possible to let the guard return an
alternative outcome to be substituted.  Such a guard could also be
applied to some functionality that's not implemented as functions,
like 'import'.
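To give an idea, a host might write a guard roughly like this (the
calling convention -- function name plus argument tuple -- is the one
described above; whether the name includes the module is one of the
details still to settle):

    def my_guard(name, args):
        # allow read-only open() of anything but the password file
        if name == 'open':
            if len(args) > 1 and args[1] != 'r':
                return 0
            return args[0] != '/etc/passwd'
        # some things are never acceptable
        if name in ('unlink', 'system', 'popen'):
            return 0
        # everything else is considered harmless
        return 1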
A concrete proposal for an interface would be to add a flag to the
"methodlist" table of each module to indicate which functions are
unsafe (let the default be safe -- by far the majority of functions
are probably safe), and to add a built-in function
guarded_exec(string, globals, locals, guard)
which starts restricted execution.  (Of course this function is
dangerous!)
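Usage from the host code might then look something like this (using the
my_guard sketched above; none of this exists yet, of course):

    user_script = "f = open('/etc/passwd', 'w')\nf.write('gotcha')\n"

    try:
        guarded_exec(user_script, {}, {}, my_guard)
    except:
        # whatever exception a refusal by the guard turns into --
        # another detail to be decided
        print('the guard stopped the script')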
Note that I don't address the first category of attacks (resource
over-use and exploiting bugs).  I've just done a grep for all
occurrences of scanf, str(n)cpy and str(n)cat and found only two unsafe
places that can easily be exploited from Python -- both in import.c,
one of them in AIX-specific code (it had the wrong idea of what the
length passed to strncpy does).  (The other one was pointed out to me
just last week.)  It's also easy enough to add a check on nesting depth
for a few well-known recursive places, but an inventory of unsafe
places is in order.
My questions right now are:
- Do you have a need for this -- would you use it?
- Do you think it can be made safe enough?
- Are there holes in my approach?
- Do you like my approach?
- Can you think of a better name for guarded_exec()? (I don't like
rexec since it sounds too much like remote exec.)
- Do you know of any potentially unsafe situations (i.e. bugs :-) in
the current Python code, like the recursion I mentioned in repr() and
print?
- Would you like to help implement this? (In that case you'll need
to provide a PGP key :-)
- Anyone know enough about the approaches other languages are taking
to summarize how they do it?
--Guido van Rossum, CWI, Amsterdam <mailto:Guido.van.Rossum@cwi.nl>
<http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>