PyChecker: Finding Bugs in Other People's Programs
PyChecker:
Finding Bugs in Other People's Programs
MetaSlash, Inc.
Abstract
PyChecker is a software development tool used to find programming
errors in Python code. Like a compiler, it performs the static analysis
used by less dynamic languages like C/C++ and Java. Errors are
found by examining the structural elements of the code, such as
modules, functions, classes, and executable statements. This paper
describes the motivations, design, and benefits of PyChecker.
Keywords: code inspection, static analysis, debugging tool
Introduction
PyChecker [1] began as an experiment to determine the
feasibility of using static analysis on Python [2]
programs. Python is a highly dynamic language that does not lend
itself to the static analysis typically performed by compilers. Some
people in the computer science community believe the development of
robust and/or complex software requires compile-time checks for
software correctness. Although compilers detect many bugs, they do
not eliminate all bugs. Furthermore, a common misconception is that
scripting languages like Python are less robust than compiled
languages and should not be used for mission critical systems.
In the beginning, PyChecker only found missing doc strings. This
technique provided the learning necessary to make more advanced
checks. Missing and unused global checks were later added. Soon, we
had built up enough internal information about methods and functions
to check for the proper number of arguments in function calls.
We have worked on mission critical, high availability, air traffic
control software and Python is a great language for many of these
systems. However, development with Python can be improved to help
software developers reduce bugs. PyChecker was created in an effort
to find bugs during development prior to testing or delivering our
systems.
The initial release of PyChecker demonstrated it could find bugs by
static analysis. The positive feedback from many developers
proved the value of PyChecker. With added incentive, we implemented
more features to find other types of bugs.
Background
Motivation for Developing PyChecker
We find ourselves using Python in a broad range of systems that
are being extended and maintained for many years. Python is an
excellent language for these applications, because it helps avoid
such runtime conditions as out-of-bounds errors and memory leaks.
Developing large, robust systems requires significant testing; all
paths through the code must be executed to achieve full test coverage.
PyChecker supports the development of these systems by automating code
inspection and checking for errors in ways that testing does not.
Generally, exceptional conditions and error handling code are tested
the least because they are rarely, if ever, executed. Therefore,
undetected errors are more likely and handling an unexpected event can
become a catastrophic error.
When making changes to complex systems, modifications must be done
carefully, backed by large investments in testing. However, it's not
always possible to test all aspects of a system. Less dynamic
languages like C/C++ and Java check for errors during the static
analysis phase of compilation. These languages require type
declarations, for example, to find coding errors related to mistyped
variables or parameters. Other types of bugs can be found in this
first pass of code inspection, like calling a function with the wrong
number of parameters or forgetting to import a module. However,
Python finds these bugs later, during execution of the program,
because it doesn't perform static analysis.
Static Analysis
Static analysis is not necessary for a language to be useful or even
powerful. We love Python -- it is clean, portable, and easy-to-use.
However, it is also easy to write code in Python and make an
undetected mistake. It is even easier to refactor working code and
make a mistake. We encountered many instances where common problems
could be found through code inspection. PyChecker was developed to
do the inspection automatically, so we could develop more robust
systems.
Static analysis has been around a long time.
lint [3] has been
used for decades on C code. gcc [4] has
many options to configure warnings, including -Wall to turn
on all common warnings. perl [5] has the
use strict; and use warnings; directives
as well as -w.
More recently the Stanford Validity Checker [6]
has been used to find hundreds of
semantic bugs in the Linux kernel. However, Python had no equivalent
to these tools. There is a tool to freeze code for a specific
architecture so that the Python interpreter does not need to be
installed. This process will not find invalid references, such as
a call to a function that does not exist.
While static analysis has been considered for Python [7],
this work has not yet been used to develop PyChecker.
What PyChecker Does
Currently PyChecker produces four different categories of warnings:
- Likely Bugs
- Potential Bugs
- Unused Identifiers
- Code Complexity/Style
Likely Bug Warnings
- No global found (e.g., using a module without importing it)
- Passing the wrong number of parameters to functions/methods/constructors
- Using format strings that do not match parameters
- Using class methods and attributes that do not exist
- Using a variable before setting it
- self is not the first parameter to a method
- self is a parameter to a function
- Returning a value from __init__()
- Instantiating an object with arguments, but no constructor
Potential Bug Warnings
- Changing the signature of a method when overriding
- Redefining a function/class/method in the same scope
- Using integer division
Unused Identifier Warnings
- Unused globals and locals (module or variable identifiers)
- Unused function/method parameters
Code Complexity/Style Warnings
- Functions/methods with too many lines, returns, branches,
parameters, or local variables
- Importing a module multiple times or using import and from/import
- No doc strings in modules, classes, functions, or methods
Potential security warnings will be implemented in the near future.
Many other enhancements are also planned, for example: more type checking,
more checking for code that can throw exceptions, path analysis, and
unreachable code.
To get a better idea of the types of errors PyChecker can detect,
examine the following code example for errors:
(Hint: there are 14!)
1 #! /usr/bin/env python
2
3 'Example errors caught by PyChecker'
4
5 import string
6
7 metaslash = 1
8
9 def printNames():
10 neal = 'neal'
11 michelle = 'michele'
12 eric = 5
13 print "Local values: %(neal)S %(michele)s %(eric)" % locals()
14
15 class Nothing:
16 def printValue(value):
17 print value
18 def set(self, value):
19 self.value = value
20
21 def tryToDoSomething(self, value):
22 try:
23 import string
24 if not value:
25 raise RuntimeError, "Hey, there's no value"
26 printNames('a, b, c')
27 except:
28 traceback.print_exc()
29
30 def setGlobal(value=None):
31 print 'Old MetaSlash value is:', metaslash
32 metaslash = value
33 useless = Nothing(5)
34 print 'a useless value is:', useless.valeu
With default options, PyChecker produces:
example.py:5: Imported module (string) not used
example.py:11: Local variable (michelle) not used
example.py:13: Invalid format string, problem starts near: '(eric)'
example.py:13: Invalid format string, problem starts near: '(neal)S '
example.py:13: No local variable (michele)
example.py:16: self is not first method argument
example.py:21: self is argument in function
example.py:23: Local variable (string) not used
example.py:23: Module (string) re-imported
example.py:26: Invalid arguments to (printNames), got 1, expected 0
example.py:28: No global (traceback) found
example.py:31: Variable (metaslash) used before being set
example.py:33: Instantiating an object with arguments, but no constructor
example.py:34: Object (useless) has no attribute (valeu)
Because programmers have different coding styles, PyChecker needs to be
adaptable. Warnings can be enabled or disabled so that only the
desired warnings are produced. For example,
the "unused variable" warning can be disabled since
it is not necessarily an error.
There are four different ways to customize PyChecker's configuration:
- Use command line parameters to change the default behavior
- Define configuration values in a .pycheckrc file
- Set the variable __pychecker__ in code
- Define a suppression dictionary (in .pycheckrc file) where the keys are:
'module', 'module.function', 'module.class', or 'module.class.method' and
the values use the same format as the __pychecker__ variable.
By providing a flexible way to suppress warnings, spurious warnings
can be minimized. The code complexity and style warnings can be
adjusted to match each user's preference. In addition, a "blacklist" can
be specified as a list of module names for which warnings should not
be produced. Generally, libraries are blacklisted because their code
doesn't change, therefore, neither do their warnings.
How PyChecker Works
PyChecker imports all the modules passed on the command line. As
it imports each module, PyChecker determines the attributes of the
module (imported modules, classes, and functions) using dir().
It creates
a data tree of all the modules, functions, global variables, classes,
and methods using type(). During this pass, information
about function and method signatures is gathered. After generating
the data tree from the code, PyChecker iterates
through the Python byte code. Byte codes are the equivalent of
machine instructions in the Python interpreter and are generated
by the interpreter when a module is imported.
After iterating through all the byte codes, the final set of warnings
is produced.
Byte codes are a convenient format to use for finding errors
because the data for each operation is readily available.
While iterating through the byte code, warnings are generated by:
- storing variable references to determine if they are unused
- checking the existence of class members and methods
- checking the proper argument counts to function and method calls
- checking the consistency of parameters to format strings
- checking the existence of doc strings
- various other techniques
To better understand the Python byte codes, consider the following function
contained in a file example1.py:
def increment(x): return x + 1
The following interactive session with the Python interpreter shows
the function's disassembled byte code (dis is part of the standard
library):
>>> import example1
>>> import dis
>>> dis.dis(example1.increment)
0 SET_LINENO 1
3 SET_LINENO 1
6 LOAD_FAST 0 (x)
9 LOAD_CONST 1 (1)
12 BINARY_ADD
13 RETURN_VALUE
14 LOAD_CONST 0 (None)
17 RETURN_VALUE
For each byte code instruction, a function is dispatched based upon
the op-code. For example, the op-code SET_LINENO is dispatched
to a function which
saves the current line number. Some of the dispatch functions
construct objects for later analysis. One of the primary sources of
information is the inferred status of the stack. PyChecker recovers a
significant part of program flow by analyzing the stack operations.
Many of the dispatch functions, like LOAD_FAST, build an internal
representation of what the stack will do at runtime. From this
internal representation, PyChecker can determine things like the
number of arguments passed to a function and the initial type of
local constants.
A C programmer would expect the following
code to increment the value stored in x. Although perfectly
legal in Python, this function actually has no effect on x:
def bad_increment(x): return ++x
This code produces the following the byte codes:
>>> import example2
>>> import dis
>>> dis.dis(example2.bad_increment)
0 SET_LINENO 1
3 SET_LINENO 1
6 LOAD_FAST 0 (x)
9 UNARY_POSITIVE
10 UNARY_POSITIVE
11 RETURN_VALUE
12 LOAD_CONST 0 (None)
15 RETURN_VALUE
Now it is easier to see why ++x is a statement with no
effect. ++ is not auto-increment, rather it is
+(+x). The Python interpreter does not flag this as an error
because the code is syntactically correct. The same problem exists
with --x or ~~x, in that these statements typically
have no effect.
Python byte code is easy to use and manipulate; however,
there are several drawbacks with the current techniques employed by PyChecker:
- Jython cannot be supported since it does not use CPython byte codes
- Line numbers cannot be reported if the code has been optimized
- Code block/branch/path analysis and determination of implicit
returns are more difficult since the concepts from the original
source code must be inferred from the byte code
- Code must be imported which can be problematic for some code
which requires setting up the proper environment
(e.g., setting up sys.path, importing modules in specific order, etc.)
- Comments cannot be used to supply hints, __pychecker__ must be used
Future releases of PyChecker are expected to address these issues.
Development Approach
PyChecker was released in April 2001 with the ability to produce 18
warnings. Most warnings were fairly simple and only Python 2.0 was
supported. Seven months and sixteen releases later, PyChecker is at
version 0.8.6, reports 63 warnings, and supports all versions of Python
from 1.5.2 and later.
The original warnings produced were fairly simplistic. PyChecker
also produced some inappropriate warnings because not all
language features were supported. The releases were generally quick,
often at two-week intervals. New features and configuration options
were added with each release as well as a few bug fixes.
As we learned more about Python, more
checks were implemented which sometimes resulted in bugs, but often
resulted in more ideas for additional checks to add. With a highly
iterative approach (build--test--release), the
system progressed much faster than many other projects we have worked
on. Features were added and bugs were fixed quickly and easily.
With all of the development of PyChecker, refactoring was necessary
from time to time. PyChecker has been a big help with finding bugs
in itself! Refactoring commonly entails moving blocks of code
between functions, methods, and modules, resulting in missing
imports or extra imports. Members and methods that existed in the old
class no longer exist in the new class. PyChecker would ferret
out these problems very quickly. Rather than require a test run,
it could quickly inspect the code and report problems.
PyChecker is not a substitute for testing; on the contrary, it
augments testing. During development and prior to release, PyChecker
is run on itself to ensure that it doesn't produce false warnings.
There are over 50 unit tests which are also run.
Each bug report generates a test
case so that errors are not duplicated in the future. With the
combination of running PyChecker and unit testing, we have been able
to decrease the bugs in our systems.
Summary
Many bugs can be found by static analysis tools. Such tools
complement testing and should be used by developers to create more
robust software. PyChecker is a real-world example of how beneficial
static analysis is throughout the development process.
PyChecker produced a number of surprising results:
- a dynamic language like Python benefits from static analysis
- the Python standard library had numerous undetected bugs
- implementing new warning checks are easy
- identifying new constructs to check for is very difficult
There is no shortage of bugs PyChecker can find. In the future,
PyChecker will provide better static analysis to provide warnings for
all object access, return values, security holes, unreachable code,
etc. If you have any suggestions, contact
pychecker@metaslash.com.
References
[1] |
PyChecker -
http://pychecker.sf.net
|
[2] |
Python -
http://python.org
|
[3] |
"Lint, a C Program Checker",
Computer Science Technical Report 65,
Bell Labs, Murray Hill, NJ. updated version TM 78-1273-8
man page -
http://www.opengroup.org/onlinepubs/7908799/xcu/lint.html
|
[4] |
GCC (GNU Compiler Collection) -
http://www.gnu.org/software/gcc/gcc.html
|
[5] |
Perl -
http://www.perl.org |
[6] |
"Checking System Rules Using System-Specific,
Programmer-Written Compiler Extensions"
(Stanford Validity Checker),
Engler, Dawson, et al.,
http://www.stanford.edu/~engler/mc-osdi.ps
|
[7] |
"Aggressive Type Inference", Aycock, John,
http://www.python.org/workshops/2000-01/proceedings/papers/aycock/aycock.html
|