Python Style Guide
Author: Guido van Rossum
This style guide has been converted to several PEPs (Python
Enhancement Proposals):
PEP 8 for the main text, PEP 257 for docstring conventions.
See the PEP
index.
XXX Intro.
A Foolish Consistency is the Hobgoblin of Little Minds
A style guide is about consistency. Consistency with this style guide
is important. Consistency within a project is more important.
Consistency within one module or function is most important.
But most importantly: know when to be inconsistent --
sometimes the style guide just doesn't apply. When in doubt, use your
best judgement. Look at other examples and decide what looks best.
And don't hesitate to ask!
Table of Contents
- Lay-out -- how to use tabs, spaces, and
newlines.
- Comments -- on proper use of comments (and
documentation strings).
- Names -- various naming conventions.
XXX Intro.
Indentation
Use the default of Emacs Python-mode: 4 spaces for one indentation
level. For really old code that you don't want to mess up, you can
continue to use 8-space tabs. Emacs Python-mode auto-detects the
prevailing indentation level used in a file and sets its indentation
parameters accordingly.
Tabs or Spaces?
Never mix tabs and spaces. The most popular way of indenting Python
is with spaces only. The second-most popular way is with tabs only.
Code indented with a mixture of tabs and spaces should be converted to
using spaces exclusively. (In Emacs, select the whole buffer and hit
ESC-x untabify.) When invoking the python command line interpreter
with the -t option, it issues warnings about code that illegally mixes
tabs and spaces. When using -tt these warnings become errors. These
options are highly recommended!
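To make the idea concrete, here is a minimal sketch that flags lines whose leading whitespace contains both tabs and spaces. The name find_mixed_indentation is hypothetical, the check is a simplification rather than the interpreter's actual -t logic, and the code assumes a present-day Python 3 interpreter.

```python
def find_mixed_indentation(source):
    """Return 1-based numbers of lines whose indentation mixes tabs and spaces."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # The indentation is everything before the first non-blank character.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            flagged.append(number)
    return flagged


# Line 3 starts with a tab followed by spaces; line 4 uses a tab only.
sample = "if x:\n    y = 1\n\t    z = 2\n\tw = 3\n"
print(find_mixed_indentation(sample))
```

A file that passes this check may still mix styles from one line to the next, so it is only a first filter.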
Maximum Line Length
There are still many devices around that are limited to 80 character
lines. The default wrapping on such devices looks ugly. Therefore,
please limit all lines to a maximum of 79 characters (Emacs wraps
lines that are exactly 80 characters long).
The preferred way of wrapping long lines is by using Python's
implied line continuation inside parentheses, brackets and braces. If
necessary, you can add an extra pair of parentheses around an
expression, but sometimes using a backslash looks better. Make sure
to indent the continued line appropriately. Emacs Python-mode does
this right. Some examples:
    class Rectangle(Blob):

        def __init__(self, width, height,
                     color='black', emphasis=None, highlight=0):
            if width == 0 and height == 0 and \
               color == 'red' and emphasis == 'strong' or \
               highlight > 100:
                raise ValueError, "sorry, you lose"
            if width == 0 and height == 0 and (color == 'red' or
                                               emphasis is None):
                raise ValueError, "I don't think so"
            Blob.__init__(self, width, height,
                          color, emphasis, highlight)
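The "extra pair of parentheses" case mentioned above might look like the following sketch. The function total_cost and its parameters are hypothetical; the outer parentheses exist only so that the expression can span several lines without a backslash.

```python
def total_cost(price, quantity, tax_rate, shipping):
    # The parentheses are not needed for grouping; they only enable
    # implied line continuation.
    return (price * quantity +
            price * quantity * tax_rate +
            shipping)


print(total_cost(10, 2, 0.5, 3))
```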
Blank Lines
Separate top-level function and class definitions with two blank
lines. Method definitions inside a class are separated by a single
blank line. Extra blank lines may be used (sparingly) to separate
groups of related functions. Blank lines may be omitted between a
bunch of related one-liners (e.g. a set of dummy implementations).
When blank lines are used to separate method definitions, there is
also a blank line between the `class' line and the first method
definition.
Use blank lines in functions, sparingly, to indicate logical
sections.
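The blank-line rules above can be sketched as follows. All names here (load, Stack, main) are hypothetical and chosen only to show the spacing.

```python
def load(name):
    return name.upper()


class Stack:

    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    # Related one-liners: the blank lines between them may be omitted.
    def is_empty(self): return not self.items
    def size(self): return len(self.items)


def main():
    stack = Stack()

    stack.push(load("a"))
    stack.push(load("b"))

    return stack.size()
```

Two blank lines separate the top-level definitions, one blank line separates the methods, and the blank lines inside main() mark its logical sections.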
Whitespace in Expressions and Statements
Pet Peeves
I hate whitespace in the following places:
(Don't bother to argue with me on any of the above -- I've grown
accustomed to this style over 15 years.)
Other Recommendations
- Always surround these binary operators with a single space on
either side: assignment (=), comparisons (==, <, >, !=,
<>, <=, >=, in, not in, is, is not), Booleans (and, or,
not).
- Use your better judgement for the insertion of spaces around
arithmetic operators. Always be consistent about whitespace on either
side of a binary operator. Some examples:
    i = i+1
    submitted = submitted + 1
    x = x*2 - 1
    hypot2 = x*x + y*y
    c = (a+b) * (a-b)
    c = (a + b) * (a - b)
- Don't use spaces around the '=' sign when used to indicate a
keyword argument or a default parameter value. For instance:
    def complex(real, imag=0.0):
        return magic(r=real, i=imag)
Comments that contradict the code are worse than no comments. Always
make a priority of keeping the comments up-to-date when the code
changes!
If a comment is a phrase or sentence, its first word should be
capitalized, unless it is an identifier that begins with a lower case
letter (never alter the case of identifiers!).
If a comment is short, the period at the end is best omitted. Block
comments generally consist of one or more paragraphs built out of
complete sentences, and each sentence should end in a period.
You can use two spaces after a sentence-ending period.
As always when writing English, Strunk and White apply.
Python coders from non-English speaking countries: please write your
comments in English, unless you are 120% sure that the code will never
be read by people who don't speak your language.
Block Comments
Block comments generally apply to some (or all) code that follows
them, and are indented to the same level as that code. Each line of a
block comment starts with a # and a single space (unless it is
indented text inside the comment). Paragraphs inside a block comment
are separated by a line containing a single #. Block comments are
best surrounded by a blank line above and below them (or two lines
above and a single line below for a block comment at the start of a
new section of function definitions).
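A block comment laid out according to these rules might look like this sketch. The function normalize is hypothetical, and the choice to return all-zero input unchanged is made up for the example.

```python
def normalize(values):
    total = sum(values)

    # Guard against an all-zero input.  Dividing by zero would raise
    # ZeroDivisionError, so return the values unchanged instead.
    #
    # This is a choice made for the sketch, not a general rule.

    if total == 0:
        return list(values)
    return [value / total for value in values]
```

The comment is indented to the level of the code it describes, its two paragraphs are separated by a line containing a single #, and a blank line sits above and below it.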
Inline Comments
An inline comment is a comment on the same line as a statement.
Inline comments should be used sparingly. Inline comments should be
separated by at least two spaces from the statement. They should
start with a # and a single space.
Inline comments are unnecessary and in fact distracting if they state
the obvious. Don't do this:
    x = x+1                 # Increment x
But sometimes, this is useful:
    x = x+1                 # Compensate for border
Documentation Strings
All modules should normally have doc strings, and all functions and
classes exported by a module should also have doc strings. Public
methods (including the __init__ constructor) should also have doc
strings.
The doc string of a script (a stand-alone program) should be usable as
its "usage" message, printed when the script is invoked with incorrect
or missing arguments (or perhaps with a "-h" option, for "help").
Such a doc string should document the script's function and command
line syntax, environment variables, and files. Usage messages can be
fairly elaborate (several screenfuls) and should be sufficient for a
new user to use the command properly, as well as a complete quick
reference to all options and arguments for the sophisticated user.
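One way to reuse a script's doc string as its usage message is sketched below. The script name countlines.py, its single FILE argument, and the exit statuses are hypothetical; the code assumes a present-day Python 3 interpreter.

```python
"""Count the lines in a text file.

Usage: countlines.py [-h] FILE

Arguments:
FILE -- path of the file to read
-h   -- print this message and exit
"""


def main(argv):
    """Run the script; return an exit status."""
    if "-h" in argv[1:]:
        # The module doc string doubles as the help text.
        print(__doc__)
        return 0
    if len(argv) != 2:
        # Incorrect or missing arguments: print the same text, but fail.
        print(__doc__)
        return 2
    with open(argv[1]) as stream:
        print(len(stream.readlines()))
    return 0
```

A real script would end by passing sys.argv to main() and handing the result to sys.exit(); that step is left out here so the sketch can be imported without side effects.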
For consistency, always use """triple double quotes""" around doc
strings.
There are two forms of doc strings: one-liners and multi-line doc
strings.
One-line Doc Strings
One-liners are for really obvious cases. They should really
fit on one line. For example:
    def kos_root():
        """Return the pathname of the KOS root directory."""
        global _kos_root
        if _kos_root: return _kos_root
        ...
Notes:
- Triple quotes are used even though the string fits on one line.
This makes it easy to later expand it.
- The closing quotes are on the same line as the opening quotes.
This looks better for one-liners.
- There's no blank line either before or after the doc string.
- The doc string is a phrase ending in a period. It prescribes the
function's effect as a command ("Do this", "Return
that"), not as a description: e.g. don't write "Returns the
pathname ..."
Multi-line Doc Strings
Multi-line doc strings consist of a summary line just like a one-line
doc string, followed by a blank line, followed by a more
elaborate description. The summary line may be used by automatic
indexing tools; it is important that it fits on one line and is
separated from the rest of the doc string by a blank line.
The entire doc string is indented the same as the quotes at its first
line (see example below). Doc string processing tools will strip an
amount of indentation from the second and further lines of the doc
string equal to the indentation of the first non-blank line after the
first line of the doc string. Relative indentation of later lines in
the doc string is retained.
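The stripping rule described above can be sketched in a few lines. The function strip_docstring_indent is hypothetical and follows the rule exactly as worded here; actual tools (for instance inspect.cleandoc in today's standard library) perform a comparable cleanup but may differ in detail.

```python
def strip_docstring_indent(doc):
    """Remove the common indentation from a doc string, as described above."""
    lines = doc.expandtabs().split("\n")

    # The reference indentation comes from the first non-blank line
    # after the first line of the doc string.
    indent = 0
    for line in lines[1:]:
        if line.strip():
            indent = len(line) - len(line.lstrip())
            break

    result = [lines[0].strip()]
    for line in lines[1:]:
        if line[:indent].strip() == "":
            # Drop exactly `indent` columns; deeper indentation survives.
            result.append(line[indent:].rstrip())
        else:
            # The line is indented less than the reference; strip it fully.
            result.append(line.strip())
    return "\n".join(result)


doc = "Summary line.\n\n    Details start here.\n        nested item\n    "
print(strip_docstring_indent(doc))
```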
I recommend inserting a blank line between the last paragraph in a
multi-line doc string and its closing quotes, placing the closing
quotes on a line by themselves. This way, Emacs' fill-paragraph
command can be used on it.
I also recommend inserting a blank line before and after all doc
strings (one-line or multi-line) that document a class -- generally
speaking, the class' methods are separated from each other by a single
blank line, and the doc string needs to be offset from the first
method by a blank line; for symmetry, I prefer having a blank line
between the class header and the doc string. Doc strings documenting
functions generally don't have this requirement, unless the function's
body is written as a number of blank-line separated sections -- in
this case, treat the doc string as another section, and precede it
with a blank line.
The doc string for a module should generally list the classes,
exceptions and functions (and any other objects) that are exported by
the module, with a one-line summary of each. (These summaries
generally give less detail than the summary line in the object's doc
string.)
The doc string for a function or method should summarize its behavior
and document its arguments, return value(s), side effects, exceptions
raised, and restrictions on when it can be called (all if
applicable). Optional arguments should be indicated. It should be
documented whether keyword arguments are part of the interface.
The doc string for a class should summarize its behavior and list the
public methods and instance variables. If the class is intended to be
subclassed, and has an additional interface for subclasses, this
interface should be listed separately (in the doc string). The class
constructor should be documented in the doc string for its __init__
method. Individual methods should be documented by their own doc
string.
If a class subclasses another class and its behavior is mostly
inherited from that class, its doc string should mention this and
summarize the differences. Use the verb "override" to indicate that a
subclass method replaces a superclass method and does not call the
superclass method; use the verb "extend" to indicate that a subclass
method calls the superclass method (in addition to its own
behavior).
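The distinction between "override" and "extend" can be illustrated with a small sketch. The classes Logger and PrefixLogger are hypothetical; the doc strings use the two verbs in the sense defined above.

```python
class Logger:

    """Collect messages in a list."""

    def __init__(self):
        self.messages = []

    def log(self, text):
        """Append text to the message list."""
        self.messages.append(text)

    def describe(self):
        """Return a short description of the logger."""
        return "plain logger"


class PrefixLogger(Logger):

    """Logger that prefixes every message.

    Behavior is inherited from Logger, except that log() is extended
    and describe() is overridden.

    """

    def log(self, text):
        """Extend Logger.log(): add a prefix, then call the superclass."""
        Logger.log(self, "[app] " + text)

    def describe(self):
        """Override Logger.describe(); the superclass method is not called."""
        return "prefixing logger"
```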
Do not use the Emacs convention of mentioning the arguments of
functions or methods in upper case in running text. Python is case
sensitive and the argument names can be used for keyword arguments, so
the doc string should document the correct argument names. It is best
to list each argument on a separate line, with two dashes separating
the name from the description, like this:
    def complex(real=0.0, imag=0.0):
        """Form a complex number.

        Keyword arguments:
        real -- the real part (default 0.0)
        imag -- the imaginary part (default 0.0)

        """
        if imag == 0.0 and real == 0.0: return complex_zero
        ...
Version Bookkeeping
If you have to have RCS or CVS crud in your source file, do it as
follows.
    __version__ = "$Revision: 6104 $"
    # $Source$
These lines should be included after the module's doc string, before
any other code, separated by a blank line above and below.
The naming conventions of Python's library are a bit of a mess, so
we'll never get this completely consistent -- nevertheless, here are
some guidelines.
Descriptive: Naming Styles
There are a lot of different naming styles. It helps to be able to
recognize what naming style is being used, independently of what
it is used for.
The following naming styles are commonly distinguished:
- x (single lowercase letter)
- X (single uppercase letter)
- lowercase
- lower_case_with_underscores
- UPPERCASE
- UPPER_CASE_WITH_UNDERSCORES
- CapitalizedWords (or CapWords)
- mixedCase (differs from CapitalizedWords by initial lowercase
character!)
- Capitalized_Words_With_Underscores (ugly!)
There's also the style of using a short unique prefix to group related
names together. This is not used much in Python, but I mention it for
completeness. For example, the os.stat() function returns a tuple
whose items traditionally have names like st_mode, st_size, st_mtime
and so on. The X11 library uses a leading X for all its public
functions. (In Python, this style is generally deemed unnecessary
because attribute and method names are prefixed with an object, and
function names are prefixed with a module name.)
In addition, the following special forms using leading or trailing
underscores are recognized (these can generally be combined with any
case convention):
- _single_leading_underscore: weak "internal use" indicator
(e.g. "from M import *" does not import objects whose name starts with
an underscore).
- single_trailing_underscore_: used by convention to avoid conflicts
with a Python keyword, e.g. Tkinter.Toplevel(master,
class_="ClassName").
- __double_leading_underscore: class-private names in Python 1.4.
- __double_leading_and_trailing_underscore__: "magic" objects or
attributes that live in user-controlled namespaces, e.g. __init__,
__import__ or __file__. Sometimes these are defined by the user
to trigger certain magic behavior (e.g. operator overloading);
sometimes these are inserted by the infrastructure for its own use or
for debugging purposes. Since the infrastructure (loosely defined as
the Python interpreter and the standard library) may decide to grow
its list of magic attributes in future versions, user code should
generally refrain from using this convention for its own use. User
code that aspires to become part of the infrastructure could combine
this with a short prefix inside the underscores,
e.g. __bobo_magic_attr__.
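Some of the underscore forms listed above can be seen at work in the following sketch. The names Account and make_tag are hypothetical, and the output reflects how a present-day Python 3 interpreter stores class-private names (as _ClassName__name).

```python
class Account:

    def __init__(self, owner, balance):
        self.owner = owner          # public attribute
        self._notes = []            # weak "internal use" indicator
        self.__balance = balance    # class-private: the name is mangled

    def balance(self):
        return self.__balance


def make_tag(name, class_=None):
    # The trailing underscore avoids a clash with the keyword "class".
    if class_ is None:
        return "<%s>" % name
    return '<%s class="%s">' % (name, class_)


account = Account("guido", 100)
print(sorted(vars(account)))
print(make_tag("p", class_="intro"))
```

The first line of output shows that the class-private attribute is stored under a mangled name, which is also how an insistent user can still reach it.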
Prescriptive: Naming Conventions
Module Names
Module names can be either MixedCase or lowercase. There is no
unambiguous convention to decide which to use. Modules that export a
single class (or a number of closely related classes, plus some
additional support) are often named in MixedCase, with the module name
being the same as the class name (e.g. the standard StringIO module).
Modules that export a bunch of functions are usually named in all
lowercase.
Since module names are mapped to file names, and some file systems are
case insensitive and truncate long names, it is important that module
names be chosen to be fairly short and not in conflict with other
module names that only differ in the case -- this won't be a problem
on Unix, but it will be when the code is transported to Mac or
Windows.
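A quick check for such conflicts could be sketched like this. The function find_case_conflicts is hypothetical; it only compares names and knows nothing about any particular file system.

```python
def find_case_conflicts(module_names):
    """Return groups of module names that differ only in case."""
    groups = {}
    for name in set(module_names):
        # Names that are equal after lowercasing would collide on a
        # case-insensitive file system.
        groups.setdefault(name.lower(), []).append(name)
    return [sorted(names) for key, names in sorted(groups.items())
            if len(names) > 1]


print(find_case_conflicts(["StringIO", "stringio", "os", "Queue"]))
```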
There is an emerging convention that when an extension module written
in C or C++ has an accompanying Python module that provides a higher
level (e.g. more object oriented) interface, the Python module's name
uses CapWords, while the C/C++ module is named in all lowercase and has a
leading underscore (e.g. Tkinter/_tkinter).
"Packages" (groups of modules, supported by the "ni" module) generally
have a short all lowercase name.
Class Names
Almost without exception, class names use the CapWords convention.
Classes for internal use have a leading underscore in addition.
Exception Names
If a module defines a single exception raised for all sorts of
conditions, it is generally called "error" or "Error". As far as I
can tell, built-in (extension) modules use "error" (e.g. os.error),
while Python modules generally use "Error" (e.g. xdrlib.Error).
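A module following the "Error" convention might be sketched as below. The function parse_port and its messages are hypothetical, and the code assumes a present-day Python 3 interpreter, where exceptions are classes derived from Exception.

```python
class Error(Exception):
    """Raised for every failure reported by this (hypothetical) module."""


def parse_port(text):
    """Return text as a port number, raising Error on bad input."""
    try:
        port = int(text)
    except ValueError:
        raise Error("not a number: %r" % (text,))
    if not 0 < port < 65536:
        raise Error("port out of range: %d" % port)
    return port
```

Callers can then catch the module's single Error without needing to know which condition triggered it.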
Function Names
Plain functions exported by a module can either use the CapWords style
or lowercase (or lower_case_with_underscores). I have no strong
preference, but believe that the CapWords style is used for functions
that provide major functionality (e.g. nstools.WorldOpen()), while
lowercase is used more for "utility" functions
(e.g. pathhack.kos_root()).
Global Variable Names
(Let's hope that these variables are meant for use inside one module
only.) The conventions are about the same as those for exported
functions. Modules that are designed for use via "from M import *"
should prefix their globals (and internal functions and classes) with
an underscore to prevent exporting them.
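The effect of the leading underscore on "from M import *" can be checked with a small experiment. This sketch builds a throwaway module in memory; the module name demo_settings and its contents are hypothetical, and the code assumes a present-day Python 3 interpreter.

```python
import sys
import types

# Build a throwaway module in memory instead of writing a file.
module = types.ModuleType("demo_settings")
exec("timeout = 30\n_cache = {}\ndef _reset(): _cache.clear()\n",
     module.__dict__)
sys.modules["demo_settings"] = module

# Run the star-import in a fresh namespace and see what arrives.
namespace = {}
exec("from demo_settings import *", namespace)

imported = sorted(name for name in namespace if name != "__builtins__")
print(imported)
```

Only timeout is copied into the importing namespace; _cache and _reset stay behind because their names start with an underscore.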
Method Names
Hmm, the story is largely the same as for functions. When using ILU,
here's a good convention: use CapWords for methods published via an
ILU interface. Use lowercase for methods accessed by other classes or
functions that are part of the implementation of an object type. Use
one leading underscore for "internal" methods and instance variables
when there is no chance of a conflict with subclass or superclass
attributes or when a subclass might actually need access to them. Use
two leading underscores (class-private names, enforced by Python 1.4)
in those cases where it is important that only the current
class accesses an attribute. (But realize that Python contains enough
loopholes so that an insistent user could gain access nevertheless,
e.g. via the __dict__ attribute. Only ILU or Python's restricted
mode will XXX