Twelve Thousand Test Cases and Counting:
a Critique of Lightweight Methodologies in Python Program Development

Phil Pfeiffer
phil@etsu.edu
East TN State Univ.

PyCon 2004 / March 2004
George Washington University
I. Basis for talk: case study in program development
- starting problem: a complex, unstable specification for a DB
  - context: resource allocation and tracking system for ORNL CCS (22+ pp. of specs)
- goal: limit risk of instability with compiler
- vision: combine best of two approaches:
  - clarity, concision of type definition-based data model
  - performance of relational DB
- plan: compile data model to SQL
  - Part 1: type defn grammar → create table queries
    - grammar: dest-type, enum, union, tuple, map, set, include, reference
    - limitation: maps, sets not nestable
  - Part 2: object defn grammar → insert into table queries
    - object grammar: comparable to values grammar
  - Part 3: queries over objects → queries over compiled tables
- progress so far
  - coding started April 2003
  - part 1 done November 2003
  - part 2 to resume in May 2004
II. Issue: strategy for managing development?
- classic methodologies: pessimistic, system-centric
  - vision: "Build it so you can trust it. Then don't trust it." [Meyer]
  - emphases:
    - defensive design: "I take care of abnormal cases right away."
    - static analysis (including static type checking)
    - assertions (pre- and post-conditions)
    - systematic testing (all equivalent inputs / all program points / etc.)
  - [cf. Meyer, Bertrand, "Practice to Perfect," IEEE Computer, May 1997]
- lightweight methodologies: optimistic, developer-centric
  - vision: shoot from hip with care; code will work well enough, in time
  - emphases:
    - simple design
      - do simplest thing that works
      - tolerate simplicity until intolerable, then revise
    - reasonable care in coding
    - reasonable set of test cases
      - just write test cases, and you're done
      - add more as needed
  - [cf. Spring D.C. 2002 Python conference tutorial on agile methods]
III. Initial choice: go lightweight
- strategy
  - use Python, SAX
  - code bottom up, testing code as written
  - use reasonable number of reasonably simple tests
  - hope to finish in four months
- rationale
  - praise for Python, XML, SAX
    - Python portrayed as easy to use, easy to learn
    - XML portrayed as making language definition simple
    - SAX portrayed as making parsing simple
  - apparent fit for lightweight methods
  - lots of developer experience
    - 30+ years in computing: assembler, OS coding, cross-platform development, network programming, class library development
    - advanced work in programming languages, including compilers
    - experience with functional programming
  - no communication overhead (solo development)
IV. One month later: what hit me?
- more code than expected (1,800 SLOC, no end in sight)
- coding slower, code far buggier than expected
- analysis
  - Python harder than expected
    - interpreted OO different from compiled OO
      - dynamic attribute instantiation
      - overloading as arglist-driven interpretation
      - templating as metaprogramming
      - unfamiliar idioms: self.f(); class.f(self); self.__class__; super(); etc.
    - unexpected irregularities in Python: e.g.,
      - types without eval-able __repr__ methods (e.g., functions, classes, types)
      - uneven support for introspection (e.g., getting fn hash from _getframe()?)
      - one-item tuples vs. one-item lists
    - interpreted ≠ freeform
    - same old structural tedium: copy, deepcopy, eq, ne, __repr__, etc.
  - SAX less helpful than expected
    - sorting out parsers took time [why all the non-validating parsers?]
    - expat documentation uneven, even misleading
    - for feature-rich languages, code generation (not parsing) still the key challenge
  - hit-and-miss testing failing to eliminate errors
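Two of the irregularities above are easy to demonstrate; the snippet below is an illustrative sketch, not code from the talk's compiler:

```python
# 1. One-item tuples need a trailing comma; one-item lists do not.
t = (1)       # an int, not a tuple: parentheses alone don't make a tuple
t1 = (1,)     # the actual one-item tuple
l = [1]       # a one-item list: no comma quirk
assert type(t) is int
assert type(t1) is tuple and len(t1) == 1

# 2. Some types lack eval-able __repr__ methods.
assert eval(repr([1, 2])) == [1, 2]   # lists round-trip through repr
assert repr(len).startswith('<')      # functions do not: '<built-in ...>'
```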
V. Rethinking initial choice of methodology
- starting point: Cockburn, Agile Methods
  - methodologies should focus on
    - developing working software to a reasonable standard, and
    - enabling future development
  - overhead determined by project size, team size, working environment, quality of communication, quality of tool set, and application criticality
- beyond Cockburn: need to carefully consider effect of
  - developer expectations for code quality
  - absence of static analysis, starting with typing
  - possible lack of continuity between current, future development team
VI. Importance of automated analysis: two views
- classic view: analysis :: logic ⇔ ECC :: communication
  - types, declarations create redundancy in logic
  - redundancy important for automated, static validity checking
  - exceptions exist (e.g., ML), but Python isn't one
  - static checking finds errors
    - simple (but common) errors, like misspellings, misordered parameters, use-before-def errors, incomplete revisions
    - deeper errors
      - "[Compiling] supplies type errors, which in many cases reflect deeper oversights." [Meyer]
- lightweight view: lack of typing, etc. as freedom
  - August 2003 C++ Users Journal article on Boost.Python
    - typing a hindrance to rapid code development
    - Boost.Python useful for avoiding overhead of type checking
  - initial reception to work described in this talk 😐
VII. Critique of static analysis as overhead
- if your development tools don't check your code, how do you manage error?
  1. ask clients to accept error, permanently
  2. hope you can get things right, immediately
  3. hope someone gets things right, eventually
  4. go heavyweight: improve quality of checking by hand, with
     - test cases, and
     - dynamically evaluated assertions
⇒ consider each in turn.
VIII. #1. Quality too important to slough off
- reasons for not tolerating error: a personal view
  - ethical concerns
    - welfare of student participants
    - welfare of clients
    - (cf.: ACM/IEEE Code of Ethics; silver rule)
  - personal concerns
    - reputation
    - personal standards
  - practical concerns
    - wasted time (using buggy code to debug bugs)
    - lessons of history: American auto industry, mid-1970s
IX. #2. Errors are too easy to make, and miss
- reasons for distrusting developers: cynical view
  - Goodkind, Wizard's First Rule
  - Ellison's two most common substances
- reasons, cont.: humane view
  - coding under adversity
    - out of sorts? distracted? listening to music? confused? hurried?
  - revision dilemma
    - revisions, or no revisions: both create problems
  - optimism dilemma
    - self-confidence required for success as a programmer
    - positive attitude runs counter to ability to critique code
    - "And I am always surprised (even though by now I should know better) when the violated assertion turns out to be one that I had added for goodness' sake, so convinced was I that it could never fail." [Meyer]
⇒ been there, done that, too often
X. #3. "We'll find it eventually" as source of risk
- reasons for distrusting "feature first, quality later":
  - perception: point of view assumes one of three givens:
    - "eventually" comes quickly, because code is small enough
    - the mañana assumption:
      - important errors will emerge, over time
      - someone will be there to fix them
  - concern: all assumptions problematic
    - without static checking, just how small is small enough? 200 lines? 300 lines?
    - mañana risky for
      - walkaway projects like Model-T (I'm off duty May '04)
      - any project done solo, or in understaffed organization
      - any project where critical developers aren't immortal 😐
XI. Rethinking lightweight methods, concluded
- reasons for rejecting strategies 1, 2, 3
  - shoot-from-hip quality control unsuitable for Model-T
  - novice language difficulties and SAX issues, but
  - lack of static checking judged primary risk, relative to concerns about quality, project size
- needed: new development strategy
  - idea: keep Python, but restore confidence by restoring what compiler brings to development
  - what remains: embrace care (strategy 4)
    - strategy 4.1: introduce all-points checking, using
      - carefully crafted gray box test suites, with
      - simpler test tools than Python library provides
    - strategy 4.2: introduce type checking, using
      - hand-coded assertions at key points in methods
      - supporting library that strengthens built-in support for typing
XII. #4.1 Achieving compiler-strength checks with gray box testing
- ideal: all methods / all program points / all effects testing
  - nothing less yields compiler-like coverage
- strategy: do the hard work of coding test cases ☹, but as simply as we can 😐
- key tricks:
  - define constructors that optionally init all private attrs
    - benefit: simplifies specifying assertions for test results
    - designing usable gray box constructors:
      - move private attrs to right, or treat as named params
      - supply intelligent default values
  - use declarative testing tool, for concision
    ⇒ developed one, having found none for Python
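The gray box constructor trick can be sketched as follows; SymbolTable and its attributes are hypothetical stand-ins, not the talk's actual compiler classes:

```python
class SymbolTable:
    """Gray box constructor: every private attr can optionally be
    set from the constructor, so tests can build objects in known
    states and state expected results as constructor calls."""
    def __init__(self, entries=None, _scope_depth=0, _parent=None):
        # private attrs appear to the right, as named params with
        # intelligent defaults
        self._entries = dict(entries or {})
        self._scope_depth = _scope_depth
        self._parent = _parent

    def define(self, name, info):
        self._entries[name] = info

    def __eq__(self, other):
        return (isinstance(other, SymbolTable)
                and self._entries == other._entries
                and self._scope_depth == other._scope_depth)

# A test can now express its expected result declaratively:
table = SymbolTable()
table.define('x', 'int')
assert table == SymbolTable(entries={'x': 'int'})
```

The payoff is the final assertion: without the optional private-attr parameters, the test would have to poke at `_entries` directly to state the expected post-state.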
XIII. ADEPT (A Data-driven, Eval-based Program Tester)
- supports tuple-driven unit testing w. multi-level test stack, with support for test logging
  - idea: more concise than class-based testing (cf. unittest doc)
- package description
  - ADEPT proper: one .py file (adept.py)
  - support: summary doc, user manual, validation suite (in ADEPT)
- supported languages
  - Python: from adept import *
  - C/C++: using Boost.Python (up to void *, which Boost doesn't support)
- supported test types
  - Get, Set, Get+Set, Null, Erroneous
  - properties tests (Get, Set, Erroneous Get, Erroneous Set)
- test predicates
  - provided with package: eqValue, eqRepr, eqContents, containsRE, lacksRE, isInstanceOf, evalsTo
  - supports multi-level and/or predicate trees for complex requirements
- extensibility
  - designed for user-defined predicate types, tests
- http://csciwww.etsu.edu/phil (check freeware link)
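The flavor of tuple-driven, declarative testing can be sketched in a few lines; this is a minimal illustration in the spirit of ADEPT, not its actual API (the predicate names below are stand-ins for predicates like eqValue and containsRE):

```python
import re

def eq_value(expected):
    """Predicate factory: does the actual result equal expected?"""
    return lambda actual: actual == expected

def contains_re(pattern):
    """Predicate factory: does str(result) match the regex?"""
    return lambda actual: re.search(pattern, str(actual)) is not None

def run_tests(cases):
    """Each case is a (label, thunk, predicate) tuple; returns the
    labels of failing cases. An exception counts as a failure."""
    failures = []
    for label, thunk, predicate in cases:
        try:
            if not predicate(thunk()):
                failures.append(label)
        except Exception:
            failures.append(label)
    return failures

cases = [
    ('upper', lambda: 'abc'.upper(), eq_value('ABC')),
    ('regex', lambda: [1, 2, 3], contains_re(r'2')),
]
assert run_tests(cases) == []
```

The point of the style: a test is data (a tuple), so adding the hundredth case costs one line, not one method of a TestCase subclass.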
XIV. #4.2 Adding compiler-strength type checks
- extend Python typing predicates for subtyping: e.g.,
  - int is supertype of posInt ⇒ 3 ∈ posInt
  - int is supertype of posInt ⇏ -3 ∈ posInt
  ⇒ use success of posInt(3), failure of posInt(-3) to infer difference
- derive self-typing subclasses of int, str, tuple, list, dict: e.g.,
  - isOfType( HomgenList( int, [1, 2, 3] )) == true
  - isOfType( HomgenList( int, [1, 'a', 3] )) == false
  - isOfType( HomgenList( (int, str), [1, 2, 3] )) == true
- support typing with partially instantiated types: e.g.,
  - intListType = AsType( (HomgenList, int), "list of integer" )
  - isOfType( intListType, [1, 2, 3] ) == true
  - isOfType( intListType, [1, 'a', 3] ) == false
- insert type checks into code in two places:
  - on entrance to method calls
  - on return from method calls that return complex results
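The "infer subtype membership from constructor success or failure" idea can be sketched as below; PosInt and is_of_type are illustrative stand-ins, not PyRite's actual classes or signatures:

```python
class PosInt(int):
    """Self-typing subclass of int: construction fails for values
    outside the subtype, so the constructor doubles as a membership
    test."""
    def __new__(cls, value):
        if int(value) <= 0:
            raise ValueError('PosInt requires a positive value')
        return super(PosInt, cls).__new__(cls, value)

def is_of_type(t, v):
    """Is v a member of subtype t? Inferred from whether t's
    constructor accepts v."""
    try:
        t(v)
        return True
    except (ValueError, TypeError):
        return False

assert is_of_type(PosInt, 3)        # 3 is in posInt
assert not is_of_type(PosInt, -3)   # -3 is not, though -3 is an int

def scale(n, factor):
    # entrance check, as per strategy 4.2
    assert is_of_type(PosInt, n), 'n must be a positive int'
    return n * factor
```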
XV. PyRite type library (key features)
- predicates for subtype checking
  - isOfType(t, v): is item v an instance of a subtype of t (including t)?
- classes for parameter-based subtyping of Py built-ins
  - IntSubrangeValue, StrSubrangeValue; HomgenTuple, HetgenTuple:
    - constructed by friend functions, w. attrs that define relevant constraints
  - HomgenList, HetgenList:
    - constructors accept constraints for indices, values, index/value pairs
  - HomgenDict, HetgenDict:
    - constructors accept constraints for keys, values, key/value pairs
- classes for parameter-based subtyping of other classes:
  - HomgenSet, HetgenSet:
    - constructors accept constraints for keys, values, key/value pairs
    - set code leftover from v2.2, which had no built-in set classes
  - ManyOneHomgenDict, ManyOneHetgenDict:
    - many-one dict: dict that supports aliasing among keys
    - constructors accept constraints for keys, values, key/value pairs
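A homogeneous-list check of the HomgenList kind, paired with a partially instantiated type in the AsType style, might look like the sketch below; the real library's constructors and constraint parameters are richer than this stand-in:

```python
class HomgenList(list):
    """Illustrative homogeneous list: construction fails unless
    every item is an instance of item_type."""
    def __init__(self, item_type, items):
        for i, item in enumerate(items):
            if not isinstance(item, item_type):
                raise TypeError(
                    'item %d (%r) is not %s' % (i, item, item_type))
        super(HomgenList, self).__init__(items)
        self.item_type = item_type   # constraint kept for later checks

def is_of_type(partial_type, v):
    """partial_type is (class, constraint): a partially instantiated
    type. Membership is inferred from constructor success."""
    cls, constraint = partial_type
    try:
        cls(constraint, v)
        return True
    except TypeError:
        return False

int_list_type = (HomgenList, int)   # cf. AsType((HomgenList, int), ...)
assert is_of_type(int_list_type, [1, 2, 3])
assert not is_of_type(int_list_type, [1, 'a', 3])
```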
XVI. In the final analysis: how the strategy worked
- cost
  - compiler on hold, late April to early July (ADEPT / PyRite)
  - test cases difficult to generate, even with tools
    - simple cases (e.g., __eq__, __ne__) mind-numbingly dull
    - complex cases (e.g., symbol table classes) mind-bending
    - peaked at 12,000+ before mid-November revision (see below)
- benefit
  - full-featured phase 1 compiler done by early Nov.
    - allows out-of-order type definitions
    - thorough anomaly checking, including unknown mySQL types; missing and circular defns; dependencies on bad defns
    - compiles through errors: compiles all types flagged as sound
    - supports retargetable back end
  - test cases vital for two major post-Nov. overhauls
    - goal: use metaprogramming to eliminate repetitious code
    - outcome:
      - 6,000 SLOC (exclusive of ADEPT, PyRite) ⇒ 3,000 lines
      - 9,000 test cases (exclusive of ADEPT, PyRite) ⇒ 6,000 test cases
XVII. Python: I'd use it again, gladly
- negatives of losing static analysis are real
  - "Why would anyone want to use an untyped or dynamically typed language? '[W]e'll develop faster that way' makes no sense to me..." [Meyer]
- but static analysis isn't complete, anyway
  - all-points testing important for QC, regardless of language
  - typing doesn't catch everything
    - testing type assertions a significant, but small, part of testing
    - __eq__ / __ne__ tests, __repr__ tests, etc., don't go away
    - complex tests don't go away
- and (Python) interpretation has its pluses
  - fast feedback a major plus
  - metaprogramming wonderful for trimming duplicate code
    - subject-oriented metaprogramming improves code quality
      - upper-layer classes create expected features in lower classes at load time
      - example: codegen layer adds mySQL codegen methods to AST classes
      - benefits: simplifies design, testing of lower-layer logic
  - you don't need to fight with the JVM ☺
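The layered metaprogramming idea above can be sketched in miniature; TypeDefNode and to_sql are hypothetical names for illustration, not the talk's actual compiler code:

```python
class TypeDefNode:
    """Lower-layer AST class: structure only, no code generation."""
    def __init__(self, name, columns):
        self.name = name
        self.columns = columns   # list of (column_name, sql_type)

# --- codegen layer, in a separate module loaded later ---
def _typedef_to_sql(self):
    """Generate a create table statement from the node's structure."""
    cols = ', '.join('%s %s' % (n, t) for n, t in self.columns)
    return 'create table %s (%s);' % (self.name, cols)

# At load time, the codegen layer attaches its method to the AST
# class; the AST module itself stays free of codegen logic, so each
# layer can be designed and tested separately.
TypeDefNode.to_sql = _typedef_to_sql

node = TypeDefNode('users', [('id', 'int'), ('name', 'varchar(80)')])
assert node.to_sql() == 'create table users (id int, name varchar(80));'
```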
XVIII. ...but it would be good to have:
- solid documentation on how to metaprogram in Python (what happens to on-the-fly class and function creation now that new is deprecated?)
- a tool for semi-automated all-points test case generation, driven by static analysis of Python code (even if imperfect!)
- a standard, declarative-style test suite driver (improved ADEPT?)
- standard, self-typing versions of standard classes (improved PyRite?)
- a decent library subtyping predicate
  - (assumption: derived classes defined from base classes via narrowing)
- object.__ne__(self, *args, **dict) ≜ not object.__eq__(self, *args, **dict)
  - (might affect tetralemma-based theorem provers ☺ -- but asymmetry would simplify careful testing)
- int.__init__(self, *args, **dict) [and similarly for all immutables]
  - (would simplify dynamic instantiation of subtypes for immutable types, by supporting creation of curried constructors that capture dynamic constraints, via additional constructor params)
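The __ne__-from-__eq__ default requested above is easy to approximate with a mixin; this is a sketch of the workaround (Python of that era did not derive __ne__ from __eq__ automatically), not a library feature:

```python
class NeFromEqMixin(object):
    """Derive __ne__ from __eq__, so classes need only define one."""
    def __ne__(self, other):
        return not self.__eq__(other)

class Point(NeFromEqMixin):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (isinstance(other, Point)
                and (self.x, self.y) == (other.x, other.y))

# __ne__ now agrees with __eq__ for free
assert Point(1, 2) == Point(1, 2)
assert not (Point(1, 2) != Point(1, 2))
assert Point(1, 2) != Point(3, 4)
```

Without this, a class that defines __eq__ but forgets __ne__ can report a == b and a != b simultaneously, which is exactly the kind of asymmetry that complicates careful testing.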
XIX. Afterword: thanks to
- the organizers of PyCon 2004
- you who are here (for hanging around for the last talk on a Thursday afternoon)
- Alex Martelli (for Python Cookbook, Python in a Nutshell, and polite answers to early, stupid questions about Python)
- Smitha Chennu (for help with testing Model-T under mySQL)
- Dr. Stephen Scott / Dr. Al Geist of ORNL (for being patient while I worked all this out)
- my wife, Linda (for being really patient while I worked this out)
XX. Selected References
- Cockburn, Alistair, Agile Methods
- Halberstam, David, The Reckoning (Ford vs. Nissan in the '70s: a parable for contemporary American software development)
- Harrison, Wm., and Ossher, Harold, "Subject-Oriented Programming (A Critique of Pure Objects)," OOPSLA '93
- Meyer, Bertrand, "Practice to Perfect: The Quality First Model," IEEE Computer, May 1997
- Yourdon, Edward, Decline and Fall of the American Programmer