QMTest is an open-source, general-purpose, cross-platform software testing tool written entirely in Python [Python]. The problem of testing can be divided into a domain-dependent problem and a domain-independent problem. QMTest is intended to solve the domain-independent part of the problem and to offer a convenient, powerful, and flexible interface for solving the domain-dependent problem. The choice of Python as an implementation language has improved the user experience and simplified the development of QMTest; the use of Python is fundamental to the success of QMTest.
QMTest is an open-source, general-purpose, cross-platform software testing tool written entirely in Python. QMTest can be used to test compilers, databases, graphical user interfaces, or embedded systems. QMTest provides a convenient graphical user interface for creating, managing, and executing tests, provides support for parallel test execution, and can be extended in a variety of ways.
The central principle underlying the design of QMTest is that the problem of testing can be divided into a domain-dependent problem and a domain-independent problem. The domain-dependent problem is deciding what to test and how to test it. For example, should a database be tested by performing unit tests on the C code that makes up the database, or by performing integration tests using SQL queries? How should the output of a query asking for a set of records be compared to expected output? Does the order in which the records are presented matter? These are questions that only someone who understands the application domain can answer.
The domain-independent part of the problem is managing the creation of tests, executing the tests, and displaying the results for users. For example, how does a user create a new test? How are tests stored? Should failing tests be reported to the user, even if the failure was expected? These questions are independent of the application domain; they are just as relevant for compiler tests as they are for database tests.
QMTest is intended to solve the domain-independent part of the problem and to offer a convenient, powerful, and flexible interface for solving the domain-dependent problem. QMTest is both a complete application, in that it can be used "out of the box" to handle many testing domains, and infrastructure, in that it can be extended to handle other domains.
Users of QMTest can use either a command-line interface, or a web-based graphical interface. Using either interface, users can, with a single command, run some or all of the tests in a test database, and obtain a summary report indicating which tests passed and which tests failed. Optionally, users can obtain more detailed reports containing domain-dependent data about the test results. When using the graphical interface, users can also create new tests by filling in a form.
The development of QMTest is part of the second phase of the Software Carpentry [Wilson99] project, sponsored by the Los Alamos National Laboratory (LANL). That project originally set out to design and implement Python-based tools for testing, issue tracking, build management, and configuration. The design phase was conducted as a contest that accepted submissions and awarded prizes to the winning entries.
The first section of this paper discusses the design of QMTest, with particular focus on the extension facilities that it provides, and the implementation of those facilities in Python. The second section discusses practical experience with QMTest to date. The third section discusses some of the ways in which the choice of Python as an implementation language has improved the user experience and simplified the development of QMTest. The fourth and final section discusses some possible future directions for QMTest.
In general, extensibility is a fundamental goal of the QMTest architecture. QMTest, like any testing tool, must allow users to test their own application domains. But, QMTest also allows extensibility in several other areas: users can control how tests are stored, how test execution is scheduled across available machine resources, and how results are displayed. These four axes -- domain, storage, schedule, and display -- are all orthogonal in QMTest. QMTest can be extended along one axis without making any changes along the other axes.
When discussing QMTest, it is useful to distinguish between two classes of users. Some users create tests, execute those tests, and look at the results. Other users extend QMTest, providing support for new application domains, new methods of test storage, and so forth. Henceforth, the first class is referred to simply as users, while the second is referred to as extenders.
All QMTest extensions are implemented in Python. In fact, QMTest's default behaviors are all implemented using the same mechanisms that extenders use to create QMTest extensions. It would be as fair to say that QMTest comes with some prepackaged extensions as it is to say that QMTest has default behaviors.
In the following subsections, each of these axes is described, together with the kinds of extensions that are possible along it.
A test class contains machinery for executing a test, deciding whether it passed or failed, and reporting the results of the test. Every test class takes arguments that parameterize its behavior. A test is an instance of a test class, i.e., a test class together with arguments to that test class.
QMTest comes with a number of test classes to handle typical testing situations. For example, QMTest comes with a test class (called ExecTest) that spawns a process and checks its exit status. If the command exits with a zero exit status, the test passes; otherwise, the test fails. ExecTest takes arguments that indicate the name of the program to spawn and the command-line arguments that should be provided to it.
In theory, ExecTest is capable of testing any domain. Any test can be written as an executable that contains all the testing logic and then exits successfully if (and only if) the test passes. Of course, trying to perform testing using this crude method is painful in a variety of respects. First and foremost, it is not necessarily convenient to create separate executables that contain testing logic. In addition, there is no good way for the executable to report information about why a test failed. Nor is there a good way for QMTest to assist users in creating new tests, since creating a new test requires creating an entire new executable.
To solve these problems, an extender may create a custom test class. A test class is a Python class that implements a particular interface. The interface consists of one method and one attribute:
The arguments attribute describes the arguments to the test class. QMTest uses this information to prompt the user for arguments when creating a new test through the graphical user interface. This information is also used when a test is written out to permanent storage.
The Run method is responsible for executing the test. It is passed a context and a result. The context provides information about how the test should be executed that depends on the environment in which the test is being executed; for example, when testing a compiler, the user might specify the path to the compiler by placing that information into the context.
The result records whether the test passed or failed. The Run method can attach arbitrary additional data to the result, indicating, for example, why the test failed or how long it took to run the test.
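To make this interface concrete, here is a minimal sketch of a custom test class. The class and field names are invented for illustration, the import path for the Test base class is an assumption about QMTest's module layout, and the context is assumed to behave like a Python dictionary; a complete, real example (QMTest's own ExecTest class) appears below.

import os

import qm.fields
# The location of the Test base class is assumed here for the sake of
# a self-contained example.
from qm.test.test import Test

class FileExistsTest(Test):
    """Pass if the file named by the test's argument exists."""

    # The 'arguments' attribute: QMTest uses it to prompt for values in
    # the graphical interface and to write the test to storage.
    arguments = [
        qm.fields.TextField(
            name="path",
            title="File Path",
            description="The file that must exist for the test to pass."
            )
        ]

    def Run(self, context, result):
        # Environment-dependent information, such as the directory in
        # which to look, is supplied through the context (assumed here
        # to be dictionary-like).
        directory = context.get("data_dir", ".")
        if not os.path.exists(os.path.join(directory, self.path)):
            # Record why the test failed as extra data in the result.
            result.Fail("File does not exist.",
                        { "FileExistsTest.path" : self.path })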
It is important to note that this interface does not specify a storage format for the test. In QMTest, test storage is orthogonal to the domain tested. Tests written using new test classes can still be stored by QMTest; the default storage mechanism simply records the values of all arguments provided to the test class.
The code for QMTest's simplest built-in test class is presented here. This test class, which shares the name ExecTest with the command-spawning class described earlier but is a distinct class, executes Python source code and passes if an (optional) Python expression evaluates to true:
class ExecTest(Test):
    """Check that a Python expression evaluates to true.

    An 'ExecTest' test consists of Python source code together with an
    (optional) Python expression.  First the Python code is executed.
    If it throws an (uncaught) exception the test fails.  If the
    optional expression is present, it is then evaluated.  If it
    evaluates to false, the test fails.  Otherwise, the test passes."""

    arguments = [
        qm.fields.TextField(
            name="source",
            title="Python Source Code",
            description="""The source code.

            This code may contain class definitions, function
            definitions, statements, and so forth.  If this code
            throws an uncaught exception, the test will fail.""",
            verbatim="true",
            multiline="true",
            default_value="pass"
            ),
        qm.fields.TextField(
            name="expression",
            title="Python Expression",
            description="""The expression to evaluate.

            If the expression evaluates to true, the test will pass,
            unless the source code above throws an uncaught exception.
            If this field is left blank, it is treated as an expression
            that is always true.""",
            verbatim="true",
            multiline="true",
            default_value="1"
            )
        ]

    def __init__(self, **properties):
        apply(Test.__init__, (self,), properties)
        # Store stuff for later.
        if self.source is None:
            self.source = ""
        else:
            # Make sure the source ends with a newline.  A user is
            # likely to be confused by the error message if it's
            # missing.
            if self.source[-1] != "\n":
                self.source = self.source + "\n"

    def Run(self, context, result):
        global_namespace, local_namespace = make_namespaces(context)
        # Execute the source code.
        try:
            exec self.source in global_namespace, local_namespace
        except:
            # The source raised an unhandled exception, so the test
            # fails.
            result.NoteException(cause="Exception executing source.")
        else:
            # The source code executed OK.  Was an additional
            # expression provided?
            if self.expression is not None:
                # Yes; evaluate it.
                try:
                    value = eval(self.expression,
                                 global_namespace, local_namespace)
                except:
                    # Oops, an exception while evaluating the
                    # expression.  The test fails.
                    result.NoteException(
                        cause="Exception evaluating expression.")
                else:
                    # We evaluated the expression.  The test passes iff
                    # the expression's value is boolean true.
                    if not value:
                        result.Fail("Expression evaluates to false.",
                                    { "ExecTest.expr" : self.expression,
                                      "ExecTest.value" : repr(value) })
            else:
                # No expression provided; if we got this far, the test
                # passes.
                pass
By default, QMTest represents each test as an Extensible Markup Language (XML) [XML] file. For each test, QMTest stores the name of the Python test class and representations of each of the arguments to the class. This representation is fully general. XML is a good format to use in that it is rigorously defined, standardized, and well supported in most programming languages. (For example, it is easy to mechanically generate QMTest files from a tool written in C++ by emitting XML.) In addition, the fact that XML is a textual format makes it easy to store in any revision control system.
While XML is a textual format, it is not always easy for people to read and edit. XML is also somewhat bulky; in some cases, the XML tags can consume large amounts of space. Encoding binary data in XML requires converting it to a textual format. Therefore, the efficiency of QMTest can be adversely impacted by the cost of encoding and decoding binary data in the test files. For some applications, a different storage format can overcome these problems.
For example, it might be more efficient to store large testsuites in a relational database, rather than as files in the filesystem. Users could then perform operations on the tests that QMTest does not support directly, such as the retrieval of all tests created after a particular date.
In situations such as these, extenders can create a customized test database class. A test database is an instance of such a class. The test database is responsible for storing and retrieving tests; it maps test identifiers to the tests themselves. How it performs its task is entirely up to the implementor of the test database class.
Converting an existing testsuite to QMTest can also be accomplished by means of a customized test database class. The test database class can parse the existing test format and create test instances that QMTest can execute. When using this approach, there is no need to convert each individual test to QMTest's XML format.
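As a rough sketch of this idea, the class below maps test identifiers onto files stored in a legacy format. It is purely conceptual: the method names, the legacy file layout, and the way such a class would be registered with QMTest are all invented for illustration and are not QMTest's actual database interface.

import os

class LegacyTestDatabase:
    """Map test identifiers to tests stored in a hypothetical legacy format."""

    def __init__(self, directory):
        # Every file in the directory with a ".tst" extension is one
        # legacy test; its base name serves as the test identifier.
        self.directory = directory

    def GetTestIds(self):
        """Return the identifiers of all tests in the database."""
        return [os.path.splitext(name)[0]
                for name in os.listdir(self.directory)
                if name.endswith(".tst")]

    def GetTest(self, test_id):
        """Parse a legacy test file and return its fields."""
        path = os.path.join(self.directory, test_id + ".tst")
        f = open(path)
        try:
            lines = f.readlines()
        finally:
            f.close()
        # The legacy format is assumed to hold the command to run on
        # the first line and the expected output on the rest.  A real
        # database class would use these fields to construct an
        # instance of an appropriate test class.
        return { "command" : lines[0].strip(),
                 "expected_output" : "".join(lines[1:]) }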
Test databases can be used in ways that do not directly relate to storage. For example, QMTest comes with a database class that can combine multiple test databases into a single database. This database class is useful for testing a number of components at once, each of which has its own test database. For example, an operating system distributor must integrate several hundred packages to create a GNU/Linux distribution. If these packages had their own QMTest test databases, the distributor could easily run all of the tests for all of the packages and get a single report on the status of the complete system.
The user of QMTest does not need to understand how tests are stored. For example, QMTest's graphical user interface will appear exactly the same to a user who uses the default test database and to a user who uses a test database built atop a relational database. From the user's point of view, creating a new test is simply a matter of choosing a test class and then providing arguments to that class. [1]
The default QMTest execution engine executes tests sequentially on a single machine. However, QMTest also comes with an execution engine that is capable of spreading the test load across a large testing farm. This feature is particularly useful to organizations that have access to supercomputers, or large clusters of machines, but it is also useful to developers who have a symmetric multiprocessor on their desks.
Scheduling jobs, especially on machines with varying loads and varying resource characteristics, is a notoriously difficult problem. In addition, the means used to connect to a remote machine varies from environment to environment; while rsh is standard on many systems, it is not available under Microsoft Windows, and even on UNIX systems it may be more efficient to use the Message Passing Interface (MPI), threads, or some other means of obtaining parallelism. Therefore, QMTest allows extenders to override the default parallel execution engine with a customized version.
The execution engine need not be aware of the test class implementation or storage format; it is simply responsible for allocating test instances to machines. Therefore, it is possible for the administrators of a high-performance supercomputer to provide a test execution engine that can then be used by all users on the machine.
Every scheduler must obey certain ordering constraints. In particular, users may specify prerequisite tests for a test. A test with prerequisites is called a dependent test. A dependent test can only be executed after its prerequisites have been executed. Users can associate an outcome with each prerequisite test; the dependent test is only executed if the prerequisite test has the specified outcome.
Prerequisites can be used to avoid running complex tests if they are almost certain to fail. For example, if a web browser crashes when trying to display a blank HTML page, it is unlikely to successfully render a complex page with lots of tables. By making the simple test a prerequisite for the complex test, and by specifying that the simple test must pass, the user can avoid wasting cycles running the complex test if the application is severely broken.
Prerequisites can also be used to attempt to diagnose failures in more detail. Consider, for example, a complex physical simulation that consists of several independent components. One test might check the behavior of the entire system. If, however, this test fails, it may make sense to run unit tests on the component pieces in an effort to discern where the failure lies. The user would then make the complex test a prerequisite to the simple tests, and specify that the complex test must fail before the dependent tests are run. Then, QMTest will run the unit tests only if the integration test fails.
In some situations, multiple tests may require common setup and cleanup code. For example, one approach to testing a database is to populate it with a relatively large data set and then run multiple queries against the database. When all of the queries are complete, the database should be destroyed. QMTest resources can be used for this purpose. A resource is an object with SetUp and CleanUp methods. When a test depends on a resource, it is guaranteed that the resource's SetUp method will execute before the test is executed, and that the CleanUp method will execute afterward. If multiple tests depend on the same resource, the scheduler may share the resource among all of the tests, rather than performing the setup and cleanup once for each test.
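A sketch of a resource follows. The import path for the Resource base class, the exact SetUp and CleanUp signatures, and the use of the context to pass the location of the shared data to dependent tests are assumptions made for the sake of illustration; the text above states only that a resource has SetUp and CleanUp methods with the ordering guarantees described.

import os
import shutil
import tempfile

# The location of the Resource base class is assumed here.
from qm.test.resource import Resource

class SampleDataDirectory(Resource):
    """Create a directory of sample data files shared by several tests."""

    def SetUp(self, context, result):
        # Create and populate the directory once; every dependent test
        # runs only after this method has completed.
        self.directory = tempfile.mkdtemp()
        records = open(os.path.join(self.directory, "records.txt"), "w")
        records.write("alice\nbob\ncarol\n")
        records.close()
        # Tell dependent tests where to find the data (assuming the
        # context can carry such information).
        context["sample_data_dir"] = self.directory

    def CleanUp(self, result):
        # Remove the directory after the last dependent test has run.
        shutil.rmtree(self.directory)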
QMTest can produce reports about test execution in a number of different formats. For example, QMTest can simply produce a report that indicates which tests passed and which tests failed. Alternatively, QMTest can indicate how the tests behaved relative to some previous test run. This mode of output is useful in determining whether some recent changes have caused new problems; if the tests that failed also failed the last time the tests were run, the new changes are likely not the cause of the problems.
In many situations, knowing why a test failed is just as interesting as knowing that it failed. For example, when testing a compiler, did the test fail because the compiler crashed, because it produced an incorrect error message, or because the compiler generated incorrect code? Even if the test did not fail, there may be other interesting information associated with the test execution, such as the amount of time the test took to execute.
As discussed above, the test class can embed this information in the result object that it returns to QMTest. Then, QMTest will display this information for the user. If the test class wishes to control not only the information displayed, but also its formatting, the test class can provide a method that formats a result object for output. In that way, the test class can, for example, provide hyperlinks from information embedded in the failure description to a manual that explains the likely causes of the failure.
By default, QMTest stores results as XML, with the goal that other applications may be written to process the results and display them in interesting ways. For example, it is likely that a future version of QMTest will contain a script that takes a series of results files and produces a Microsoft Excel spreadsheet showing test failures over time. However, users can replace the module that performs results storage in the same way that they can replace display modules, so that, for example, results can be stored directly in a relational database.
[1] Sometimes, this abstraction barrier is not as clean as one would like. For example, the user may have to use a revision control system to coordinate test suite development with other users when using the default database implementation, but may not need to when using a customized implementation.
At the time of writing, QMTest has not yet been officially released. (The first release of QMTest is expected by the end of 2001.) However, prerelease versions of QMTest are available for download at Software Carpentry. At this time, QMTest has been used in three software development projects: QMTest itself, the POOMA numerical computing toolkit, and the GNU Compiler Collection (GCC).
In each of these cases, QMTest has proved a useful and convenient tool, modulo the growing pains that come with using any prerelease software: modifications to file formats, incomplete documentation, and a changing user interface. It is informative to contrast the way in which QMTest has been deployed in these situations.
QMTest is used to test QMTest itself, as well as QMTrack, its companion bug-tracking tool. These tests are primarily unit tests. Rather than exercising the complete application, they focus on testing the Python modules that make up QMTest and QMTrack. These tests are run by every developer before committing changes to the QMTest and QMTrack source repository.
It is, of course, especially convenient to test Python code from within QMTest. Loading modules, invoking methods, and examining the results are all extremely natural. The primary difficulty with the QMTest test suite is that it is incomplete; more tests should be written.
POOMA (Parallel Object Oriented Methods and Applications) [POOMA] is a high-performance numerical toolkit created at LANL, and now co-maintained by CodeSourcery and many of the original POOMA developers. POOMA is written entirely in C++. Its goal is to perform efficient computations on massively parallel machines, such as the Accelerated Strategic Computing Initiative (ASCI) [ASCI] Blue Mountain machine [BlueMountain], which is composed of several thousand Origin 2000 processors.
The POOMA developers had written a variety of unit tests, application tests, example programs, and benchmarks. Although there was a Makefile for each test, there was no automated system for building and running all of these tests at once. Nor was there a convenient mechanism for displaying the results. Without using any extensions to QMTest, it was possible to create tests that invoked make to build each of the tests, executed the resulting binaries, and compared the results to expected values.
Undertaking this effort demonstrated that many of the tests were suffering from a lack of maintenance; many failed to compile, and others failed to execute correctly. By using QMTest on a nightly basis, the developers are now able to prevent further decay.
A notable advantage of QMTest in this situation was its ability to work with an existing testsuite. As a result, QMTest was deployed with relatively little effort. There was no need to modify the existing practices of the developers, such as the locations in which tests were placed, or the way in which they were written.
Like POOMA, GCC [GCC] has a large body of existing tests. The DejaGNU [DejaGNU] test harness is used to execute those tests, and is, in general, satisfactory. However, it seemed interesting to try to execute the same tests with QMTest, in order to see whether QMTest could be easily adapted to an existing testsuite. While it should be possible to use the entire testsuite, QMTest has thus far been used only with the C++ portion of the testsuite.
The GCC tests are represented as individual C++ source files. These source files contain special comments indicating where error messages should be emitted, and whether the test should be executed, or merely compiled. In order to efficiently execute these tests with QMTest, both a custom test class and a custom test database were required.
The only argument to the test class is the name of the source file. The Run method of the test class is responsible for reading the file and extracting the special comments. Then, the C++ compiler is invoked, and its standard error stream captured. The error stream is parsed into distinct error messages, and checked against the expected errors. If the test program is supposed to be executed (and not just compiled) the resulting binary is executed, and its exit code checked.
The test class is able to return interesting information about test failures. For example, it records which error messages were spurious (i.e., appeared, but should not have) and which were missing (i.e., did not appear, but should have). This information is available in structured form in the XML results file written out by QMTest.
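The following is a greatly simplified sketch of that flow, not the actual test class. The comment syntax it recognizes, the context key naming the compiler, and the import path for the Test base class are illustrative assumptions; the real tests use DejaGNU's richer directive syntax, and the real class performs much more thorough checking (including the spurious-message and execution checks described above).

import os
import re
import tempfile

import qm.fields
# The location of the Test base class is assumed here.
from qm.test.test import Test

class SimpleCompilerTest(Test):
    """Compile a source file and compare emitted errors to expectations."""

    arguments = [
        qm.fields.TextField(
            name="source_path",
            title="Source File",
            description="The path to the C++ source file to compile."
            )
        ]

    def Run(self, context, result):
        # Extract the expected diagnostics from the special comments.
        # Here a comment of the form "// expect-error: text" marks an
        # expected error message.
        expected = []
        for line in open(self.source_path).readlines():
            match = re.search(r"// expect-error: (.*)", line)
            if match:
                expected.append(match.group(1).strip())

        # Invoke the compiler and capture its standard error stream
        # (the redirection relies on the shell).
        compiler = context["compiler_path"]
        errors_path = tempfile.mktemp()
        os.system("%s -c %s 2> %s"
                  % (compiler, self.source_path, errors_path))
        diagnostics = open(errors_path).read()
        os.remove(errors_path)

        # Expected messages that never appeared are "missing"; record
        # them as structured data so that they appear in the results.
        missing = [e for e in expected if e not in diagnostics]
        if missing:
            result.Fail("Missing error messages.",
                        { "CompilerTest.missing" : "\n".join(missing) })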
Because the tests are represented as C++ source files, rather than in QMTest's default XML format, a custom test database class was used to read and write tests.
The use of QMTest provides two major advantages:
Unlike DejaGNU, QMTest can execute tests in parallel. As a result, it is possible to run the test suite considerably faster with QMTest than with DejaGNU on a multiprocessor.
QMTest's graphical user interface can be used to execute one or many tests, display the results of those tests, and create new tests.
DejaGNU is written in Tcl. Some of the GCC tests contain embedded Tcl commands. These tests are not easy to convert for use with QMTest. In theory, the custom test class could have invoked a Tcl interpreter, but setting up an appropriate context would have been difficult. It would be easier to modify the tests themselves.