Test Runners

Let’s start by looking at what is a test-runner. We need to differentiate a test-runner from a test-framework.

From python documentation: A test runner is a component which orchestrates the execution of tests and provides the outcome to the user. The runner may use a graphical interface, a textual interface, or return a special value to indicate the results of executing the tests.

A test-framework defines an API that is used to write the tests. For example the unittest module from python’s stdlib.

The unittest module also defines an API for creating test runners, and provides a basic test runner implementation. Apart from this, there are other runners that has support for running unittest‘s tests like nose and py.test.

On the other hand py.test defines its own test-framework but only its own runner is capable of running its tests.

Test Runner Features

When using a test runner interactively in the command line I expect two main things:

  • fast feedback: I want to know as fast as possible if my latest changes are OK (tests are executed successfully or not).
  • in case of test failures, how easy it is to find the problem.

Note that I said fast feedback not faster tests. Of course the actual test (and not the test runner) plays the main role in the time to execute a test. But you should not always need to wait until all tests are executed before getting some feedback.

For example py.test offers the option -x/–exitfirst and –maxfail, to display information on failures without waiting for all tests to finish. Also check the instafail plugin.

Another way to achieve faster feedback is by executing just a sub-set of your tests. Using py.test, apart from selecting tests from just a package or module, also has a very powerful system to select tests based on keywords using the option -k. While this feature is extremely useful it

py.test has great tools to help you debug failed tests like colorful output, pdb integration, assertion re-write and control of traceback verbosity.

Importance of Test Ordering

By default py.test will execute tests grouped by test module. Modules are ordered alphabetically, tests from each module are executed in the order they are defined. Although unit-tests should work when executed in any order, it is important to execute them in a defined order so failures can be easily reproduced.

But using a simple alphabetical order that does not take into account the structure of the code has several disadvantages.

  1. To achieve faster feedback it is important that the most relevant tests should be executed first. Using alphabetical order you might spend a long time executing tests that were not affected by recent changes, or execute tests that have little chance to fail.
  2. It is common that a single change break several tests. In order to easily identify the cause of the problem it is important to look at the test that is directly testing the point where the code changed. It might not be easy to pin-point the problem when looking at failures in tests that were broken but not directly test the problem. Executing the most relevant tests first you could make sure to get direct failures first.

How to order tests

There are two main factors to determine a most relevant order to execute tests.

  • The source code inter-dependency structure (this is done by analyzing the imports)
  • Modified modules since last successful execution

Lets look at simple example project that contains four modules, and each module a corresponding test module. Look at the imports graph below where an edge bar -> util means that bar.py imports util.py.

digraph imports {
 rankdir = BT
 util [shape=box, color=blue]
 bar [shape=box, color=blue]
 foo [shape=box, color=blue]
 app [shape=box, color=blue]

 "bar" -> "util"
 "foo" -> "util"
 "app" -> "bar"
 "app" -> "foo"

 "test_util" -> "util"
 "test_bar" -> "bar"
 "test_foo" -> "foo"
 "test_app" -> "app"
}

Initial (full) run

On the first run all tests must be executed. Since bar and foo depends on util, we want to execute test_util first to make sure any problems on util are caught first by its direct tests on test_util.

The same applies for app in relation to foo and bar. foo and bar both have the same level in the structure, so they are just ordered in alphabetically.

So we execute tests in the following order:

test_util, test_bar, test_foo, test_app

incremental run - test modified

Now let’s say that we modify the file test_foo. We know that all tests were OK before this modification, so the most relevant tests to be execute are on test_foo itself.

Not only test_foo should be executed first, all other tests do not need to be executed at all because a change in test_foo does not affect any other tests since no other module depends (imports) test_foo.

digraph imports {
 rankdir = BT
 util [shape=box, color=blue]
 bar [shape=box, color=blue]
 foo [shape=box, color=blue]
 app [shape=box, color=blue]

 "bar" -> "util"
 "foo" -> "util"
 "app" -> "bar"
 "app" -> "foo"

 "test_util" -> "util"
 "test_bar" -> "bar"
 "test_foo" -> "foo"
 "test_app" -> "app"

 test_foo [color=red, fontcolor=red, style=filled, fillcolor=yellow]
}

The same behavior can be observed for a change in any other test module in this example. Since there are not dependencies between test modules, a change in a test module will require the execution only of the modified module.

incremental run - source modified

Let’s check now what happens when foo is modified. Looking at the graph it is easy to see which tests are going to be affected.

digraph imports {
 rankdir = BT
 util [shape=box, color=blue]
 bar [shape=box, color=blue]
 foo [shape=box, color=blue]
 app [shape=box, color=blue]

 "bar" -> "util"
 "foo" -> "util"
 "app" -> "bar"
 "app" -> "foo" [color=red]

 "test_util" -> "util"
 "test_bar" -> "bar"
 "test_foo" -> "foo" [color=red]
 "test_app" -> "app" [color=red]

 foo [fontcolor=red, color=red]
 app [color=red]
 test_foo [color=red, style=filled, fillcolor=yellow]
 test_app [color=red, style=filled, fillcolor=yellow]
}

The order of test execution is test_foo then test_app. Other tests are not executed at all.

Analyzing the graph is easy to see that a change in app would cause only test_app to be execute. And a change in util would cause all tests to be executed.

pytest-incremental

Hopefully by now it is clear that by taking in account the structure of the code to order the tests, the test-runner can:

  • reduce total execution time for incremental changes
  • get faster feedback by executing first the tests which have direct code under test changes
  • easier to debug test failures because of more relevant test ordering

pytest-incremental is a py.test plugin that analyses the source code and changes between runs to re-order and de-select tests cases.

caveats

pytest-incremental looks for imports recursively to find dependencies (using AST). But given the very dynamic nature of python there are still some cases that a module can be affected by a module that are not detected.

  • modules imported not using the import statement
  • modules not explicitly imported but used at run-time
  • monkey-patching. (i.e. A imports X. B monkey-patches X. In this case A might depend on B)
  • others ?

cyclic dependencies

If your project has dependency cycles will negatively affect the efficacy of pytest-incremental. Dependency cycles are bad not only for pytest-incremental, it makes the code hard to understand and modify. pytest-incremental does not try to be smart handling it, so you better fix your code and remove the cycles!

digraph imports {
 rankdir = BT
 util [shape=box, color=blue]
 bar [shape=box, color=blue]
 foo [shape=box, color=blue]
 app [shape=box, color=blue]

 "bar" -> "util"
 "foo" -> "util"
 "app" -> "bar"
 "app" -> "foo"
 "util" -> "app"
 "bar" -> "app"
}

When you have cycles any change end up affecting all modules!