.. _motivation:

Test Runners
============

Let's start by looking at what a test runner is. We need to differentiate a
**test-runner** from a **test-framework**. From the python `documentation `_:

    A test runner is a component which orchestrates the execution of tests
    and provides the outcome to the user. The runner may use a graphical
    interface, a textual interface, or return a special value to indicate
    the results of executing the tests.

A test-framework defines an API that is used to write the tests, for example
the `unittest` module from python's stdlib. The `unittest` module also
defines an API for creating test runners, and provides a basic test runner
`implementation `_. Apart from this, there are other runners that support
running `unittest`\'s tests, like `nose` and `py.test`. On the other hand,
`py.test` defines its own test-framework, but only its own runner is capable
of running its tests.


Test Runner Features
====================

When using a test runner interactively in the command line, I expect two
main things:

- fast feedback: I want to know as fast as possible whether my latest
  changes are OK (tests execute successfully or not).
- easy debugging: in case of test failures, it should be easy to find the
  problem.

Note that I said **fast feedback**, not faster **tests**. Of course the
actual tests (and not the test runner) play the main role in the time it
takes to execute the test suite. But you should not always need to wait
until **all** tests are executed before getting some feedback. For example,
*py.test* offers the options `-x/--exitfirst` and `--maxfail` to display
information on failures without waiting for all tests to finish. Also check
the `instafail plugin `_.

Another way to achieve faster feedback is to execute just a subset of your
tests. Apart from selecting tests from just a package or module, *py.test*
also has a very powerful system to select tests based on *keywords*, using
the option `-k` (see the example below).

As for the second point, *py.test* has great tools to help you debug failed
tests, like colorful output, *pdb* integration, assertion re-writing and
control of traceback verbosity.
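For illustration, here is a minimal, hypothetical test module (the file and
test names are made up for this example), together with the kind of command
lines that use the options mentioned above::

    # test_login.py -- hypothetical module, for illustration only
    def test_login_ok():
        assert True

    def test_login_bad_password():
        assert True

    def test_logout():
        assert True

Running it::

    py.test -x                 # stop at the first failure
    py.test --maxfail=2        # stop after two failures
    py.test -k login           # run only the two tests matching "login"
    py.test -k "not logout"    # run everything except test_logout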
Importance of Test Ordering
===========================

By default *py.test* will execute tests grouped by test module. Modules are
ordered alphabetically; tests from each module are executed in the order
they are defined. Although unit-tests *should* work when executed in any
order, it is important to execute them in a defined order so that failures
can be easily reproduced. But a simple alphabetical order that does not
take the structure of the code into account has several disadvantages.

#. To achieve faster feedback it is important that the most **relevant**
   tests are executed first. Using alphabetical order you might spend a
   long time executing tests that were not affected by recent changes, or
   executing tests that have little chance to fail.

#. It is common for a single change to break several tests. In order to
   easily identify the cause of the problem it is important to look at the
   test that directly exercises the point where the code changed. It might
   not be easy to pinpoint the problem when looking at failures in tests
   that were broken but do not directly test the problem. By executing the
   most **relevant** tests first you make sure to get *direct* failures
   first.


How to order tests
==================

There are two main factors that determine the most relevant order in which
to execute tests:

- the source code inter-dependency structure (obtained by analyzing the
  *imports*)
- the modules modified since the last successful execution

Let's look at a simple example project that contains four modules, each
module with a corresponding test module. Look at the **imports** graph
below, where an edge ``bar -> util`` means that ``bar.py`` imports
``util.py``.

.. graphviz::

    digraph imports {
        rankdir = BT
        util [shape=box, color=blue]
        bar [shape=box, color=blue]
        foo [shape=box, color=blue]
        app [shape=box, color=blue]
        "bar" -> "util"
        "foo" -> "util"
        "app" -> "bar"
        "app" -> "foo"
        "test_util" -> "util"
        "test_bar" -> "bar"
        "test_foo" -> "foo"
        "test_app" -> "app"
    }


Initial (full) run
------------------

On the first run all tests must be executed. Since ``bar`` and ``foo``
depend on ``util``, we want to execute ``test_util`` first to make sure any
problems in ``util`` are caught first by its direct tests in ``test_util``.
The same applies for ``app`` in relation to ``foo`` and ``bar``. ``foo``
and ``bar`` are at the same *level* in the structure, so they are simply
ordered alphabetically. So we execute tests in the following order::

    test_util, test_bar, test_foo, test_app


incremental run - test modified
-------------------------------

Now let's say that we modify the file ``test_foo``. We know that all tests
were OK before this modification, so the most relevant tests to execute are
in ``test_foo`` itself. Not only should ``test_foo`` be executed first, all
other tests do not need to be executed at all, because a change in
``test_foo`` does not affect any other tests: no other module depends on
(imports) ``test_foo``.

.. graphviz::

    digraph imports {
        rankdir = BT
        util [shape=box, color=blue]
        bar [shape=box, color=blue]
        foo [shape=box, color=blue]
        app [shape=box, color=blue]
        "bar" -> "util"
        "foo" -> "util"
        "app" -> "bar"
        "app" -> "foo"
        "test_util" -> "util"
        "test_bar" -> "bar"
        "test_foo" -> "foo"
        "test_app" -> "app"
        test_foo [color=red, fontcolor=red, style=filled, fillcolor=yellow]
    }

The same behavior can be observed for a change in any other test module in
this example. Since there are no dependencies between test modules, a
change in a test module requires the execution of the modified module only.


incremental run - source modified
---------------------------------

Let's check now what happens when ``foo`` is modified. Looking at the graph
it is easy to see which tests are going to be affected.

.. graphviz::

    digraph imports {
        rankdir = BT
        util [shape=box, color=blue]
        bar [shape=box, color=blue]
        foo [shape=box, color=blue]
        app [shape=box, color=blue]
        "bar" -> "util"
        "foo" -> "util"
        "app" -> "bar"
        "app" -> "foo" [color=red]
        "test_util" -> "util"
        "test_bar" -> "bar"
        "test_foo" -> "foo" [color=red]
        "test_app" -> "app" [color=red]
        foo [fontcolor=red, color=red]
        app [color=red]
        test_foo [color=red, style=filled, fillcolor=yellow]
        test_app [color=red, style=filled, fillcolor=yellow]
    }

The order of test execution is ``test_foo`` first, then ``test_app``. Other
tests are not executed at all. Analyzing the graph, it is easy to see that
a change in ``app`` would cause only ``test_app`` to be executed, and a
change in ``util`` would cause all tests to be executed.
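Here is a minimal sketch of this selection logic, assuming the import graph
is already available as a plain dict. This is only an illustration of the
idea, not ``pytest-incremental``'s actual API::

    # Import graph of the example project: module -> modules it imports.
    IMPORTS = {
        'util': [],
        'bar': ['util'],
        'foo': ['util'],
        'app': ['bar', 'foo'],
        'test_util': ['util'],
        'test_bar': ['bar'],
        'test_foo': ['foo'],
        'test_app': ['app'],
    }

    def affected(changed, imports):
        """Return the changed modules plus every module that,
        directly or indirectly, imports one of them."""
        # Invert the graph: module -> modules that import it.
        imported_by = {mod: set() for mod in imports}
        for mod, deps in imports.items():
            for dep in deps:
                imported_by[dep].add(mod)
        # Walk the reversed edges starting from the changed modules.
        result = set()
        todo = list(changed)
        while todo:
            mod = todo.pop()
            if mod not in result:
                result.add(mod)
                todo.extend(imported_by[mod])
        return result

    # A change in ``foo`` selects exactly the tests highlighted above.
    tests = sorted(m for m in affected({'foo'}, IMPORTS)
                   if m.startswith('test_'))
    print(tests)   # ['test_app', 'test_foo']

Ordering then follows the same graph: tests closer to the modified module
(here ``test_foo``) run before the more indirect ones (``test_app``).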
pytest-incremental
==================

Hopefully by now it is clear that by taking the structure of the code into
account when ordering the tests, the test runner can:

- reduce the total execution time for incremental changes
- give faster feedback, by first executing the tests whose code under test
  changed directly
- make test failures easier to debug, because of more relevant test
  ordering

``pytest-incremental`` is a *py.test* plugin that analyses the source code
and the changes between runs to re-order and de-select test cases.


caveats
=======

``pytest-incremental`` looks for imports recursively to find dependencies
(using AST). But given the very dynamic nature of python, there are still
some cases where a module can be affected by another module in ways that
are not detected:

* modules imported without using the *import* statement
* modules not explicitly imported but used at run-time
* monkey-patching (i.e. A imports X, B monkey-patches X; in this case A
  might depend on B)
* others?


cyclic dependencies
-------------------

Dependency cycles in your project will negatively affect the efficacy of
*pytest-incremental*. Dependency cycles are bad not only for
*pytest-incremental*: they make the code hard to understand and modify.
*pytest-incremental* does not try to be smart in handling them, so you had
better **fix** your code and remove the cycles!

.. graphviz::

    digraph imports {
        rankdir = BT
        util [shape=box, color=blue]
        bar [shape=box, color=blue]
        foo [shape=box, color=blue]
        app [shape=box, color=blue]
        "bar" -> "util"
        "foo" -> "util"
        "app" -> "bar"
        "app" -> "foo"
        "util" -> "app"
        "bar" -> "app"
    }

When you have cycles, any change ends up affecting all modules!
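If you want to check whether your own project has such cycles, a simple
depth-first search over the import graph is enough. Here is a minimal
sketch, again using a plain dict for the graph rather than
``pytest-incremental``'s API::

    def find_cycle(imports):
        """Return one import cycle as a list of modules, or None."""
        visiting, done = set(), set()

        def visit(mod, path):
            if mod in done:
                return None
            if mod in visiting:
                return path[path.index(mod):]   # cycle found
            visiting.add(mod)
            for dep in imports.get(mod, ()):
                cycle = visit(dep, path + [mod])
                if cycle:
                    return cycle
            visiting.remove(mod)
            done.add(mod)
            return None

        for mod in imports:
            cycle = visit(mod, [])
            if cycle:
                return cycle
        return None

    # The cyclic graph from the figure above:
    IMPORTS = {'util': ['app'], 'bar': ['util', 'app'],
               'foo': ['util'], 'app': ['bar', 'foo']}
    print(find_cycle(IMPORTS))   # ['util', 'app', 'bar']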