Advanced Testing and Determinism

If you care about software reliability, you should care about testing.

Unfortunately, most discussions about testing online take the form of tiresome debates over definitions and methodology.

I no longer think these discussions are important or interesting. I will argue here that the most important thing about testing is separating deterministic from non-deterministic code.

Deterministic vs. Non-deterministic Code

A deterministic program is one that, given the same inputs, will always produce the same outputs. Every single time.

Some common things that break this property are:

- reading the current time
- random number generation
- network and disk IO
- threads and processes mutating shared data

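For example, reading the clock makes a function non-deterministic. A minimal sketch (the function names here are invented for illustration, not taken from any real codebase):

```typescript
// Non-deterministic: the result depends on the wall clock, which is outside
// the caller's control. Two calls with "the same inputs" can disagree.
const isHappyHourImpure = (): boolean => new Date().getHours() === 17;

// Deterministic: the clock reading becomes an explicit input, so the same
// argument always produces the same output.
const isHappyHour = (hour: number): boolean => hour === 17;
```

Pushing the clock (or the random number generator, or the IO) out to the caller is the basic move behind all of the separation techniques that follow.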
There are many different patterns for separating deterministic and non-deterministic code. It's widely regarded as a Good Idea across many different paradigms and communities. And like most good ideas in computing, techniques to separate deterministic and non-deterministic code have been independently invented several times. Here is a non-exhaustive list:

- Sans-IO protocol design (systems programming)
- Functional core, imperative shell (functional programming)
- Hexagonal architecture, a.k.a. ports and adapters, and dependency injection (object-oriented programming)

We see that these techniques come from different tech communities - systems programmers, functional programmers and object-oriented programmers. All cite better testability as a side effect of these techniques.

(Brief note: deterministic is a broader concept than purely functional; you're free to mutate memory as long as you're the only one who has access to it. In Haskell terms the ST monad is deterministic, but the IO monad is not).

What makes deterministic code good for advanced testing?

Deterministic code is isolated. Any unexpected outputs are a product of the code, and the inputs you've provided; nothing else. The outside world does not seep through. There are no timing issues, no errant threads or processes mutating shared data, no random numbers outside of your control. Deterministic code is its own computational black box. This narrows down the problem space dramatically.

It follows that there is no reason to test the same inputs more than once. This lets us explore the state space far more effectively by having the machine generate the test inputs. I call techniques where the user describes the inputs to a test (as opposed to providing specific examples) "Advanced Testing".

Advanced Testing Techniques

The two most prominent techniques are Property-Based Testing, and Deterministic Simulation Testing.

Property Based Testing

Property-based testing comes from the functional programming world. Tests are described as properties: predicates or equations that must hold for every input.

Consider this trivial example of a Typescript function:

// statically typed, so it must be correct
const add = (a: number, b: number) => a * b

With an example-based (i.e., "normal") test, one checks that it works by providing a specific example the test author knows to be correct:

// test passes, ship it
test("0 is the identity element for addition", () => {
  expect(add(0, 0)).toBe(0)
})

Property testing takes this one step further. Instead of providing a specific example of a value to test for, one can provide a property or predicate, and have the computer test this with various random numbers:

// fast-check syntax
test("0 is the identity element for addition", () => {
  fc.assert(
    fc.property(fc.number(), a => {
      expect(add(a, 0)).toBe(a)
    })
  )
})

This will quickly find failures that our manual test completely missed. Note that these tests are seeded, i.e. you can regenerate the same random inputs as many times as you want by providing the same seed. As long as the code under test is deterministic, you always get the same results for the same seed.

Deterministic Simulation Testing

Deterministic simulation comes from the systems programming world, and has a different emphasis. Instead of testing individual functions, you aim to test a whole deterministic library and simulate its inputs. Disk IO, network IO, time, etc. are all simulated such that they occasionally produce faulty results (disk corruption, requests arriving out of order, and so on).

The simulations themselves are seeded, so if you discover a bug with them they can be replayed to see if you've fixed it.
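To illustrate the mechanism only, here is a toy sketch — not a production harness, and all the names (`mulberry32`, `WriteBuffer`, `simulate`) are invented for this example:

```typescript
// Seeded PRNG (mulberry32): the same seed yields the same sequence, always.
const mulberry32 = (seed: number) => (): number => {
  seed = (seed + 0x6d2b79f5) | 0;
  let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
  t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
  return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};

// The "library" under test: a buffer that retries writes until they stick.
class WriteBuffer {
  pending: string[] = [];
  private flush: (item: string) => boolean;
  constructor(flush: (item: string) => boolean) {
    this.flush = flush;
  }
  write(item: string): void {
    this.pending.push(item);
  }
  tick(): void {
    // Keep whatever the (simulated) disk rejected, to retry next tick.
    this.pending = this.pending.filter(item => !this.flush(item));
  }
}

// The simulation: a fake disk that fails 30% of the time, driven by the PRNG.
const simulate = (seed: number): string[] => {
  const rand = mulberry32(seed);
  const disk: string[] = [];
  const buf = new WriteBuffer(item => {
    if (rand() < 0.3) return false; // injected IO failure
    disk.push(item);
    return true;
  });
  ["a", "b", "c"].forEach(item => buf.write(item));
  for (let i = 0; i < 100; i++) buf.tick();
  return disk;
};

// Invariant: despite the injected failures, every write lands exactly once,
// and the same seed reproduces the same failure pattern bit-for-bit.
console.log(simulate(42));
```

Real systems simulate far more (clocks, network partitions, process crashes), but the shape is the same: every source of randomness flows from one seed, so any failing run can be replayed exactly.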

A fully working example is beyond the scope of this article, but a general comparison follows:

| Property Based Testing                     | Deterministic Simulation Testing |
| ------------------------------------------ | -------------------------------- |
| External assertions                        | Internal assertions              |
| Run using test runners                     | Separate executable              |
| Seeks a state where a property doesn't hold | Tries to crash the library       |
| Tends to be smaller scale                  | Tends to be larger scale         |
| Short running                              | Long running                     |

Does my codebase need to be re-written to take advantage of these advanced testing techniques?

No.

In general, any deterministic section of your code, once identified, can be tested with either technique. From experience, there are also hybrids of the two approaches.

All code bases have deterministic and non-deterministic sections. In legacy code bases, deterministic and non-deterministic code tend to be more intermingled. "Islands" of determinism are small and scattered.

In better designed code bases, the boundaries are more defined. If you do what FoundationDB and TigerBeetle did, your program becomes one big deterministic core, with everything non-deterministic pushed to the periphery (both were designed with Deterministic Simulation Testing in mind).
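The shape those systems share can be sketched as a pure state machine at the core. The types and names below are invented for illustration, not taken from either codebase:

```typescript
// The deterministic core: a pure state machine. State and event in,
// state and command out. No IO, no clocks, no threads — fully testable.
type State = { retries: number };
type Event = { kind: "send_failed" } | { kind: "send_ok" };
type Command = { kind: "retry" } | { kind: "give_up" } | { kind: "done" };

const step = (state: State, event: Event): [State, Command] => {
  if (event.kind === "send_ok") return [state, { kind: "done" }];
  if (state.retries >= 3) return [state, { kind: "give_up" }];
  return [{ retries: state.retries + 1 }, { kind: "retry" }];
};

// The non-deterministic periphery translates real network results into
// Events and executes the Commands; only that thin shell touches the world.
```

Because `step` is deterministic, the entire retry policy can be exercised with property-based tests or simulation, without a network in sight.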

So any code base can take advantage of these techniques, but to differing degrees. In general you will want larger and larger "clumps" of determinism, to increase the surface area of what you can test extensively. It's worth noting that classic maintenance texts like "Working Effectively with Legacy Code" effectively say as much.

In any case, some advanced testing is better than no advanced testing.

What about testing non-deterministic code?

Your typical end to end test is drowning in non-determinism. Imagine a test that starts at the web browser, and ends up with a database query over a network, and returns the results to the browser. Network syscalls, file IO syscalls, operating system page caches... non-determinism abounds. It may work 10 times in a row, or 100 times, but there's no guarantee it will work on the 101st.

Does this mean end to end tests are useless? Absolutely not!

End to end tests are sanity checks. They're automated versions of manual user checking - performing these steps, as a user, does everything behave as it ought to? They are by definition less likely to catch rare bugs. But they absolutely have their role to play - every useful program must inevitably talk to the outside world.

Don't let perfect be the enemy of good! Remember even the most advanced tests cannot check everything.

End

I hope I've convinced you of the importance and power of advanced testing techniques. The boundary between deterministic and non-deterministic code is one of the most important things to know if you really want to test rigorously.