Die-hard Statefully


Puzzling quickcheck

After reading Solving the Water Jug Problem from Die Hard 3 with TLA+ and Hy­po­thesis, I figured it’d be amusing to re­pro­duce it in Rust as diehard-rs, along with it’s quickcheck lib­rary.

The post above de­scribes the problem quite well, so there’s little need for me to re­pro­duce it here.

However, where hy­po­thesis provides a rather nice mech­anism for cre­ating com­mand based prop­erty tests, we have to do a little extra work ourselves, and ef­fect­ively create a little lan­guage, and as­so­ci­ated in­ter­preter. This ap­proach is useful when testing stateful sys­tems (al­beit ones that we can use­fully seperate from the rest of a po­ten­tially noisy sys­tem) and has been used to find er­rors in LevelDB and Riak amongst oth­ers.

The core of the test should be reas­on­ably easy; we create a de­fault state; apply each com­mand in turn to the state, and if we have fin­ished, we re­turn false. False? Hang on.

We re­turn false be­cause quickcheck is nor­mally used to find the smal­lest input that causes an as­ser­tion to fail, wheras we want to find the smal­lest input that meets our fin­ishing cri­teria; so we have to in­vert the logic in­volved; and pre­tend to quickcheck that our test has failed.

quickcheck! {
    fn diehard(xs: Vec<Op>) -> bool {
        // printl­n!("{:?}", xs);
        let mut st = State::de­fault();
        for o in xs {
            if st.fin­ished() { re­turn false; }
        re­turn true

Now, the reason that this kind of prop­erty testing is so quick to solve the problem is down to two points:

We short-cut eval­u­ation of the sample data; so a working sub­sequence within the input will abort input as soon as the problem is solved. It’s more than likely that if we were to say, not no­tice that the problem had been solved …

… so whenever our state ma­chine finds that the big jug con­tains ex­actly 4 liters of wa­ter, we break out of the loop and re­turn false to quickcheck to in­dicate that it’s a match (nor­mally, this would mean a test fail­ure).

This is some­thing of a con­ceit, too, as we very much rely on the fact we can gen­erate long se­quences of random steps, and hope that a valid sub­sequence ex­ists some­where; which vastly in­creases the chances of this suc­ceed­ing.

The second part of the secret sauce here, and what makes quickcheck and friends so ef­fective as testing tools, is auto­matic re­duc­tion of test cases. Quite of­ten, when quickcheck first man­ages to find a failing test case, it’ll be some­where in the middle of a rather ab­surdly large amount of ran­domly gen­er­ated data. And manu­ally dig­ging through a large amount of (usu­ally) mean­ing­less data is rarely fun.

So, quickcheck will then pro­ceed to re­move items from the input se­quence (in the case of our Vec­<Op> input se­quence), and re-apply them to see if the input still pro­vokes the fail­ure. In our case, we can see that any steps in the se­quence after we have ter­minate will be ig­nored, and thus are safe to throw away (al­though quickcheck has to try this the old fash­ioned way; just by re­moving an item and trying the test without it).

Also, we can easily in­tuit that most op­er­a­tions early in the se­quence will have little to im­pact on a later ter­min­ating state, as the range of states in our model is reas­on­ably lim­ited, but also be­cause any sub­sequence con­taining Empty­BigJug and EmptyS­mallJug steps (but con­taining both) will empty both jugs, and thus reset us back to our starting state. So in­tu­it­ively, any­thing be­fore this pair can be safely re­moved too.

There are likely other puzzles that we could apply this to, such as the n-queens puzzle. But, as we’ve men­tioned, the ef­ficacy of this ap­proach re­lies on how we en­code the prob­lem. We want to take ad­vantage of the back­tracking offered to op­timise the search, as the range of pos­sible states for other puzzles can be some­what lar­ger. Hence, you’d likely want to be able to provide more hints to the en­gine as to how to ex­ecute the search.