A quick tour of LLVM's Sanitizer coverage implementation

Inside the grey box

After reading about Hypothesis’s new coverage features, I’ve recently become interested in how guided fuzzing (as implemented by American Fuzzy Lop or LLVM’s libFuzzer) works internally with Rust and LLVM. The first step is to understand how coverage works.

Clang’s Sanitizer Coverage documentation explains the functionality very well, so I won’t repeat much of it here.

I started by looking at the Rust Fuzz project’s set of targets. The run-fuzzer.sh driver script tells cargo to pass several extra flags to the compiler. The flag -C passes=sancov instructs the compiler to also run the sancov compiler pass, which annotates the generated code with calls into the coverage runtime, and -C llvm-args=-sanitizer-coverage-level=3 instructs LLVM to record edge coverage, so that we can tell which paths through the code were executed (e.g. distinguishing between the branches of an if/else expression). The additional -Z sanitizer=address tells the compiler to link in the sanitizer support runtime, which includes the routines that record and save coverage.
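To make the difference between block and edge coverage concrete, here is a hand-instrumented model of `if cond { B }` followed by a block C. The block names, the `Trace` type and the `run` function are made up for illustration; this is a sketch of the idea, not what the sancov pass actually emits.

```rust
use std::collections::BTreeSet;

/// Hypothetical record of which blocks and edges a run has visited.
#[derive(Default)]
struct Trace {
    blocks: BTreeSet<&'static str>,
    edges: BTreeSet<(&'static str, &'static str)>,
}

/// Models `A; if cond { B }; C`, noting every block and edge on the way.
fn run(cond: bool, trace: &mut Trace) {
    trace.blocks.insert("A");
    if cond {
        trace.edges.insert(("A", "B"));
        trace.blocks.insert("B");
        trace.edges.insert(("B", "C"));
    } else {
        // The branch that skips B has no block of its own,
        // so only edge coverage can see it.
        trace.edges.insert(("A", "C"));
    }
    trace.blocks.insert("C");
}
```

After a single call with `cond` set to true, all three blocks count as covered, yet the edge from A to C has never been taken. Only a second call with `cond` set to false adds it.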

We’ll start with a trivial program in main.rs:
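Any small program with a branch will do for this purpose. The following is a hypothetical stand-in: the `describe` helper and its messages are made up for illustration, and it is not necessarily the program that the disassembly further down was taken from.

```rust
use std::env;

/// Hypothetical helper: picks a message based on how many
/// command-line arguments there are (the program name counts as one).
fn describe(arg_count: usize) -> &'static str {
    if arg_count > 1 {
        "got arguments"
    } else {
        "no arguments"
    }
}

fn main() {
    // The if/else in `describe` gives the coverage pass two edges to tell apart.
    println!("{}", describe(env::args().count()));
}
```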

If we compile and run this with RUSTFLAGS=' -C passes=sancov -C llvm-args=-sanitizer-coverage-level=3 -Z sanitizer=address' cargo run, and then disassemble the result with objdump -CS target/debug/covtest [1], we see an additional set of instructions like:

10465:       48 8d 05 24 86 34 00    lea    0x348624(%rip),%rax
1046c:       48 05 d4 03 00 00       add    $0x3d4,%rax
10472:       48 89 c7                mov    %rax,%rdi
10475:       e8 56 68 0e 00          callq  f6cd0 <__sanitizer_cov>

Granted, I’m not great at reading assembly, but this appears to look up the current program counter [2], adjust it a little to create a guard address, and pass that as the first argument to the __sanitizer_cov function.
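The arithmetic in that sequence can be checked by hand. On x86-64, a RIP-relative operand is resolved against the address of the next instruction, so the lea yields 0x10465 + 7 + 0x348624, and the add contributes a further 0x3d4. A small sketch of the calculation, with helper names made up for illustration and the addresses and displacements copied from the listing above:

```rust
/// Resolves a RIP-relative displacement: the base is the address of the
/// instruction that follows the one being decoded.
fn rip_relative_target(insn_addr: u64, insn_len: u64, disp: i32) -> u64 {
    let next_insn = insn_addr + insn_len;
    // Sign-extend the 32-bit displacement before adding it.
    next_insn.wrapping_add(disp as i64 as u64)
}

/// Value that ends up in %rdi: the 7-byte lea at 0x10465, plus the add of 0x3d4.
fn guard_address() -> u64 {
    rip_relative_target(0x10465, 7, 0x348624) + 0x3d4
}

/// Target of the 5-byte callq at 0x10475 (displacement bytes 56 68 0e 00).
fn call_target() -> u64 {
    rip_relative_target(0x10475, 5, 0x0e6856)
}
```

`guard_address()` evaluates to 0x358e64, and `call_target()` to 0xf6cd0, which matches the address objdump prints for __sanitizer_cov.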

__sanitizer_cov, in turn, looks up the caller’s current program counter and passes it into CoverageData::Add, which uses the guard to check whether that point has already been recorded. If not, it records the program counter for later storage.
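The check-then-record behaviour described here can be modelled in a few lines. This is a deliberately simplified sketch: the real runtime is C++, and its guard encoding and storage are not reproduced. The `CoverageModel` type, its field and the boolean guard are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical, simplified stand-in for the runtime's table of recorded PCs.
#[derive(Default)]
struct CoverageModel {
    recorded_pcs: Vec<usize>,
}

impl CoverageModel {
    /// Records `pc` the first time its guard is seen.
    /// Returns true if the point was new, false if it was already recorded.
    fn add(&mut self, pc: usize, guard: &AtomicBool) -> bool {
        // `swap` stores true and returns the previous value in one step.
        if guard.swap(true, Ordering::Relaxed) {
            return false;
        }
        self.recorded_pcs.push(pc);
        true
    }
}
```

Calling `add` twice with the same guard stores the program counter only once, which is what keeps the cost of an already-covered point down to a single check.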

All of this is set up via global constructors, the same mechanism used to call constructors for static objects in C++. The compiler pass synthesises a function named sancov.module_ctor, which calls __sanitizer_cov_module_init to allocate space and set up the coverage data structures. The sanitizer runtime also ensures that, if needed, __sanitizer_cov_dump is called when the process exits, so that the coverage information is saved to disk and can be analysed later.
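The overall lifecycle (initialise up front, record while running, dump on the way out) can be mimicked in plain Rust. This is a toy model only: `CoverageRuntime`, its methods and the in-memory `sink` that stands in for the file on disk are all hypothetical, and `Drop` is used here merely as an analogue of the at-exit dump.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy stand-in for the coverage runtime's state (hypothetical type).
struct CoverageRuntime {
    recorded_pcs: Vec<usize>,
    /// Stands in for the coverage file written at exit.
    sink: Rc<RefCell<Vec<usize>>>,
}

impl CoverageRuntime {
    /// Plays the role of the module initialiser: allocate space up front.
    fn module_init(expected_points: usize, sink: Rc<RefCell<Vec<usize>>>) -> Self {
        CoverageRuntime {
            recorded_pcs: Vec::with_capacity(expected_points),
            sink,
        }
    }

    /// Plays the role of the per-point callback.
    fn record(&mut self, pc: usize) {
        self.recorded_pcs.push(pc);
    }
}

impl Drop for CoverageRuntime {
    /// Plays the role of the dump at exit: flush the sorted, unique PCs.
    fn drop(&mut self) {
        let mut pcs = std::mem::take(&mut self.recorded_pcs);
        pcs.sort_unstable();
        pcs.dedup();
        self.sink.borrow_mut().extend(pcs);
    }
}
```

Nothing reaches the sink until the runtime value goes out of scope, in the same way that the coverage data only reaches disk once the dump routine runs.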

Code coverage is one of those things that can seem somewhat magical, mostly because modern compilers can seem awfully complex (and in fairness, they do an awful lot), but the nuts and bolts of it aren’t that complicated in themselves.

LLVM also has the very cool feature that you can provide your own implementation of the coverage interface, which allows customized, very detailed tracing of your program, for example to analyze its exact control flow. But that’s an exercise for another day.
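As a rough idea of what such an implementation involves: Clang’s Sanitizer Coverage documentation describes a trace-pc-guard mode in which the instrumented code calls two user-definable functions, __sanitizer_cov_trace_pc_guard_init and __sanitizer_cov_trace_pc_guard. The sketch below writes them in Rust with the signatures given there. The two counters are hypothetical, the functions are not exported under their C names here (a real build would have to export them unmangled), and the flags needed to enable that mode from rustc are not covered.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

/// Hypothetical counters: the last guard id handed out, and the number
/// of distinct instrumentation points reached so far.
static NEXT_ID: AtomicU32 = AtomicU32::new(0);
static HITS: AtomicUsize = AtomicUsize::new(0);

/// Called once per module with the range [start, stop) of its guards.
/// Gives every guard a non-zero id, unless the range was already initialised.
pub unsafe extern "C" fn __sanitizer_cov_trace_pc_guard_init(start: *mut u32, stop: *mut u32) {
    if start == stop || unsafe { *start } != 0 {
        return;
    }
    let mut guard = start;
    while guard < stop {
        unsafe {
            *guard = NEXT_ID.fetch_add(1, Ordering::Relaxed) + 1;
            guard = guard.add(1);
        }
    }
}

/// Called at every instrumented point with that point's guard.
pub unsafe extern "C" fn __sanitizer_cov_trace_pc_guard(guard: *mut u32) {
    unsafe {
        if *guard == 0 {
            return; // already reported
        }
        *guard = 0; // report each point only once
    }
    HITS.fetch_add(1, Ordering::Relaxed);
}
```

Zeroing the guard after the first hit is one possible design choice; a tracer that wants to count every execution would leave the guard alone and use its value as an index instead.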

  1. This assumes the GNU Binutils suite, which is commonly used on Linux. Other systems will likely have similar tools.

  2. That is, the instruction that was running at the time.