After reading about Hypothesis’s new coverage features, I’ve recently become interested in how guided fuzzing (as implemented by American Fuzzy Lop or LLVM’s libFuzzer) works internally with Rust and LLVM. The first step is to understand how coverage works.
Clang’s Sanitizer Coverage documentation explains the functionality very well, so I’ll not repeat too much of that.
First of all, I started off by looking at the Rust Fuzz project’s set of targets. The run-fuzzer.sh driver script tells cargo to pass several extra flags to the compiler. The flag
-C passes=sancov instructs the compiler to also run the
sancov compiler pass, which annotates the generated code to add calls into the coverage runtime, and
-C llvm-args=-sanitizer-coverage-level=3 instructs LLVM to record edge coverage, so that we can tell which paths through the code were executed (e.g., differentiating between the branches of an if/else expression). The additional
-Z sanitizer=address also tells the compiler to link in the sanitizer support runtime, which includes the routines to record and save coverage.
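To make the idea of edge coverage a little more concrete, here is a minimal hand-written sketch of what per-edge instrumentation conceptually amounts to. Nothing here is generated by the compiler: the Recorder type, its hit method, the classify function and the edge names are all hypothetical stand-ins for the real pass and runtime.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the coverage runtime: it only remembers
/// which instrumentation points have been reached at least once.
#[derive(Default)]
pub struct Recorder {
    pub seen: BTreeSet<&'static str>,
}

impl Recorder {
    /// Mark one instrumentation point as reached.
    pub fn hit(&mut self, point: &'static str) {
        self.seen.insert(point);
    }
}

/// A function with a single branch, annotated by hand with one hook
/// per control-flow edge (an assumption about how an edge-level pass
/// would conceptually place its calls).
pub fn classify(n: i32, cov: &mut Recorder) -> &'static str {
    cov.hit("entry");
    if n < 0 {
        cov.hit("entry->then");
        "negative"
    } else {
        cov.hit("entry->else");
        "non-negative"
    }
}
```

Calling classify(5, ...) on a fresh Recorder leaves "entry->then" unrecorded, which is exactly the kind of gap a guided fuzzer would try to close with a different input.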
We’ll start with a trivial program in
If we compile this with
RUSTFLAGS=' -C passes=sancov -C llvm-args=-sanitizer-coverage-level=3 -Z sanitizer=address' cargo run and then look at the resulting disassembled code, using
objdump -CS target/debug/covtest[1], then we see an additional set of lines like:
    10465: 48 8d 05 24 86 34 00    lea    0x348624(%rip),%rax
    1046c: 48 05 d4 03 00 00       add    $0x3d4,%rax
    10472: 48 89 c7                mov    %rax,%rdi
    10475: e8 56 68 0e 00          callq  f6cd0 <__sanitizer_cov>
Granted, I’m not great at reading assembly, but this looks like it computes an address relative to the current program counter[2], adjusts it a little to produce a guard address, and passes that as the first argument to __sanitizer_cov.
That function looks up the caller’s current program counter, then passes it into
CoverageData::Add, which uses the guard to check whether that point has already been recorded. If not, it records the program counter for later storage.
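The guard check described here can be modelled in a few lines of Rust. This is only a sketch of the idea, not the runtime's actual code: the real CoverageData lives in the C++ sanitizer runtime, and the struct layout, the AtomicBool guard and the method names below are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical model of the runtime's bookkeeping: a list of program
/// counters, each recorded at most once per guard.
pub struct CoverageData {
    pcs: Vec<usize>,
}

impl CoverageData {
    pub fn new() -> Self {
        CoverageData { pcs: Vec::new() }
    }

    /// Record `pc` unless `guard` says this point was already seen.
    /// Returns true when the program counter was newly recorded.
    pub fn add(&mut self, pc: usize, guard: &AtomicBool) -> bool {
        // `swap` returns the previous value: true means "already recorded".
        if guard.swap(true, Ordering::Relaxed) {
            return false;
        }
        self.pcs.push(pc);
        true
    }

    /// The program counters recorded so far, in first-seen order.
    pub fn recorded(&self) -> &[usize] {
        &self.pcs
    }
}
```

Using one guard per instrumentation point keeps the common case cheap: once a point has been seen, every later visit costs a single flag check and records nothing.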
This is all set up via global constructors, the same mechanism used to call constructors for static objects in C++. The sancov pass synthesises a function named
sancov.module_ctor that calls
__sanitizer_cov_module_init, which allocates space and sets up the coverage data structures. The sanitizer runtime also ensures that, if needed,
__sanitizer_cov_dump is called when the process exits, so that the coverage information is saved to disk and can be analysed later.
So code coverage is one of those things that can seem somewhat magical, mostly because modern compilers can seem awfully complex (and in fairness, they do an awful lot), but the nuts and bolts of it aren’t that complicated in themselves.
LLVM does have the very cool feature that it’s possible to provide your own implementation of the coverage interface, allowing you to do customized, very detailed tracing of your program, if you want to do fancier things like analyzing the exact control flow of your program. But that’s an exercise for another day.
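As a rough sketch of what such a custom implementation might look like, the following assumes the trace-pc-guard callback interface described in Clang’s Sanitizer Coverage documentation, which is a different mode from the __sanitizer_cov calls shown earlier. The TRACE buffer and the take_trace helper are hypothetical names of my own, and in a real build both callbacks would additionally need to be exported under their unmangled names (for example with a no_mangle attribute) so that the instrumented code links against them.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

/// Guard ids in the order the instrumented code reached them (hypothetical).
static TRACE: Mutex<Vec<u32>> = Mutex::new(Vec::new());
/// Source of fresh, non-zero guard ids shared by all modules.
static NEXT_ID: AtomicU32 = AtomicU32::new(1);

/// Called once per module with the bounds of its guard array;
/// assigns every guard a distinct non-zero id.
pub unsafe extern "C" fn __sanitizer_cov_trace_pc_guard_init(start: *mut u32, stop: *mut u32) {
    // Skip empty arrays and arrays that were already initialised.
    if start == stop || unsafe { *start } != 0 {
        return;
    }
    let mut p = start;
    while p < stop {
        unsafe {
            *p = NEXT_ID.fetch_add(1, Ordering::Relaxed);
            p = p.add(1);
        }
    }
}

/// Called on every instrumented edge; appends the edge's id to the trace.
pub unsafe extern "C" fn __sanitizer_cov_trace_pc_guard(guard: *mut u32) {
    let id = unsafe { *guard };
    if id == 0 {
        return; // a zeroed guard means "do not record this edge"
    }
    TRACE.lock().unwrap().push(id);
}

/// Hand back the trace collected so far and clear the buffer.
pub fn take_trace() -> Vec<u32> {
    std::mem::take(&mut *TRACE.lock().unwrap())
}
```

Unlike the deduplicating behaviour described above, this version keeps every hit in order, which is the sort of detail you would want for reconstructing the exact control flow of a run.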
[1] This assumes the GNU Binutils suite, commonly used on Linux. Other systems will likely have similar tools.
[2] I.e., the instruction that was running at the time.