A quick tour of LLVM's Sanitizer coverage implementation

 

Inside the grey box

After reading about Hypothesis’s new coverage features, I’ve recently become interested in how guided fuzzing (as implemented by American Fuzzy Lop or LLVM’s libFuzzer) works internally with Rust and LLVM. The first step is to understand how coverage works.

Clang’s Sanitizer Coverage documentation explains the functionality very well, so I won’t repeat much of it here.

I started off by looking at the Rust Fuzz project’s set of targets. The run-fuzzer.sh driver script tells cargo to pass several extra flags to the compiler. The flag -C passes=sancov instructs the compiler to also run the sancov compiler pass, which annotates the generated code with calls into the coverage runtime. The flag -C llvm-args=-sanitizer-coverage-level=3 instructs LLVM to record edge coverage, so that we can tell which paths through the code were executed (e.g. differentiating between the branches of an if/else expression). Finally, -Z sanitizer=address tells the compiler to link in the sanitizer support runtime, which includes the routines to record and save coverage.

We’ll start with a trivial program in main.rs:

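// Prevent inlining so that `show` stays a separate function in the disassembly.
#[inline(never)]
fn show(a: &str) {
    println!("{}", a);
}

fn main() {
    use std::env::args;
    // Print every command-line argument, including the program name.
    for a in args() {
        show(&a);
    }
}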

If we compile this with RUSTFLAGS='-C passes=sancov -C llvm-args=-sanitizer-coverage-level=3 -Z sanitizer=address' cargo run and then look at the resulting disassembled code using objdump -CS target/debug/covtest [1], we see an additional set of lines like:

10465:       48 8d 05 24 86 34 00    lea    0x348624(%rip),%rax
1046c:       48 05 d4 03 00 00       add    $0x3d4,%rax
10472:       48 89 c7                mov    %rax,%rdi
10475:       e8 56 68 0e 00          callq  f6cd0 <__sanitizer_cov>

Granted, I’m not great at reading assembly, but this appears to compute an address relative to the current program counter [2], adjust it a little to produce a guard address, and pass that as the first argument to the __sanitizer_cov function.
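
To make the address arithmetic concrete, here is a small sketch that redoes it by hand. It relies on the fact that x86-64 RIP-relative addressing is relative to the address of the next instruction; the function name guard_address is made up for illustration, and reading 0x3d4 as this call site's offset into a table of guards is my assumption rather than something the disassembly proves.

```rust
/// Address produced by `lea disp(%rip), reg` followed by `add $offset, reg`.
/// `next_insn` is the address of the instruction after the `lea`, because
/// RIP-relative displacements are measured from the next instruction.
fn guard_address(next_insn: u64, disp: u64, offset: u64) -> u64 {
    next_insn + disp + offset
}

fn main() {
    // Values read off the disassembly: the `lea` at 0x10465 is 7 bytes long,
    // so the next instruction starts at 0x1046c.
    let addr = guard_address(0x1046c, 0x348624, 0x3d4);
    println!("{:#x}", addr); // prints 0x358e64
}
```

So the value handed to __sanitizer_cov in %rdi would be 0x358e64, an address in the binary's data rather than in its code.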

The __sanitizer_cov function in turn looks up its caller’s current program counter, then passes that into CoverageData::Add, which uses the guard to check whether that point has already been recorded. If not, it records the program counter for later storage.
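
The guard check can be illustrated with a toy model. This is only a sketch of the idea, not the runtime's actual code: the struct layout, the use of an index as the guard, and the second program counter (0x104a0) are all invented for the example, and it ignores threading entirely.

```rust
/// Toy model of a coverage table; names and layout are illustrative only.
struct CoverageData {
    /// One flag per instrumented point: true once that point has been seen.
    guards: Vec<bool>,
    /// Program counters in the order they were first seen.
    pcs: Vec<usize>,
}

impl CoverageData {
    fn new(num_guards: usize) -> Self {
        CoverageData {
            guards: vec![false; num_guards],
            pcs: Vec::new(),
        }
    }

    /// Record `pc` unless the point behind `guard` was already recorded.
    /// Returns true when something new was recorded.
    fn add(&mut self, pc: usize, guard: usize) -> bool {
        if self.guards[guard] {
            return false; // already covered, nothing to do
        }
        self.guards[guard] = true;
        self.pcs.push(pc);
        true
    }
}

fn main() {
    let mut cov = CoverageData::new(2);
    // A loop body hits the same point repeatedly, but only the first hit counts.
    for _ in 0..3 {
        cov.add(0x10475, 0);
    }
    cov.add(0x104a0, 1); // hypothetical second instrumentation point
    println!("{:x?}", cov.pcs);
}
```

The point of the guard is that the common case, a point that has already been seen, costs one check and an early return.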

This is all set up by the global constructors, the same mechanism used to call constructors for static objects in C++. The compiler pass synthesises a function named sancov.module_ctor, which calls __sanitizer_cov_module_init; that in turn allocates space and sets up the coverage data structures. The sanitizer runtime also ensures that, if needed, __sanitizer_cov_dump is called when the process exits, so that the coverage information is saved to disk and can be analysed later.
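
The overall lifecycle (initialise at startup, record while running, dump at exit) can be sketched in the same toy style. Everything here is hypothetical: CoverageRuntime, module_init, hit and dump are names I made up, and the one-address-per-line output is an invented format, not what the sanitizer runtime writes to disk.

```rust
use std::io::{self, Write};

/// Toy model of the runtime state; not the real sanitizer data structures.
struct CoverageRuntime {
    guards: Vec<bool>,
    pcs: Vec<usize>,
}

impl CoverageRuntime {
    /// Stand-in for what the module constructor triggers at startup:
    /// allocate one guard per instrumented point in the module.
    fn module_init(num_guards: usize) -> Self {
        CoverageRuntime {
            guards: vec![false; num_guards],
            pcs: Vec::with_capacity(num_guards),
        }
    }

    /// Stand-in for an instrumented point being reached.
    fn hit(&mut self, pc: usize, guard: usize) {
        if !self.guards[guard] {
            self.guards[guard] = true;
            self.pcs.push(pc);
        }
    }

    /// Stand-in for the exit-time dump: one hexadecimal PC per line
    /// (a format invented for this sketch).
    fn dump<W: Write>(&self, out: &mut W) -> io::Result<()> {
        for pc in &self.pcs {
            writeln!(out, "{:#x}", pc)?;
        }
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut rt = CoverageRuntime::module_init(4);
    rt.hit(0x10475, 0);
    rt.hit(0x10475, 0); // repeated hit, ignored
    rt.hit(0x104a0, 2); // hypothetical second point
    rt.dump(&mut io::stdout())
}
```

Taking any Write for the dump target keeps the sketch easy to exercise without touching the filesystem.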

So code coverage is one of those things that can seem somewhat magical, mostly because modern compilers can seem awfully complex (and in fairness, they do an awful lot), but the nuts and bolts of it aren’t that complicated in themselves.

LLVM also has the very cool feature that you can provide your own implementation of the coverage interface, allowing customized, very detailed tracing of your program if you want to do fancier things like analyzing its exact control flow. But that’s an exercise for another day.
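
As a small taste of what that could look like, here is a sketch modelled on the trace-pc-guard callback pair described in Clang’s Sanitizer Coverage documentation (__sanitizer_cov_trace_pc_guard_init and __sanitizer_cov_trace_pc_guard). Note that this is a different instrumentation mode from the level=3 setup used above, and I have not wired it into a real build: the functions carry plain Rust names so the example runs standalone, and the HITS and NEXT_ID counters are my own additions. To hook them up for real they would presumably need to be exported under the documented C symbol names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Number of distinct guards that have fired so far (my own bookkeeping).
static HITS: AtomicU32 = AtomicU32::new(0);
/// Source of non-zero guard ids handed out during initialisation.
static NEXT_ID: AtomicU32 = AtomicU32::new(0);

/// Modelled on `__sanitizer_cov_trace_pc_guard_init`: give every guard in
/// `[start, stop)` a non-zero id, unless the range was initialised already.
pub unsafe extern "C" fn trace_pc_guard_init(start: *mut u32, stop: *mut u32) {
    if start == stop || unsafe { *start } != 0 {
        return;
    }
    let mut guard = start;
    while guard < stop {
        unsafe {
            *guard = NEXT_ID.fetch_add(1, Ordering::Relaxed) + 1;
            guard = guard.add(1);
        }
    }
}

/// Modelled on `__sanitizer_cov_trace_pc_guard`: count the first hit on each
/// guard, then zero the guard so later hits return early.
pub unsafe extern "C" fn trace_pc_guard(guard: *mut u32) {
    unsafe {
        if *guard == 0 {
            return;
        }
        *guard = 0;
    }
    HITS.fetch_add(1, Ordering::Relaxed);
}

fn main() {
    // Three fake guards standing in for the table the compiler would emit.
    let mut guards = [0u32; 3];
    let range = guards.as_mut_ptr_range();
    unsafe {
        trace_pc_guard_init(range.start, range.end);
        trace_pc_guard(range.start);
        trace_pc_guard(range.start); // second hit on the same guard is ignored
        trace_pc_guard(range.start.add(2));
    }
    println!("distinct points hit: {}", HITS.load(Ordering::Relaxed));
}
```

A real tracer would do something more interesting than bump a counter, such as logging the caller's address, but the shape of the two callbacks would stay the same.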


  1. This assumes the GNU Binutils suite, commonly used on Linux. Other systems will likely have similar tools.

  2. i.e. the instruction that was running at the time.