Fuzzing
Find vulnerabilities using fuzzing
Last updated
Find vulnerabilities using fuzzing
Last updated
Excellent playlist:
Takes a corpus of inputs. Then it picks an input and does a random mutation on it. See if this mutated test covers new code; if so, store it in the corpus. Repeat the process in a loop.
API fuzzing:
When should I use coverage guided fuzzing?
Coverage guided fuzzing is recommended as it is generally the most effective. This works best when:
The target is self-contained.
The target is deterministic.
The target can execute dozens or more times per second (ideally hundreds or more).
What should the coverage look like?
You want to cover as many lines as possible,
You want to cover those lines as uniformly as possible (not only cover some line once)
You don’t need to instrument things you don’t care about (well-tested 3rd party libraries, etc
How do I deal with coverage gaps?
Edit target code to take a branch/generate structured data (least amount of work)
In API testing, you can edit the harness to make code reachable or more likely to be reached. (medium amount of work, but doesn’t affect the tested API)
Add a test case to reach it directly (most amount of work)
Nop out/avoid some large sections of code that are just noise (if you’re fuzzing ffmpeg, only fuzz one format)
I keep hitting DoS bugs
You can just patch around uninteresting bugs until you find an RCE.
Reporting
You can put the main function in a try-catch block, because you often don’t care about the exceptions the application knows to throw. You care about things that create a crash, like segfaults and stuff.
Just detecting crashes isn’t good enough, since not all memory corruptions produce a crash (like if a single byte of memory is corrupted). Instead, you should use something like ASan (Address Sanitizer) to detect memory corruption.
However, it can be a good idea to run LibFuzzer without sanitization first, as that way you’ll find the more critical bugs first. Otherwise, it’ll trip on a smaller issue (off-by-one heap overflow for example) and it won’t get to the bigger issue until you patch it
Note: LeakSanitizer for memory leakage issues?
Also known as variant fuzzing, differential fuzzing, and bounded analysis.
This sort of fuzzing doesn’t look for crashes or infinite loops. Instead, it checks if the output of a function exceeds expected bounds. Basically, it finds logic bugs
For example, let’s say there’s a language analysis function, which rates a Youtube comment based on its offensiveness, on a scale of 0-9. If we do variant analysis on that function, then we would run LibFuzzer for example, and check the function’s output. If the output is less than 0 or more than 9, then we have hit a logic bug.
What is it useful for?
Finding bugs in memory-safe languages (golang, javascript) for certain functions
Fuzzing Ethereum Solidity smart contracts
There have been a few times where a coin operator has created a contract with bugs in it.
You can use this fuzzing method to find logical errors, where you can give yourself too much money.
Benefits over LibFuzzer:
Easier to set up
Can be more reliable or work in more difficult situations
Negatives:
Might catche less bugs
Doesn’t have memory leak detection instrumentation, for example.
It does have address sanitation, though, which is good.
With LibFuzzer you can easily target individual functions because you have to write an LLVMTestOneInput entry function.
Setting Up
Example of setting up fuzzing for duktape javascript engine
For normal compilation, you can just run make -f Makefile.cmdline
Create a directory in
w`ith a corpus of a few good examples (of javascript examples, in this case). I added 2 examples in files 000 and 001:
1 + 2
var num = 2; num * 4
Also make an out
dir.
Then, in the makefile, you’ll need to specify the afl compiler as the compiler of the program, and compile an AFL version of the program. There’s a segment that explains this in the AFL github page.
And all we have to change in the Makefile.cmdline is to change:
To:
Then, make the file:
And run the fuzzer (./duk
is the executable):
Use AFL++, it's maintained. Original isn't.
Useful documentation is in AFL++ documentation:
docs/life_pro_tips.txt
docs/README.md
llvm_mode/README.md
Quickstart
You have an inputs corpus in a folder (and make sure to add a valid input).
Compile the program with AFL:
AFL_HARDEN adds code hardening options
Then fuzz it:
Test Harnesses
Harness requirements:
Runs in the foreground
Processes input from stdin or from a specified file
Exits cleanly
Skips checksums and cryptographic integrity tests (or use a postprocessor), because the fuzzer will never get through those
Recommended:
If you're going beyond simple program crashes, then the test harness should crash if the program is in an invalid state
Use AFL's deferred forkserver + persistent mode for speed
Simple Harness
This harness only tests one of the two functions
Important points:
Define the size of the input buffer to limit the fuzzer
Initialize the input buffer with zeroes to make results reproducible!
Compile and run like this:
Integer Inputs
This harness uses two 8-byte integer inputs if you specify the "mul" argument:
Run it like this:
Reading from files
The "@@" symbol specifies that the input is a filename.
AFL comes with the afl-gcc and afl-clang compilers by default. Where possible, you should prefer to use the afl-clang-fast compiler, because then you can use LLVM mode. However note that trying to compile a program with clang instead of gcc can sometimes fail.
LLVM mode allows you to significantly speed up your execution for most targets because of multiple reasons:
Tell AFL where to start each new process
Is especially useful for processes with an expensive setup phase that isn’t dependent on input
Don’t fork for each run, just loop around this bit of code
Requires that the program’s state can be completely reset so that multiple calls can be performed without resource leaks
Note: This documentation page says that afl-clang-lto is the best compiler to use because the instrumentation doesn't have collisions, and therefore gets better coverage. However, it's a work in progress and you might run into issues during compilation.
If there's a memory corruption that starts making it so that the program's state isn't completely reset and you're not using ASan to detect it, then you can use first persistent mode to fuzz to a certain extent to get a good corpus, and then fuzz in normal mode for consistency.
Running in persistent mode might cause stability to lower, this is normal. As long as stability is over 90 percent, everything is fine. The user manual goes into a bit more depth on this.
Seed Corpus
Tips:
Find real inputs that exercise as much of the target as possible.
Keep your seed files small (under 1kB is ideal).
Don’t put in redundant files.
Use afl-tmin and afl-cmin to minimize your seed corpus. Also when your fuzzing job has been running for a while (after the first cycle of the master fuzzer for example), you can also take your test corpus, minimize that and fuzz some more. You can use afl-ptmin to parallellize this process.
Sanitizers
Sanitizers turn tiny issues into crashes which AFL can detect.
Sanitizers are mutually incompatible, so run multiple fuzzing instances with different sanitizers
UBSan
MSan
ASan
TSan
Use AFL_HARDEN=1 to harden your program (fortify_source and stack protection) and therefore catch more issues
ASan
ASan can require a lot of memory – so much that using it on a 64-bit system can be infeasible if the program isn't robust and doesn't enforce reasonable memory limits.
Read the documentation for workarounds to this. Here are some options:
Compile as 32-bit
Gauge memory limits using recidivm
Use cgroups to limit available memory
Note: I haven't been able to get the cgroups script to work in the afl++ docker container.
AFL runs on one core, so it makes sense to run one per core. However it’s not trivial.
Multicore:
One main instance. -M
One secondary instance per core. -S
All share the same output directory. -o
Multi Machine:
Lots of secondary instances
You need to sync state directories periodically
Tip: Run fuzzers with differing behaviours (sanitizers; fuzzing algorithms)
For large-scale serious fuzzing, it makes sense to use a fuzzing service that allows you to scale up massively. That way you don’t have to hack together multi-machine fuzzing yourself.
ClusterFuzz
OneFuzz
Very useful for helping the fuzzer get through difficult parts where certain values are expected. Specifies interesting tokens the fuzzer can use to access paths it otherwise might not stumble upon.
Sources:
“dictionaries” folder
Automatically generated with afl-clang-lto (llvm11+)
Statically finds string comparisons while compiling
This feature is called autodictionary
Automatically generated with libtokencap
At runtime, looks at calls to strcmp and memcmp and remembers tokens
Source code review
Other similar useful features:
Works by splitting larger comparisons down to smaller ones so the fuzzer can traverse paths and get further.
Use export AFL_LLVM_LAF_ALL=1
to enable all Iaf-intel passes
Beyond memory corruption
You can add checks to detect if bad things have happened, e.g:
Send input into two different math/crypto libraries and asserts that their output is identical
assert that result of BN_sqr(x) == BN_mul(x,x) - CVE-2014-3570 in OpenSSL
assert if any filesystem changes have occurred – CVE-2014-6271 in Bash (shellshock)
If something bad has happened, then just crash the program manually using assert.
Network targets:
Write a harness around the target function
Use inetd mode if your target supports it
Intercept network system calls, e.g using preenv
The queue directory contains a corpus of test cases that exercise your program
You can run it through CoverageSanitizer or gcov and see what isn’t hit
Resume a fuzzing run with -i-
You can recompile the binary and then do this
ASan traces can be helpful
Use afl-tmin to minimise and simplify test cases while retaining the original control flow
“Unique” bugs aren’t unique. They might be the same bug, just through a different control flow. Fix them as you go along and carry on fuzzing.
Memory limits can give false positives
Use crashwalk to get a head start on triaging crashes and finding which ones are useful for exploitation. However, note that if you're using ASan, then this probably isn't always helpful, since ASan just does SigAbort when it finds something, and crashwalk doesn't know what to do with it. If you turn off ASan, then the program might not crash at all.
Options:
Never - Fuzz as part of CI
When the cycles counter is green
Last new path was found many cycles ago
Pending paths is zero
If you want to stop earlier:
Cycles counter is blue
Been running for a while
Look at the output of afl-plot
Remember to check code coverage.
See which parts of the program are hit, and whether all the important functions are hit.
You can check for code coverage using afl-cov.
LibXML (Optimization)
We're compiling using afl-clang-fast here, but note that lto is usually better. We're using ASAN as the sanitizer. We're fuzzing the xmlReadMemory function.
Instrumenting the library with ASAN:
Compiling the harness:
Seed corpus:
Fuzzing with an XML dictionary:
Basic harness, no optimizations, 250 execs/sec, BAD:
Important: Note how the buffer is filled with zeros in main. This is good, because otherwise it would contain random data and the fuzzing might not be reproducible if there is a bug (stability will suffer).
Let's start adding persistent mode optimizations.
Deferred initialization gives a negligible speedup (from 250 to 270 executions per second). Note that AFL does deferred initialization to the main function by default, but you can defer it even more (after xmlInitParser() in this case):
Adding persistent mode via AFL_LOOP gives ~10x speedup to ~2500 execs/sec:
When we added shared memory fuzzing, we further increased the speed by ~50%. In this case, we got rid of the memory limitation, but you can easily just program one in by returning when the "len" variable is higher than what you want.
We further increased speed using parallellization. We ran 4 fuzzers in parallel and the fuzz speed increased by 4x, since each fuzzer runs on 1 thread.
When doing parallel fuzzing, you can check up on your progress using the afl-whatsup tool:
After 44 minutes of total running time (11 minutes per core), we have 48 locally unique crashes
It’s said to be recommended to use LibFuzzer over AFL and honggfuzz (or use multiple in conjunction), as it will find the most bugs. But at the same time, it’s a bit more difficult to set up. Unlike AFL, LibFuzzer doesn’t do forking + feeding data to stdin.
It fuzzes in-process persistently by default (though AFL added support for this)
State can accumulate (sometimes good, sometimes bad)
You have to be careful about state accumulation.
Similar fuzzers:
Honggfuzz
The above tutorial contains instructions about:
Setting up libfuzzer
Running it on multiple cores
Minimizing the corpus
Finding RCEs which are blocked by DoS vulns
Coverage visualization
And more
This demonstrates how to run LibFuzzer on a very simple fuzzme, which already includes the LLVMTestOneInput function.
Compile it with ASan and fuzzer options:
Then run it, and it will find a heap buffer overflow.
Similar to LibFuzzer. May find some bugs that libfuzzer won’t and may miss a lot of bugs that LibFuzzer won’t. It’s a bit of a mix of libfuzzer and AFL in its usage.
First you have to compile it with instrumentation:
Then, you have to run it with honggfuzz:
-P
is for persistent mode
--output
saves the generated corpus in the specified directory
-i
specifies where the input corpus should be taken from
A blackbox fuzzer generates inputs for a target program without knowledge of its internal behaviour or implementation.
A blackbox fuzzer may generate inputs from scratch, or rely on a static corpus of valid input files to base mutations on. Unlike coverage guided fuzzing, the corpus does not grow here.
Also, if you don’t have coverage guiding or sanitization (bug detection), then it will probably miss a lot of bugs.
Pretty easy and beginner-friendly tool. Radamsa accepts input and tries to generate interesting test cases from that.
You’ll probably want to write a script to continuously use radamsa on a program and detect segfaults and such. Here’s an example of using it on Exploit Exercises Phoenix Format 5:
This section covers difficulties in fuzzing programs, which do not include the source code, and ways to overcome that issue.
Approaches:
Black box fuzzing - doesn’t find a lot of bugs
Adding instrumentation (coverage tracking, etc) by dynamically rewriting - simple to implement, but 10-100x slower
AFL can do this by running executables under QEMU, and QEMU provides tracing and everything for you. You can even use it to fuzz ARM binaries from an x86 system.
Adding instrumentation by statically rewriting - more complex analysis, but much more performant. Needs to fully model program control flow without executing each path, so can be buggy (for example with stripped binaries).
Symbolic execution executes a program, but uses symbolic values (formulae) instead of normal variable names. When encountering a path, it takes both paths (and adjusts the possible values accordingly for each path). Thus it can theoretically traverse every single branch of a program and find everything. Practically, it has a number of practical limitations, however:
Large computing power demand
State explosion
Difficult to set up
https://nebelwelt.net/files/13CCC-presentation.pdf