[Read Paper USENIX 2023] DAFL: Directed Grey-box Fuzzing Guided by Data Dependency

Background#

The method of Fuzzing is a way to find errors in the tested program by generating a large amount of input.

One of the key points of Fuzzing is the input generation method, the second is the input sorting method, and the third is the acquisition and application of internal software information.

If the input is completely self-constructed, this method is called generational fuzzing.

If the input is obtained by slightly modifying the original seed, this type of fuzzing is called Mutational fuzzing.

If the Mutator can be guided by certain program information, it is called GrayBox Fuzzing, such as coverage-guided fuzz testing.

If the Seed is allocated energy through Power Schedule, and then the Mutator is sorted based on the energy of the Seed, this method is called Boosted GrayBox Fuzzing.

What is grey-box fuzzing#

White-box largely utilizes source code for constraint solving, program analysis, and other operations.

Black-box does not obtain any information about the source code.

Gray-box, on the other hand, obtains partial information about the program, commonly by obtaining program execution coverage. The most common method is binary instrumentation (inserting custom recording code after each branch at compile time). For example, AFL is a gray-box fuzzer.

What is DGF (directed grey-box fuzzing)#

Why is there DGF#

Most gray-box fuzz testing is guided by the coverage of seeds to cover as many paths as possible within a limited time. The main reason for doing this is that code coverage and bug coverage are positively correlated. When a fuzz testing tool can cover more code, it often finds more bugs.

However, not all code in the program should be treated equally or with equal weight, because most of the code covered by the fuzz testing tool does not contain bugs. Moreover, in practical applications, it is very difficult or even impossible to achieve complete coverage of the code distribution.

To overcome this problem, directed fuzz testing is proposed.

How is it different from traditional gray-box fuzzing#

Compared with traditional gray-box fuzz testing:

Different goals: Traditional gray-box fuzz testing aims to improve code coverage, while DGF tends to test bugs in specific code blocks.
Different techniques: DGF generally requires data flow and control flow graphs to calculate the distance between seeds and target code segments.

The problem DGF ultimately solves#

Use the distance between seeds and target targets as a fitness function to help select seeds.

Seeds that are closer to the target location have more mutation opportunities to guide gray-box fuzz testing to reach the target location.

The entire directed gray-box fuzz testing converts the previous reachability problem into an optimization problem.

That is, optimize the distance between the generated seeds and the target location.

After understanding the entire process of directed gray-box fuzz testing, it can be found that the key to the whole problem is how to define the distance between seeds and target locations for optimization.

前置知识 (Pre-knowledge)#

The method of Fuzzing is a way to find errors in the tested program by generating a large amount of input.

One of the key points of Fuzzing is the input generation method, the second is the input sorting method, and the third is the acquisition and application of internal software information.

If the input is completely self-constructed, this method is called generational fuzzing.

If the input is obtained by slightly modifying the original seed, this type of fuzzing is called Mutational fuzzing.

If the Mutator can be guided by certain program information, it is called GrayBox Fuzzing, such as coverage-guided fuzz testing.

If the Seed is allocated energy through Power Schedule, and then the Mutator is sorted based on the energy of the Seed, this method is called Boosted GrayBox Fuzzing.

What is grey-box fuzzing#

White-box largely utilizes source code for constraint solving, program analysis, and other operations.

Black-box does not obtain any information about the source code.

What is DGF (directed grey-box fuzzing)#

Why is there DGF#

To overcome this problem, directed fuzz testing is proposed.

How is it different from traditional gray-box fuzzing#

Compared with traditional gray-box fuzz testing:

Different goals: Traditional gray-box fuzz testing aims to improve code coverage, while DGF tends to test bugs in specific code blocks.
Different techniques: DGF generally requires data flow and control flow graphs to calculate the distance between seeds and target code segments.

The problem DGF ultimately solves#

Use the distance between seeds and target targets as a fitness function to help select seeds.

Seeds that are closer to the target location have more mutation opportunities to guide gray-box fuzz testing to reach the target location.

The entire directed gray-box fuzz testing converts the previous reachability problem into an optimization problem.

That is, optimize the distance between the generated seeds and the target location.

Background#

DGF is currently mainly based on two key mechanisms:

DGF can support the evolution of test cases guided by coverage guidance in undirected gray-box fuzzing.
DGF measures the priority of each test case by the distance between execution nodes and target nodes in the CFG (control flow graph), and sorts them to provide guidance for the fuzzer to reach the target program point.
There are two scalability issues with the current DGF:
Traditional coverage feedback does not always provide meaningful guidance to find target program points (segments that may cause crashes).
When the program has complex control structures, the existing seed distance algorithms perform poorly.

Regarding coverage guidance, sometimes the direction of coverage guidance deviates from the target program point due to the large size of the program. Beacon was the first to attempt to solve this problem and used a weakest precondition to determine whether the target program point would be reached. However, this method is not effective when the program has complex loops.

When the program is large, the length of the execution node chain is long and contains many nodes that are semantically unrelated to the target program point. WindRanger is the first to use Deviation Basic Blocks (DBBs) to solve the deviation problem, but it is not a good form in loop structures.

Motivation#

Simply put, it is an array out-of-bounds error in C language. The user can input an arbitrarily sized integer as an index, and a vulnerable function does not validate the input index.

There is only one path to reach this vulnerable function, but from the main function, there are many branching paths along the way, which greatly disperses the energy allocation of test cases and leads to misleading results.

By comparing the seed distance calculation of AFLGo and WindRanger, it is found that the results are the same. That is, when dealing with complex loop structures, the distance measurement based on DBBs will also mislead.

Solution#

Selective coverage instrumentation#

In order to reach the specified code point t in program P, a Def-Use Graph (DUG) is constructed backward at t to collect all dependent statements and use thin slicing.

About Thin Slicing: Dereference pointers (remove redundant variables caused by pointers) to reduce the size of slices (understood as a backward data flow graph of data dependencies), thereby providing more efficient guidance for directed fuzzing.

Why use DUG? Because DUG can automatically ignore many irrelevant nodes, leaving only nodes with data dependencies. Moreover, even if the corresponding CFG has loops, DUG does not have loops (as long as there is no loop data dependency).

Semantic relevance scoring#

The distance from the target node to the farthest node on the DUG is recorded as 1, and each time a node is closer to the target node, the shortest distance from all subsequent nodes to the target node is increased by 1. In this way, the sum of the scores of the execution node paths is the semantic relevance score.

To achieve the effect: If a seed executes more nodes or nodes closer to the target program point, the comprehensive score is higher (as shown in the 3c graph).

Exp#

Five fuzzers, AFL, AFLGo, Beacon, WindRanger, and DAFL, were used to fuzz 41 vulnerabilities in the Beacon paper, and the metric was TTE (Time-to-Exposure), which is the time it takes to fuzz the first crash in 24 hours.

But I don't know why Beacon is not in the graph.

The result is that DFL is at least 1.93 times faster on average than the other fuzzers.

Subsequently, ablation experiments were conducted:

Thin slicing can make it 2.01 times faster on average.

Selective coverage instrumentation can make it 1.73 times faster on average (compared to AFL).

Semantic relevance scoring can make it 1.39 times faster on average (compared to AFL).

The combination of these two techniques can achieve 1.93 times faster on average (compared to AFL).

Discussion#

Discusses the support of DAFL for multiple targets, that is, when there are multiple target program points on the execution node chain, the average distance to them is taken as the relevance score.

Also discussed the issues of supporting other languages such as C/C++, Go, etc.

Summary#

Due to the misleading problem of the existing DGF distance calculation, DUG is used instead of CFG, and data flow information is used instead of control flow information, so that irrelevant nodes do not exist, thereby not misleading the fuzzer's direction.

Using thin slicing for pruning, DUG becomes more concise, improving the efficiency of the fuzzer.

Reference Links#

Fuzzing 技术总结 (Brief Surveys on Fuzz Testing)
https://zhuanlan.zhihu.com/p/43432370
Directed Greybox Fuzzing 相关内容整理
https://zhuanlan.zhihu.com/p/415316061
Hawkeye: 定向灰盒模糊测试技术
https://zhuanlan.zhihu.com/p/130767857
给你的模糊测试开开窍 —— 定向灰盒模糊测试（Directed Greybox Fuzzing）综述
https://www.cnblogs.com/welkinchan/p/17685797.html
【学习笔记】Fuzzing 学习笔记 3—— 灰盒 Fuzzing
https://superlova.github.io/2020/08/20/%E3%80%90%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0%E3%80%91Fuzzing%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B03%E2%80%94%E2%80%94%E7%81%B0%E7%9B%92Fuzzing