Benchmarking Deep Learning Fuzzers
- URL: http://arxiv.org/abs/2310.06912v1
- Date: Tue, 10 Oct 2023 18:09:16 GMT
- Title: Benchmarking Deep Learning Fuzzers
- Authors: Nima Shiri Harzevili, Hung Viet Pham, Song Wang
- Abstract summary: We run three state-of-the-art DL fuzzers, FreeFuzz, DeepRel, and DocTer, on the benchmark by following their instructions.
We find that these fuzzers are unable to detect many real bugs collected in our benchmark dataset.
Our systematic analysis further identifies four broad factors, common to all three fuzzers, that limit their ability to detect real bugs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we set out to conduct the first ground-truth empirical
evaluation of state-of-the-art DL fuzzers. Specifically, we first manually
created an extensive DL bug benchmark dataset, which includes 627 real-world DL
bugs from TensorFlow and PyTorch libraries reported by users between 2020 and
2022. Then we ran three state-of-the-art DL fuzzers, i.e., FreeFuzz, DeepRel,
and DocTer, on the benchmark by following their instructions. We find that
these fuzzers are unable to detect many real bugs collected in our benchmark
dataset. Specifically, 235 of the 257 applicable bugs (over 91%) were not
detected by any of the three fuzzers.
Our systematic analysis further identifies four broad factors, common to all
three fuzzers, that limit their ability to detect real bugs. These findings
present opportunities to improve the performance of the fuzzers in future work.
As a proof of concept, we propose a lightweight corner case generator as an
extension to the three DL fuzzers, which simply covers several boundary values
as well as DL-specific data types. It helps FreeFuzz, DeepRel, and DocTer
detect 12, 12, and 14 more bugs, respectively, that were overlooked by the
original fuzzers. Overall, this work complements prior studies on DL fuzzers
with an extensive performance evaluation and provides a benchmark for future DL
library fuzzing studies. Our proposed corner case generator also demonstrates
that the fuzzers can detect more bugs when their internal fuzzing logic is
extended based on insights from our root cause analysis. A sketch of such a
generator follows.
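The following is a minimal sketch of what such a corner case generator could look like, assuming a FreeFuzz-style harness that replays recorded API invocations with individual arguments swapped out for boundary values and DL-specific data types. All names are illustrative; this is not the authors' implementation.

```python
# Minimal sketch of a lightweight corner case generator, assuming a
# FreeFuzz-style harness that replays recorded API calls with mutated
# arguments. Names below are illustrative, not the paper's code.
import math

# Boundary values that commonly expose DL library bugs.
CORNER_VALUES = [0, -1, 1, 2**31 - 1, -(2**31), math.inf, -math.inf, math.nan]

# DL-specific data types (PyTorch spellings; TensorFlow has analogues).
DL_DTYPES = ["float16", "bfloat16", "float64", "int8", "bool", "complex64"]

def corner_cases(arg):
    """Yield corner-case replacements for one recorded argument."""
    if isinstance(arg, bool):              # check bool before int
        yield (not arg)
    elif isinstance(arg, (int, float)):
        yield from CORNER_VALUES
    elif isinstance(arg, str):             # e.g. a dtype name
        yield from DL_DTYPES
    elif isinstance(arg, (list, tuple)):   # e.g. a shape argument
        yield type(arg)()                  # empty shape
        yield type(arg)([0] * len(arg))    # zero-sized dimensions

def fuzz_api(api_fn, recorded_args):
    """Replay a recorded call, swapping one argument at a time."""
    for i, original in enumerate(recorded_args):
        for corner in corner_cases(original):
            args = list(recorded_args)
            args[i] = corner
            try:
                api_fn(*args)
            except Exception:
                # Python-level exceptions are expected; the interesting
                # outcomes are process crashes, which an outer harness
                # would detect.
                pass
```

A real harness would execute each mutated call in a separate subprocess and flag process-level crashes (segfaults, aborts) rather than ordinary Python exceptions.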
Related papers
- Pipe-Cleaner: Flexible Fuzzing Using Security Policies
Pipe-Cleaner is a system for detecting and analyzing C code vulnerabilities.
It is based on flexible developer-designed security policies enforced by a tag-based runtime reference monitor.
We demonstrate the potential of this approach on several heap-related security vulnerabilities.
arXiv Detail & Related papers (2024-10-31T23:35:22Z)
- G-Fuzz: A Directed Fuzzing Framework for gVisor
G-Fuzz is a directed fuzzing framework for gVisor.
G-Fuzz has been deployed in industry and has detected multiple serious vulnerabilities.
arXiv Detail & Related papers (2024-09-20T01:00:22Z)
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks.
FuzzCoder predicts mutation locations and strategies in input files to trigger abnormal program behaviors (a rough sketch of this mutation step follows below).
arXiv Detail & Related papers (2024-09-03T14:40:31Z)
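The FuzzCoder summary leaves the model interface unspecified; as a rough illustration, the fine-tuned model can be treated as a function from a seed input to (offset, strategy) pairs, which the fuzzer then applies. predict_mutations below is a stand-in, not FuzzCoder's API.

```python
# Rough sketch of applying model-predicted byte mutations, assuming the
# fine-tuned model returns (offset, strategy) pairs for a seed input.
import random

def predict_mutations(seed: bytes):
    """Stand-in for the fine-tuned LLM's prediction step; a real model
    would rank promising positions and strategies. Random choices here
    just keep the sketch self-contained."""
    return [(random.randrange(len(seed)),
             random.choice(["flip", "insert", "delete"]))
            for _ in range(4)]

def apply_mutation(seed: bytes, offset: int, strategy: str) -> bytes:
    buf = bytearray(seed)
    if strategy == "flip":
        buf[offset] ^= 0xFF                      # invert the byte
    elif strategy == "insert":
        buf.insert(offset, random.randrange(256))
    elif strategy == "delete" and len(buf) > 1:
        del buf[offset]
    return bytes(buf)

seed = b"GIF89a\x00\x00"
for off, strat in predict_mutations(seed):
    mutated = apply_mutation(seed, off, strat)
    # feed `mutated` to the target program and watch for crashes
```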
- Fact Checking Beyond Training Set
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- FuzzSlice: Pruning False Positives in Static Analysis Warnings Through Function-Level Fuzzing
We propose FuzzSlice, a framework that automatically prunes possible false positives among static analysis warnings.
The key insight is that a warning that does not yield a crash when fuzzed at the function level within a given time budget is a likely false positive (a schematic version of this triage rule follows below).
FuzzSlice reduces false positives by 62.26% in the open-source repositories and by 100% in the Juliet dataset.
arXiv Detail & Related papers (2024-02-02T21:49:24Z)
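FuzzSlice's triage rule is simple to state in code. A schematic version, where build_function_harness and run_fuzzer are hypothetical wrappers around a real function-level fuzzer (e.g., libFuzzer), not FuzzSlice's actual API:

```python
# Schematic version of FuzzSlice's triage rule: a static analysis
# warning whose enclosing function survives function-level fuzzing for
# a full time budget is marked a likely false positive.

TIME_BUDGET_SECONDS = 300  # budget value is our choice, not the paper's

def triage(warnings, build_function_harness, run_fuzzer):
    likely_false_positives, needs_review = [], []
    for warning in warnings:
        harness = build_function_harness(warning.function)
        crashed = run_fuzzer(harness, timeout=TIME_BUDGET_SECONDS)
        if crashed:
            needs_review.append(warning)          # crash reproduces: keep it
        else:
            likely_false_positives.append(warning)
    return likely_false_positives, needs_review
```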
- DebugBench: Evaluating Debugging Capability of Large Language Models
DebugBench is a benchmark for Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- Prompt Fuzzing for Fuzz Driver Generation
We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing.
It iteratively generates fuzz drivers to explore undiscovered library code.
PromptFuzz achieved 1.61 and 1.63 times higher branch coverage than OSS-Fuzz and Hopper, respectively.
arXiv Detail & Related papers (2023-12-29T16:43:51Z)
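One way to picture PromptFuzz's iteration: keep asking the model for new fuzz drivers and bias the prompt toward library APIs not yet covered. All helpers below are hypothetical stand-ins for the LLM call and the build-and-fuzz infrastructure, not PromptFuzz's implementation.

```python
# Sketch of a coverage-guided prompt-fuzzing loop in the spirit of
# PromptFuzz: generate drivers with an LLM, keep the ones that reach
# new library code, and steer later prompts toward uncovered APIs.

def prompt_fuzz(llm_generate, compile_and_run, all_apis, rounds=100):
    covered = set()                    # library APIs exercised so far
    drivers = []
    for _ in range(rounds):
        uncovered = all_apis - covered
        if not uncovered:
            break
        # Ask the model for a driver that exercises uncovered APIs.
        prompt = "Write a fuzz driver using: " + ", ".join(sorted(uncovered)[:5])
        driver = llm_generate(prompt)
        reached = compile_and_run(driver)   # returns set of APIs it reached
        if reached - covered:               # keep drivers that make progress
            drivers.append(driver)
            covered |= reached
    return drivers
```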
- HOPPER: Interpretative Fuzzing for Libraries
HOPPER can fuzz libraries without requiring any domain knowledge.
It transforms the problem of library fuzzing into the problem of interpreter fuzzing.
arXiv Detail & Related papers (2023-09-07T06:11:18Z)
- What Happens When We Fuzz? Investigating OSS-Fuzz Bug History
We analyzed 44,102 reported issues made public by OSS-Fuzz prior to March 12, 2022.
We identified the bug-contributing commits to estimate when the bug-containing code was introduced, and measured the timeline from introduction to detection to fix.
arXiv Detail & Related papers (2023-05-19T05:15:36Z)
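Identifying bug-contributing commits is typically done with an SZZ-style heuristic: blame the lines that a fix commit deletes to find the commit that last touched them. A sketch using plain git; the study's exact procedure may differ.

```python
# SZZ-style step: for each line the fix removed, blame the pre-fix
# version of the file to recover the commit that last touched it,
# i.e., a candidate bug-contributing commit.
import subprocess

def bug_contributing_commits(repo, fix_commit, path, deleted_lines):
    """deleted_lines: line numbers (in the pre-fix version) removed by the fix."""
    candidates = set()
    for line in deleted_lines:
        out = subprocess.check_output(
            ["git", "-C", repo, "blame", "--porcelain",
             "-L", f"{line},{line}",
             f"{fix_commit}^",         # parent of the fix, where the line exists
             "--", path],
            text=True)
        candidates.add(out.split()[0])  # first token is the blamed commit hash
    return candidates
```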
- Black-box Dataset Ownership Verification via Backdoor Watermarking
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking and to use them for ownership verification.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z)
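The hypothesis-test step can be made concrete: query the suspect model on paired benign and watermarked inputs, then test whether the posterior on the backdoor target label is significantly higher for the watermarked ones. The predict_proba-style interface and the 0.05 threshold below are assumptions, not the paper's exact protocol.

```python
# Sketch of hypothesis-test-guided dataset verification: if a suspect
# model was trained on the watermarked dataset, watermarked inputs
# should receive a significantly higher posterior on the backdoor
# target label than their benign counterparts.
from scipy import stats

def dataset_watermarked(model, benign_x, watermarked_x, target_label, alpha=0.05):
    # Paired samples: watermarked_x[i] is the watermarked version of benign_x[i].
    p_benign = model.predict_proba(benign_x)[:, target_label]
    p_marked = model.predict_proba(watermarked_x)[:, target_label]
    # One-sided paired t-test: is the target-label posterior higher on
    # watermarked inputs than on the matching benign inputs?
    result = stats.ttest_rel(p_marked, p_benign, alternative="greater")
    return result.pvalue < alpha
```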
- DeFuzz: Deep Learning Guided Directed Fuzzing
We propose DeFuzz, a deep learning (DL) guided directed fuzzing approach for software vulnerability detection.
DeFuzz has two main schemes: (1) a pre-trained DL prediction model identifies potentially vulnerable functions and locations (i.e., vulnerable addresses); (2) directed fuzzing then targets these predicted locations.
Specifically, a Bidirectional LSTM (BiLSTM) identifies attention words, with which the vulnerabilities in functions are associated.
arXiv Detail & Related papers (2020-10-23T03:44:03Z)
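The BiLSTM predictor can be pictured as a token-level scorer over code. A minimal PyTorch sketch; the sizes and structure are illustrative guesses, not DeFuzz's actual architecture.

```python
# Minimal BiLSTM token classifier in the spirit of DeFuzz's predictor:
# it scores each code token so that high-scoring "attention words" can
# mark functions as potentially vulnerable.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.score = nn.Linear(2 * hidden_dim, 1)  # per-token score

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))    # (batch, seq_len, 2*hidden)
        return self.score(h).squeeze(-1)           # (batch, seq_len) logits

model = BiLSTMTagger()
tokens = torch.randint(0, 10_000, (1, 32))         # one function, 32 tokens
attention = torch.sigmoid(model(tokens))           # per-token scores in [0, 1]
```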
This list is automatically generated from the titles and abstracts of the papers on this site.