Benchmarking Deep Learning Fuzzers
- URL: http://arxiv.org/abs/2310.06912v1
- Date: Tue, 10 Oct 2023 18:09:16 GMT
- Title: Benchmarking Deep Learning Fuzzers
- Authors: Nima Shiri Harzevili, Hung Viet Pham, Song Wang
- Abstract summary: We run three state-of-the-art DL fuzzers, FreeFuzz, DeepRel, and DocTer, on the benchmark by following their instructions.
We find that these fuzzers are unable to detect many real bugs collected in our benchmark dataset.
Our systematic analysis further identifies four major, broad, and common factors that affect these fuzzers' ability to detect real bugs.
- Score: 11.118370064698869
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we set out to conduct the first ground-truth empirical
evaluation of state-of-the-art DL fuzzers. Specifically, we first manually
created an extensive DL bug benchmark dataset, which includes 627 real-world DL
bugs from TensorFlow and PyTorch libraries reported by users between 2020 and
2022. Then we ran three state-of-the-art DL fuzzers, i.e., FreeFuzz, DeepRel,
and DocTer, on the benchmark by following their instructions. We find that
these fuzzers are unable to detect many real bugs collected in our benchmark
dataset. Specifically, most (235) of the 257 applicable bugs cannot be detected
by any fuzzer.
Our systematic analysis further identifies four major, broad, and common
factors that affect these fuzzers' ability to detect real bugs. These findings
present opportunities to improve the performance of the fuzzers in future work.
As a proof of concept, we propose a lightweight corner case generator as an
extension to the three DL fuzzers, which simply covers several boundary values
as well as DL-specific data types. It helps FreeFuzz, DeepRel, and DocTer
detect 12, 12, and 14 more bugs, respectively, that were overlooked by the
original fuzzers. Overall, this work complements prior studies on DL fuzzers
with an extensive performance evaluation and provides a benchmark for future DL
library fuzzing studies. Also, our proposed corner case generator demonstrates
that the fuzzers can detect more bugs when their internal fuzzing logic is
extended based on the insights from our root cause analysis.
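The abstract does not spell out how the lightweight corner case generator works beyond "several boundary values as well as DL-specific data types." As a rough, hypothetical illustration (none of the names below come from the paper), such a generator might simply enumerate combinations of numeric boundary values and DL dtypes for each fuzzed API argument:

```python
# Hypothetical sketch of a lightweight corner case generator in the spirit of
# the paper's extension: it enumerates boundary values and DL-specific dtypes
# for each fuzzed API argument. All names are illustrative assumptions.
import itertools

# Boundary values commonly implicated in numeric-argument bugs.
BOUNDARY_VALUES = [0, -1, 1, 2**31 - 1, -(2**31),
                   float("inf"), float("-inf"), float("nan")]

# DL-specific data types, kept as strings to stay framework-agnostic.
DL_DTYPES = ["float16", "bfloat16", "float32", "int8", "uint8", "bool", "complex64"]

def corner_cases(num_args, max_cases=50):
    """Yield up to max_cases argument tuples mixing boundary values and dtypes."""
    pool = BOUNDARY_VALUES + DL_DTYPES
    for i, combo in enumerate(itertools.product(pool, repeat=num_args)):
        if i >= max_cases:
            break
        yield combo

# Example: candidate inputs for a hypothetical two-argument API under fuzz.
cases = list(corner_cases(num_args=2, max_cases=10))
```

Each generated tuple would then be fed through the host fuzzer's existing invocation machinery, which is presumably how the extension stays lightweight.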
Related papers
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- FuzzSlice: Pruning False Positives in Static Analysis Warnings Through Function-Level Fuzzing [5.748423489074936]
We propose FuzzSlice, a framework that automatically prunes possible false positives among static analysis warnings.
The key insight that we base our work on is that a warning that does not yield a crash when fuzzed at the function level in a given time budget is a possible false positive.
FuzzSlice reduces false positives by 62.26% in the open-source repositories and by 100% in the Juliet dataset.
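FuzzSlice's key insight above can be sketched in a few lines. This is a toy illustration of the idea, not the actual tool: `fuzz_function` is a stand-in for real function-level fuzzing, and the time budget is deliberately tiny.

```python
# Toy sketch of FuzzSlice's insight: a warning whose enclosing function does
# not crash under function-level fuzzing within a time budget is treated as a
# possible false positive. `fuzz_function` is an illustrative stand-in.
import time
import random

def fuzz_function(func, budget_seconds):
    """Call func on random ints until it crashes or the budget expires."""
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        try:
            func(random.randint(-10**6, 10**6))
        except Exception:
            return True   # crash reproduced -> warning likely a true positive
    return False          # no crash within budget -> possible false positive

def prune_warnings(warnings, budget_seconds=0.05):
    """Keep only warnings whose function crashed under fuzzing."""
    return [w for w in warnings if fuzz_function(w["function"], budget_seconds)]

# Two toy "warned" functions: one genuinely buggy, one safe.
def buggy(x):
    return 1 // (x % 2)   # ZeroDivisionError whenever x is even

def safe(x):
    return abs(x)         # never crashes

kept = prune_warnings([{"id": 1, "function": buggy}, {"id": 2, "function": safe}])
```

In the real system the budget, harness generation, and compilation of the function slice are the hard parts; the pruning rule itself is this simple.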
arXiv Detail & Related papers (2024-02-02T21:49:24Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- Prompt Fuzzing for Fuzz Driver Generation [6.238058387665971]
We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing.
It iteratively generates fuzz drivers to explore undiscovered library code.
PromptFuzz achieved 1.61 and 1.63 times higher branch coverage than OSS-Fuzz and Hopper, respectively.
arXiv Detail & Related papers (2023-12-29T16:43:51Z)
- HOPPER: Interpretative Fuzzing for Libraries [6.36596812288503]
HOPPER can fuzz libraries without requiring any domain knowledge.
It transforms the problem of library fuzzing into the problem of interpreter fuzzing.
arXiv Detail & Related papers (2023-09-07T06:11:18Z)
- What Happens When We Fuzz? Investigating OSS-Fuzz Bug History [0.9772968596463595]
We analyzed 44,102 reported issues made public by OSS-Fuzz prior to March 12, 2022.
We identified the bug-contributing commits to estimate when the bug-containing code was introduced, and measured the timeline from introduction to detection to fix.
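The timeline measurement described above reduces to simple date arithmetic once the introducing, detecting, and fixing commits are dated. A minimal sketch (the dates are made up for illustration):

```python
# Minimal sketch of the introduction -> detection -> fix timeline measurement.
from datetime import date

def bug_timeline(introduced, detected, fixed):
    """Return (days from introduction to detection, days from detection to fix)."""
    return ((detected - introduced).days, (fixed - detected).days)

# Hypothetical bug: introduced Jan 1, detected Jun 1, fixed Jun 10, 2021.
timeline = bug_timeline(date(2021, 1, 1), date(2021, 6, 1), date(2021, 6, 10))
```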
arXiv Detail & Related papers (2023-05-19T05:15:36Z)
- Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking to enable ownership verification and protect the released datasets.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
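A hypothesis-test-guided verification of the kind described above might look as follows. This is a hedged sketch under assumed details, not the paper's code: if a suspicious model predicts the watermark's target label on trigger-stamped inputs significantly more often than chance, the protected dataset was likely used for training.

```python
# Illustrative sketch of hypothesis-test-guided dataset ownership verification.
# Names, thresholds, and the exact test are assumptions, not the paper's method.
from math import comb

def binom_sf(k, n, p):
    """Survival function P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def verify_ownership(predictions, target_label, num_classes, alpha=0.01):
    """Reject the null 'no watermark learned' when hits exceed chance at level alpha."""
    hits = sum(1 for y in predictions if y == target_label)
    p_value = binom_sf(hits, len(predictions), 1.0 / num_classes)
    return p_value < alpha

# Suspicious model: predicts target label 3 on 45 of 50 trigger-stamped inputs.
watermarked_preds = [3] * 45 + [0] * 5
# Clean model: predictions spread uniformly over 10 classes.
clean_preds = [i % 10 for i in range(50)]

owned = verify_ownership(watermarked_preds, target_label=3, num_classes=10)
not_owned = verify_ownership(clean_preds, target_label=3, num_classes=10)
```

The appeal of a hypothesis test here is that it turns a fuzzy "the model behaves oddly on my triggers" observation into a quantifiable false-accusation rate.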
arXiv Detail & Related papers (2022-08-04T05:32:20Z)
- DeFuzz: Deep Learning Guided Directed Fuzzing [41.61500799890691]
We propose a deep learning (DL) guided directed fuzzing for software vulnerability detection, named DeFuzz.
DeFuzz includes two main schemes: (1) we employ a pre-trained DL prediction model to identify the potentially vulnerable functions and the locations (i.e., vulnerable addresses)
Precisely, we employ Bidirectional-LSTM (BiLSTM) to identify attention words, and the vulnerabilities are associated with these attention words in functions.
arXiv Detail & Related papers (2020-10-23T03:44:03Z)
- TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.