FalseCrashReducer: Mitigating False Positive Crashes in OSS-Fuzz-Gen Using Agentic AI
- URL: http://arxiv.org/abs/2510.02185v1
- Date: Thu, 02 Oct 2025 16:36:56 GMT
- Title: FalseCrashReducer: Mitigating False Positive Crashes in OSS-Fuzz-Gen Using Agentic AI
- Authors: Paschal C. Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez, Jonathan Metzman, Oliver Chang, James C. Davis,
- Abstract summary: This paper presents two AI-driven strategies to reduce false positives in OSS-Fuzz-Gen, a multi-agent system for automated fuzz driver generation.<n>First, constraint-based fuzz driver generation proactively enforces constraints on a function's inputs and state to guide driver creation.<n>Second, context-based crash validation reactively analyzes function callers to determine whether reported crashes are feasible from program entry points.
- Score: 9.23077204004445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fuzz testing has become a cornerstone technique for identifying software bugs and security vulnerabilities, with broad adoption in both industry and open-source communities. Directly fuzzing a function requires fuzz drivers, which translate random fuzzer inputs into valid arguments for the target function. Given the cost and expertise required to manually develop fuzz drivers, methods exist that leverage program analysis and Large Language Models to automatically generate these drivers. However, the generated fuzz drivers frequently lead to false positive crashes, especially in functions highly structured input and complex state requirements. This problem is especially crucial in industry-scale fuzz driver generation efforts like OSS-Fuzz-en, as reporting false positive crashes to maintainers impede trust in both the system and the team. This paper presents two AI-driven strategies to reduce false positives in OSS-Fuzz-Gen, a multi-agent system for automated fuzz driver generation. First, constraint-based fuzz driver generation proactively enforces constraints on a function's inputs and state to guide driver creation. Second, context-based crash validation reactively analyzes function callers to determine whether reported crashes are feasible from program entry points. Using 1,500 benchmark functions from OSS-Fuzz, we show that these strategies reduce spurious crashes by up to 8%, cut reported crashes by more than half, and demonstrate that frontier LLMs can serve as reliable program analysis agents. Our results highlight the promise and challenges of integrating AI into large-scale fuzzing pipelines.
Related papers
- Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for All [57.23434868678603]
Live-kBench is an evaluation framework for self-evolving benchmarks that scrapes and evaluates agents on freshly discovered kernel bugs.<n> kEnv is an agent-agnostic crash-resolution environment for kernel compilation, execution, and feedback.<n>Using kEnv, we benchmark three state-of-the-art agents, showing that they resolve 74% of crashes on the first attempt.
arXiv Detail & Related papers (2026-02-02T19:06:15Z) - Enhancing Fuzz Testing Efficiency through Automated Fuzz Target Generation [0.0]
We introduce an approach to improving fuzz target generation through static analysis of library source code.<n>Our findings are demonstrated through the application of this approach to the generation of fuzz targets for C/C++ libraries.
arXiv Detail & Related papers (2026-01-17T09:08:11Z) - Poster: Machine Learning for Vulnerability Detection as Target Oracle in Automated Fuzz Driver Generation [0.0]
In vulnerability detection, machine learning has been used as an effective static analysis technique, although it suffers from a significant rate of false positives.<n>We propose an automated fuzz driver generation workflow composed of: (1) identifying a likely vulnerable function by leveraging a machine learning for vulnerability detection model as a target oracle, (2) automatically generating fuzz drivers, and (3) fuzzing the target function to find bugs which could confirm the vulnerability inferred by the target oracle.
arXiv Detail & Related papers (2025-05-02T09:02:36Z) - SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge.<n>To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic.<n>Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
arXiv Detail & Related papers (2025-02-28T21:53:47Z) - CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph [29.490817477791357]
We propose an automated fuzz testing method driven by a code knowledge graph and powered by an intelligent agent system.<n>The code knowledge graph is constructed through interprocedural program analysis, where each node in the graph represents a code entity.<n> CKGFuzzer achieved an average improvement of 8.73% in code coverage compared to state-of-the-art techniques.
arXiv Detail & Related papers (2024-11-18T12:41:16Z) - FuzzCoder: Byte-level Fuzzing Test via Large Language Model [46.18191648883695]
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks.
FuzzCoder can predict mutation locations and strategies locations in input files to trigger abnormal behaviors of the program.
arXiv Detail & Related papers (2024-09-03T14:40:31Z) - Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug
Unearthing [2.4287247817521096]
Vulnerabilities in BusyBox can have far-reaching consequences.
The study revealed the prevalence of older BusyBox versions in real-world embedded products.
We introduce two techniques to fortify software testing.
arXiv Detail & Related papers (2024-03-06T17:57:03Z) - How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation [31.77886516971502]
This study is the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers.
Our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens)
Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
arXiv Detail & Related papers (2023-07-24T01:49:05Z) - Infrastructure-based End-to-End Learning and Prevention of Driver
Failure [68.0478623315416]
FailureNet is a recurrent neural network trained end-to-end on trajectories of both nominal and reckless drivers in a scaled miniature city.
It can accurately identify control failures, upstream perception errors, and speeding drivers, distinguishing them from nominal driving.
Compared to speed or frequency-based predictors, FailureNet's recurrent neural network structure provides improved predictive power, yielding upwards of 84% accuracy when deployed on hardware.
arXiv Detail & Related papers (2023-03-21T22:55:51Z) - Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Conal Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t.FI, that only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z) - Sample-Efficient Safety Assurances using Conformal Prediction [57.92013073974406]
Early warning systems can provide alerts when an unsafe situation is imminent.
To reliably improve safety, these warning systems should have a provable false negative rate.
We present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics.
arXiv Detail & Related papers (2021-09-28T23:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.