Evaluating the Effectiveness of Coverage-Guided Fuzzing for Testing Deep Learning Library APIs
- URL: http://arxiv.org/abs/2509.14626v1
- Date: Thu, 18 Sep 2025 05:10:42 GMT
- Title: Evaluating the Effectiveness of Coverage-Guided Fuzzing for Testing Deep Learning Library APIs
- Authors: Feiran Qin, M. M. Abid Naziri, Hengyu Ai, Saikat Dutta, Marcelo d'Amorim
- Abstract summary: We propose FlashFuzz to automatically synthesize API-level harnesses by combining templates, helper functions, and API documentation. Compared to state-of-the-art fuzzing methods, FlashFuzz achieves up to 101.13 to 212.88 percent higher coverage and 1.0x to 5.4x higher validity rate. Our study confirms that CGF can be effectively applied to Deep Learning libraries and provides a strong baseline for future testing approaches.
- Score: 3.491101173753068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Learning (DL) libraries such as PyTorch provide the core components to build major AI-enabled applications. Finding bugs in these libraries is important and challenging. Prior approaches have tackled this by performing either API-level fuzzing or model-level fuzzing, but they do not use coverage guidance, which limits their effectiveness and efficiency. This raises an intriguing question: can coverage-guided fuzzing (CGF), in particular frameworks like LibFuzzer, be effectively applied to DL libraries, and does it offer meaningful improvements in code coverage, bug detection, and scalability compared to prior methods? We present the first in-depth study to answer this question. A key challenge in applying CGF to DL libraries is the need to create a test harness for each API that can transform byte-level fuzzer inputs into valid API inputs. To address this, we propose FlashFuzz, a technique that leverages Large Language Models (LLMs) to automatically synthesize API-level harnesses by combining templates, helper functions, and API documentation. FlashFuzz uses a feedback-driven strategy to iteratively synthesize and repair harnesses. With this approach, FlashFuzz synthesizes harnesses for 1,151 PyTorch and 662 TensorFlow APIs. Compared to state-of-the-art fuzzing methods (ACETest, PathFinder, and TitanFuzz), FlashFuzz achieves up to 101.13 to 212.88 percent higher coverage and 1.0x to 5.4x higher validity rate, while also delivering 1x to 1182x speedups in input generation. FlashFuzz has discovered 42 previously unknown bugs in PyTorch and TensorFlow, 8 of which are already fixed. Our study confirms that CGF can be effectively applied to DL libraries and provides a strong baseline for future testing approaches.
Related papers
- Enhancing Fuzz Testing Efficiency through Automated Fuzz Target Generation [0.0]
We introduce an approach to improving fuzz target generation through static analysis of library source code. Our findings are demonstrated through the application of this approach to the generation of fuzz targets for C/C++ libraries.
arXiv Detail & Related papers (2026-01-17T09:08:11Z)
- May the Feedback Be with You! Unlocking the Power of Feedback-Driven Deep Learning Framework Fuzzing via LLMs [20.03968975178177]
Fuzz testing (fuzzing) is a simple yet effective way to find bugs in Deep Learning (DL) frameworks. We propose FUEL to effectively utilize the feedback information, which comprises two Large Language Models (LLMs): analysis LLM and generation LLM. We show that FUEL can improve line code coverage of PyTorch and execution by 9.15% and 14.70% over state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-21T08:51:53Z)
- Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like buffer overflows and use-after-free errors. Traditional fuzzing struggles with the complexity and API diversity of DL libraries. We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z)
- ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solution. Experimental results demonstrate that ExploraCoder significantly improves performance for models lacking prior API knowledge.
arXiv Detail & Related papers (2024-12-06T19:00:15Z)
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model [46.18191648883695]
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks.
FuzzCoder can predict mutation locations and strategies in input files to trigger abnormal behaviors of the program.
arXiv Detail & Related papers (2024-09-03T14:40:31Z)
- Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries [8.779035160734523]
This paper introduces an LLM-enhanced differential testing technique for DL libraries. It addresses the challenges of finding alternative implementations for a given API and generating diverse test inputs. It synthesizes counterparts for 1.84 times as many APIs as those found by state-of-the-art techniques.
arXiv Detail & Related papers (2024-06-12T07:06:38Z)
- HOPPER: Interpretative Fuzzing for Libraries [6.36596812288503]
HOPPER can fuzz libraries without requiring any domain knowledge.
It transforms the problem of library fuzzing into the problem of interpreter fuzzing.
arXiv Detail & Related papers (2023-09-07T06:11:18Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
- torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library that aims to address this need.
It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
- Torch-Struct: Deep Structured Prediction Library [138.5262350501951]
We introduce Torch-Struct, a library for structured prediction.
Torch-Struct includes a broad collection of probabilistic structures accessed through a simple and flexible distribution-based API.
arXiv Detail & Related papers (2020-02-03T16:43:02Z)