Related papers: LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries

LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries

URL: http://arxiv.org/abs/2507.15058v1
Date: Sun, 20 Jul 2025 17:38:51 GMT
Title: LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries
Authors: Ian Hardgrove, John D. Hastings,
Abstract summary: We introduce LibLMFuzz, a framework that reduces costs associated with fuzzing closed-source libraries.<n>Tested on four widely-used Linux libraries, LibLMFuzz produced syntactically correct drivers for all 558 fuzz-able API functions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A fundamental problem in cybersecurity and computer science is determining whether a program is free of bugs and vulnerabilities. Fuzzing, a popular approach to discovering vulnerabilities in programs, has several advantages over alternative strategies, although it has investment costs in the form of initial setup and continuous maintenance. The choice of fuzzing is further complicated when only a binary library is available, such as the case of closed-source and proprietary software. In response, we introduce LibLMFuzz, a framework that reduces costs associated with fuzzing closed-source libraries by pairing an agentic Large Language Model (LLM) with a lightweight tool-chain (disassembler/compiler/fuzzer) to autonomously analyze stripped binaries, plan fuzz strategies, generate drivers, and iteratively self-repair build or runtime errors. Tested on four widely-used Linux libraries, LibLMFuzz produced syntactically correct drivers for all 558 fuzz-able API functions, achieving 100% API coverage with no human intervention. Across the 1601 synthesized drivers, 75.52% were nominally correct on first execution. The results show that LLM-augmented middleware holds promise in reducing the costs of fuzzing black box components and provides a foundation for future research efforts. Future opportunities exist for research in branch coverage.

Related papers

Scheduzz: Constraint-based Fuzz Driver Generation with Dual Scheduling [16.453200060615234]
We propose a novel automatic library fuzzing technique, Scheduzz, to understand rational usage of libraries and extract API combination constraints.<n>Scheduzz significantly reduces computational overhead and outperforms UTopia on 16 out of 21 libraries.<n>It achieves 1.62x, 1.50x, and 1.89x higher overall coverage than the state-of-the-art techniques CKGFuzzer, Promptfuzz, and the handcrafted project OSS-Fuzz.
arXiv Detail & Related papers (2025-07-24T10:51:11Z)
LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation [56.84049855266145]
We propose a Multi-feedback Smart Contract Fuzzing framework (LLAMA) that integrates evolutionary mutation strategies, and hybrid testing techniques.<n>LLAMA achieves 91% instruction coverage and 90% branch coverage, while detecting 132 out of 148 known vulnerabilities.<n>These results highlight LLAMA's effectiveness, adaptability, and practicality in real-world smart contract security testing scenarios.
arXiv Detail & Related papers (2025-07-16T09:46:58Z)
Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Despite Etherscan's 78,047,845 smart contracts deployed on (as of May 26, 2025), a mere 767,520 ( 1%) are open source.<n>This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode.<n>We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z)
May the Feedback Be with You! Unlocking the Power of Feedback-Driven Deep Learning Framework Fuzzing via LLMs [13.976286931563006]
A simple yet effective way to find bugs in Deep Learning (DL) frameworks is fuzz testing (Fuzzing)<n>We propose FUEL to break the seal of Feedback-driven fuzzing for DL frameworks.<n> FUEL has detected 104 bugs for PyTorch and summaries, with 93 confirmed as new bugs, 47 already fixed, and 5 assigned with CVE IDs.
arXiv Detail & Related papers (2025-06-21T08:51:53Z)
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [60.881609323604685]
Large Language Models (LLMs) accessed via black-box APIs introduce a trust challenge.<n>Users pay for services based on advertised model capabilities.<n> providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs.<n>This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks.<n>SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues.<n>We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving competitive performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like overflows and use buffer-free errors.<n>Traditional fuzzing struggles with the complexity and API diversity of DL libraries.<n>We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z)
FuzzWiz -- Fuzzing Framework for Efficient Hardware Coverage [2.1626093085892144]
We create an automated hardware fuzzing framework called FuzzWiz. It includes parsing the RTL design module, converting it into C/C++ models, creating generic testbench with assertions, linking, and fuzzing. Our benchmarking results show that we could achieve around 90% of the coverage 10 times faster than traditional simulation regression based approach.
arXiv Detail & Related papers (2024-10-23T10:06:08Z)
$\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding [64.00025564372095]
Large language models (LLMs) have shown remarkable capabilities in code generation. The effects of hallucinations (e.g., output noise) make it challenging for LLMs to generate high-quality code in one pass. We propose a simple and effective textbfuncertainty-aware textbfselective textbfcontrastive textbfdecoding.
arXiv Detail & Related papers (2024-09-09T02:07:41Z)
FuzzCoder: Byte-level Fuzzing Test via Large Language Model [46.18191648883695]
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks. FuzzCoder can predict mutation locations and strategies locations in input files to trigger abnormal behaviors of the program.
arXiv Detail & Related papers (2024-09-03T14:40:31Z)
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing [2.4287247817521096]
Vulnerabilities in BusyBox can have far-reaching consequences. The study revealed the prevalence of older BusyBox versions in real-world embedded products. We introduce two techniques to fortify software testing.
arXiv Detail & Related papers (2024-03-06T17:57:03Z)
How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation [31.77886516971502]
This study is the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens) Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
arXiv Detail & Related papers (2023-07-24T01:49:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.