Understanding Large Language Model Based Fuzz Driver Generation
- URL: http://arxiv.org/abs/2307.12469v4
- Date: Mon, 17 Jun 2024 11:53:37 GMT
- Title: Understanding Large Language Model Based Fuzz Driver Generation
- Authors: Cen Zhang, Mingqiang Bai, Yaowen Zheng, Yeting Li, Wei Ma, Xiaofei Xie, Yuekang Li, Limin Sun, Yang Liu
- Abstract summary: This study is the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers.
Our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens).
Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
- Score: 31.77886516971502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-based (Large Language Model) fuzz driver generation is a promising research area. Unlike traditional program analysis-based methods, this text-based approach is more general and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens). Additionally, we compared the LLM-generated drivers against those utilized in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that:
  - While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications;
  - LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process;
  - While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in the industry, there are substantial opportunities for enhancement, such as extending contained API usage, or integrating semantic oracles to facilitate logical bug detection.
  Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
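The three prompt design choices named in the abstract (repeat queries, querying with examples, and iterative querying) can be combined in one loop. The following is a minimal sketch under stated assumptions, not the paper's implementation: `generate`, `validate`, `build_prompt`, `generate_driver`, the prompt wording, and the `demo_parse` API are all hypothetical names introduced for illustration, and the model call is replaced by a stub so the code runs without any external service.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
#   generate(prompt) -> candidate fuzz driver source code
#   validate(code)   -> (ok, error_message), e.g. from compiling and a short fuzz run
Generate = Callable[[str], str]
Validate = Callable[[str], Tuple[bool, str]]


def build_prompt(api: str, examples: List[str], feedback: Optional[str]) -> str:
    """Assemble a prompt from the target API, usage examples, and the last error."""
    parts = [f"Write a fuzz driver for the C API `{api}`."]
    for example in examples:  # querying with examples
        parts.append(f"Example usage:\n{example}")
    if feedback:  # iterative querying: feed the previous failure back
        parts.append(f"The previous driver failed with:\n{feedback}\nPlease fix it.")
    return "\n\n".join(parts)


def generate_driver(
    api: str,
    examples: List[str],
    generate: Generate,
    validate: Validate,
    repeats: int = 3,
    max_rounds: int = 2,
) -> Optional[str]:
    """Return the first driver that validates, or None if the budget runs out."""
    for _ in range(repeats):  # repeat queries: independent fresh attempts
        feedback = None
        for _ in range(max_rounds + 1):  # one initial query plus up to max_rounds fixes
            code = generate(build_prompt(api, examples, feedback))
            ok, error = validate(code)
            if ok:
                return code
            feedback = error
    return None


# Stubs standing in for a real model and a real compile-and-run check.
def fake_generate(prompt: str) -> str:
    return "fixed driver" if "failed with" in prompt else "broken driver"


def fake_validate(code: str) -> Tuple[bool, str]:
    return (code == "fixed driver", "compile error: unknown type")


result = generate_driver("demo_parse", ["demo_parse(buf, len);"], fake_generate, fake_validate)
```

With these stubs the first query fails validation, the error is fed back, and the second query succeeds. In a real setting the validation step would be where most of the engineering effort goes, since it decides which generated drivers count as effective.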
Related papers
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect code that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs).
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
- Testing LLMs on Code Generation with Varying Levels of Prompt Specificity [0.0]
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing.
The potential to transform natural language prompts into executable code promises a major shift in software development practices.
arXiv Detail & Related papers (2023-11-10T23:41:41Z)
- Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE [85.76186554492543]
Large Language Models (LLMs) can extend their zero-shot capabilities to multimodal learning through instruction tuning.
Negative conflicts and interference may have a worse impact on performance.
We propose a novel framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with MLLMs.
arXiv Detail & Related papers (2023-11-05T15:48:29Z)
- Language Models as Zero-Shot Trajectory Generators [10.572264780575564]
Large Language Models (LLMs) have recently shown promise as high-level planners for robots.
It is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves.
This work investigates if an LLM can directly predict a dense sequence of end-effector poses for manipulation tasks.
arXiv Detail & Related papers (2023-10-17T21:57:36Z)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
- LLMCad: Fast and Scalable On-device Large Language Model Inference [11.103824752113148]
Generative tasks, such as text generation and question answering, hold a crucial position in the realm of mobile applications.
Currently, the execution of these generative tasks heavily depends on Large Language Models (LLMs).
We introduce LLMCad, an on-device inference engine specifically designed for efficient generative Natural Language Processing (NLP) tasks.
arXiv Detail & Related papers (2023-09-08T10:44:19Z)
- HOPPER: Interpretative Fuzzing for Libraries [6.36596812288503]
HOPPER can fuzz libraries without requiring any domain knowledge.
It transforms the problem of library fuzzing into the problem of interpreter fuzzing.
arXiv Detail & Related papers (2023-09-07T06:11:18Z)
- Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models [54.72004797421481]
We conduct the first systematic study to explore a decoding strategy specialized in code generation.
Inspired by the above findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling.
Results show that AdapT sampling significantly outperforms the state-of-the-art decoding strategy.
arXiv Detail & Related papers (2023-09-06T06:27:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.