Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing
- URL: http://arxiv.org/abs/2403.03897v1
- Date: Wed, 6 Mar 2024 17:57:03 GMT
- Title: Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing
- Authors: Asmita, Yaroslav Oliinyk, Michael Scott, Ryan Tsang, Chongzhou Fang, Houman Homayoun
- Abstract summary: Vulnerabilities in BusyBox can have far-reaching consequences.
The study revealed the prevalence of older BusyBox versions in real-world embedded products.
We introduce two techniques to fortify software testing.
- Score: 2.4287247817521096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: BusyBox, an open-source software bundling over 300 essential Linux commands
into a single executable, is ubiquitous in Linux-based embedded devices.
Vulnerabilities in BusyBox can have far-reaching consequences, affecting a wide
array of devices. Motivated by this extensive use, this research analyzes
BusyBox in depth. The study revealed that older BusyBox versions remain
prevalent in real-world embedded products, prompting us to conduct fuzz
testing on
BusyBox. Fuzzing, a pivotal software testing method, aims to induce crashes
that are subsequently scrutinized to uncover vulnerabilities. Within this
study, we introduce two techniques to fortify software testing; both are
sketched after the abstract. The first technique enhances fuzzing by
leveraging Large Language Models (LLMs) to generate target-specific initial
seeds. Our study showed a substantial increase in crashes when using
LLM-generated initial seeds, highlighting the potential of LLMs to
efficiently tackle the typically labor-intensive task of generating
target-specific initial seeds. The second technique involves repurposing
previously acquired crash data from similar fuzzed targets before initiating
fuzzing on a new target. This approach streamlines the time-consuming fuzz
testing process by providing crash data directly to the new target before
commencing fuzzing. We successfully identified crashes in the latest BusyBox
target without conducting traditional fuzzing, emphasizing the effectiveness
of LLM-based seed generation and crash reuse in enhancing software testing
and improving vulnerability detection in embedded systems. Additionally,
manual triaging was
performed to identify the nature of crashes in the latest BusyBox.
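A minimal sketch of the first technique, under stated assumptions: the query_llm() helper is a hypothetical stand-in for a real LLM API call (the paper does not publish its prompts or model interface), and the BusyBox awk applet, the prompt wording, and the corpus layout are illustrative choices, not the authors' implementation.

```python
# Sketch: LLM-generated initial seeds for fuzzing a BusyBox applet.
# query_llm() is a hypothetical stand-in for a real LLM API call; here it
# returns canned awk programs so the script runs standalone.
import os

def query_llm(prompt: str) -> list[str]:
    """Placeholder for an LLM call. A real implementation would send the
    prompt to a model and parse the generated inputs from its reply."""
    return [
        'BEGIN { print "hello" }',
        '{ sum += $1 } END { print sum }',
        '/error/ { count++ } END { print count }',
    ]

def build_seed_corpus(applet: str, out_dir: str, n_seeds: int = 10) -> None:
    """Ask the LLM for valid inputs for one applet and write each as a
    separate seed file for the fuzzer to start from."""
    os.makedirs(out_dir, exist_ok=True)
    prompt = (f"Generate {n_seeds} diverse, syntactically valid inputs "
              f"for the BusyBox '{applet}' applet.")
    for i, seed in enumerate(query_llm(prompt)):
        with open(os.path.join(out_dir, f"seed_{i:03d}"), "w") as f:
            f.write(seed)

if __name__ == "__main__":
    build_seed_corpus("awk", "seeds/awk")
    # The corpus can then seed a coverage-guided fuzzer, e.g.:
    #   afl-fuzz -i seeds/awk -o findings -- ./busybox awk -f @@
```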
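The second technique, crash reuse, amounts to replaying crash inputs harvested from an earlier, similar target against a new build before any fuzzing starts. Below is a minimal sketch assuming a flat directory of crash files and a direct applet invocation; the paths and binary names are hypothetical.

```python
# Sketch: replay previously collected crash inputs against a new target
# and record which ones still crash, skipping a fresh fuzzing campaign.
import subprocess
from pathlib import Path

def replay_crashes(crash_dir: str, target_cmd: list[str],
                   timeout_s: int = 5) -> list[Path]:
    """Run the new target on each stored crash file. On POSIX, a negative
    return code means the process died on a signal (e.g., SIGSEGV)."""
    reproduced = []
    for crash_file in sorted(Path(crash_dir).iterdir()):
        try:
            result = subprocess.run(target_cmd + [str(crash_file)],
                                    capture_output=True, timeout=timeout_s)
            if result.returncode < 0:  # killed by a signal
                reproduced.append(crash_file)
        except subprocess.TimeoutExpired:
            pass  # hangs are triaged separately from crashes
    return reproduced

if __name__ == "__main__":
    # Crash inputs harvested from an older, similar BusyBox build
    # (hypothetical layout), replayed against the latest build.
    crashes = replay_crashes("crashes/busybox-1.33/awk",
                             ["./busybox-latest", "awk", "-f"])
    print(f"{len(crashes)} crash inputs reproduced on the new target")
```

Inputs that reproduce go straight to triage (the abstract notes that crashes in the latest BusyBox were manually triaged), while the remainder can seed a regular fuzzing campaign.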
Related papers
- A Code Knowledge Graph-Enhanced System for LLM-Based Fuzz Driver Generation [29.490817477791357]
We propose CodeGraphGPT, a novel system that integrates code knowledge graphs with an intelligent agent to automate the fuzz driver generation process.
By framing fuzz driver creation as a code generation task, CodeGraphGPT leverages program analysis to construct a knowledge graph of code repositories.
We evaluate CodeGraphGPT on eight open-source software projects, achieving an average improvement of 8.73% in code coverage compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-11-18T12:41:16Z)
- Pipe-Cleaner: Flexible Fuzzing Using Security Policies [0.07499722271664144]
Pipe-Cleaner is a system for detecting and analyzing C code vulnerabilities.
It is based on flexible developer-designed security policies enforced by a tag-based runtime reference monitor.
We demonstrate the potential of this approach on several heap-related security vulnerabilities.
arXiv Detail & Related papers (2024-10-31T23:35:22Z)
- Aligning LLMs to Be Robust Against Prompt Injection [55.07562650579068]
We show that alignment can be a powerful tool to make LLMs more robust against prompt injection attacks.
Our method -- SecAlign -- first builds an alignment dataset by simulating prompt injection attacks.
Our experiments show that SecAlign substantially robustifies the LLM with negligible loss in model utility.
arXiv Detail & Related papers (2024-10-07T19:34:35Z)
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model [46.18191648883695]
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks.
FuzzCoder can predict mutation locations and strategies in input files to trigger abnormal behaviors of the program.
arXiv Detail & Related papers (2024-09-03T14:40:31Z)
- FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer [0.0]
This work introduces a black-box fuzzing tool for RESTful APIs that employs Reinforcement Learning (RL) for vulnerability detection.
The tool found a total of six unique vulnerabilities and achieved 55% code coverage.
arXiv Detail & Related papers (2024-07-19T14:43:35Z)
- Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
- Revisiting Neural Program Smoothing for Fuzzing [8.861172379630899]
This paper presents the most extensive evaluation of neural program smoothing (NPS) fuzzers against standard gray-box fuzzers.
We implement Neuzz++, which shows that addressing the practical limitations of NPS fuzzers improves performance.
We present MLFuzz, a platform with GPU access for easy and reproducible evaluation of ML-based fuzzers.
arXiv Detail & Related papers (2023-09-28T17:17:11Z)
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
- Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)