Fuzzing with Quantitative and Adaptive Hot-Bytes Identification
- URL: http://arxiv.org/abs/2307.02289v1
- Date: Wed, 5 Jul 2023 13:41:35 GMT
- Title: Fuzzing with Quantitative and Adaptive Hot-Bytes Identification
- Authors: Tai D. Nguyen, Long H. Pham, Jun Sun
- Abstract summary: American fuzzy lop, a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs.
We propose an approach called toolwhich is designed based on the following principles.
Our evaluation results on 10 real-world programs and LAVA-M dataset show that toolachieves sustained increases in branch coverage and discovers more bugs than other fuzzers.
- Score: 6.442499249981947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fuzzing has emerged as a powerful technique for finding security bugs in
complicated real-world applications. American fuzzy lop (AFL), a leading
fuzzing tool, has demonstrated its powerful bug finding ability through a vast
number of reported CVEs. However, its random mutation strategy is unable to
generate test inputs that satisfy complicated branching conditions (e.g.,
magic-byte comparisons, checksum tests, and nested if-statements), which are
commonly used in image decoders/encoders, XML parsers, and checksum tools.
Existing approaches (such as Steelix and Neuzz) on addressing this problem
assume unrealistic assumptions such as we can satisfy the branch condition
byte-to-byte or we can identify and focus on the important bytes in the input
(called hot-bytes) once and for all. In this work, we propose an approach
called \tool~which is designed based on the following principles. First, there
is a complicated relation between inputs and branching conditions and thus we
need not only an expressive model to capture such relationship but also an
informative measure so that we can learn such relationship effectively. Second,
different branching conditions demand different hot-bytes and we must adjust
our fuzzing strategy adaptively depending on which branches are the current
bottleneck. We implement our approach as an open source project and compare its
efficiency with other state-of-the-art fuzzers. Our evaluation results on 10
real-world programs and LAVA-M dataset show that \tool~achieves sustained
increases in branch coverage and discovers more bugs than other fuzzers.
Related papers
- LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing [6.042114639413868]
Specialized fuzzers can handle complex structured data, but require additional efforts in grammar and suffer from low throughput.
In this paper, we explore the potential of utilizing the Large Language Model to enhance greybox fuzzing for structured data.
Our LLM-based fuzzer, LLAMAFUZZ, integrates the power of LLM to understand and mutate structured data to fuzzing.
arXiv Detail & Related papers (2024-06-11T20:48:28Z) - FOX: Coverage-guided Fuzzing as Online Stochastic Control [13.3158115776899]
Fuzzing is an effective technique for discovering software vulnerabilities by generating random test inputs executing them against the target program.
This paper addresses the limitations of existing coverage-guided fuzzers, focusing on the scheduler and mutator components.
We present FOX, a proof-of-concept implementation of our control-theoretic approach, and compare it to industry-standard fuzzers.
arXiv Detail & Related papers (2024-06-06T21:21:05Z) - Generator-Based Fuzzers with Type-Based Targeted Mutation [1.4507298892594764]
In previous work, coverage-guided fuzzers used a mix of static analysis, taint analysis, and constraint-solving approaches to address this problem.
In this paper, we introduce a type-based mutation, along with constant string lookup, for Java GBF.
Results compared to a baseline GBF tool show an almost 20% average improvement in application coverage, and larger improvements when third-party code is included.
arXiv Detail & Related papers (2024-06-04T07:20:13Z) - Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures [0.0]
This research introduces a novel ensemble learning approach for code similarity assessment.
The key idea is that the strengths of a diverse set of similarity measures can complement each other and mitigate individual weaknesses.
arXiv Detail & Related papers (2024-05-03T13:42:49Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - Make out like a (Multi-Armed) Bandit: Improving the Odds of Fuzzer Seed Scheduling with T-Scheduler [8.447499888458633]
Fuzzing is a highly-scalable software testing technique that uncovers bugs in a target program by executing it with mutated inputs.
We propose T-Scheduler, a seed scheduler built on multi-armed bandit theory.
We evaluate T-Scheduler over 35 CPU-yr of fuzzing, comparing it to 11 state-of-the-art schedulers.
arXiv Detail & Related papers (2023-12-07T23:27:55Z) - Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z) - Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner-workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z) - Precise Zero-Shot Dense Retrieval without Relevance Labels [60.457378374671656]
Hypothetical Document Embeddings(HyDE) is a zero-shot dense retrieval system.
We show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever.
arXiv Detail & Related papers (2022-12-20T18:09:52Z) - Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
arXiv Detail & Related papers (2022-07-22T17:52:30Z) - Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial
Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method(FGSM)
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.