MEUZZ: Smart Seed Scheduling for Hybrid Fuzzing
- URL: http://arxiv.org/abs/2002.08568v2
- Date: Wed, 22 Jul 2020 03:27:51 GMT
- Title: MEUZZ: Smart Seed Scheduling for Hybrid Fuzzing
- Authors: Yaohui Chen, Mansour Ahmadi, Reza Mirzazade farkhani, Boyu Wang, and
Long Lu
- Abstract summary: MEUZZ (Machine learning-Enhanced hybrid fUZZing system)
determines which new seeds are expected to produce better fuzzing yields based on the knowledge learned from past seed scheduling decisions.
Results: MEUZZ significantly outperforms the state-of-the-art grey-box and hybrid fuzzers.
- Score: 21.318110758739675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Seed scheduling is a prominent factor in determining the yields of hybrid
fuzzing. Existing hybrid fuzzers schedule seeds based on fixed heuristics that
aim to predict input utilities. However, such heuristics are not generalizable
as there exists no one-size-fits-all rule applicable to different programs.
They may work well on the programs from which they were derived, but not
others. To overcome this problem, we design a Machine learning-Enhanced hybrid
fUZZing system (MEUZZ), which employs supervised machine learning for adaptive
and generalizable seed scheduling. MEUZZ determines which new seeds are
expected to produce better fuzzing yields based on the knowledge learned from
past seed scheduling decisions made on the same or similar programs. MEUZZ's
learning is based on a series of features extracted via code reachability and
dynamic analysis, which incurs negligible runtime overhead (in microseconds).
Moreover, MEUZZ automatically infers the data labels by evaluating the fuzzing
performance of each selected seed. As a result, MEUZZ is generally applicable
to, and performs well on, various kinds of programs. Our evaluation shows MEUZZ
significantly outperforms the state-of-the-art grey-box and hybrid fuzzers,
achieving 27.1% more code coverage than QSYM. The learned models are reusable
and transferable, which boosts fuzzing performance by 7.1% on average and
improves 68% of the 56 cross-program fuzzing campaigns. MEUZZ discovered 47
deeply hidden and previously unknown bugs--with 21 confirmed and fixed by the
developers--when fuzzing 8 well-tested programs with the same configurations as
used in previous work.
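The abstract describes learning a utility model from past scheduling decisions (features in, observed fuzzing yield out) and ranking new seeds by predicted yield. A minimal sketch of that idea follows; it is not MEUZZ's actual implementation, and the feature names, the synthetic "coverage gain" labels, and the plain least-squares model are all illustrative assumptions.

```python
# Illustrative sketch of learning-based seed scheduling: fit a linear
# utility model on past seeds, then rank candidate seeds by predicted yield.

def _solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_utility_model(features, labels):
    """Least-squares fit of w in label ~ w[0] + w[1:] . features."""
    X = [[1.0] + f for f in features]          # prepend bias column
    d = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    Xty = [sum(X[k][i] * labels[k] for k in range(len(X))) for i in range(d)]
    return _solve(XtX, Xty)                    # normal equations

def rank_seeds(weights, candidates):
    """Order candidate seeds by predicted fuzzing yield, best first."""
    def predict(feats):
        return weights[0] + sum(w * x for w, x in zip(weights[1:], feats))
    return sorted(candidates, key=lambda s: predict(s["features"]), reverse=True)

# Past scheduling decisions: hypothetical features (new edges reached,
# execution-path depth) with observed coverage gain as the label.
history_feats = [[1.0, 2.0], [0.0, 1.0], [2.0, 1.0], [3.0, 3.0]]
history_gain = [2.0 * f[0] + 0.5 * f[1] for f in history_feats]  # synthetic
w = fit_utility_model(history_feats, history_gain)

new_seeds = [{"name": "s1", "features": [0.5, 1.0]},
             {"name": "s2", "features": [3.0, 2.0]}]
best = rank_seeds(w, new_seeds)[0]["name"]  # "s2": higher predicted yield
```

In MEUZZ the labels come from automatically evaluating how each selected seed actually performed, which is what makes the pipeline self-supervising; here they are synthetic for the demo.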
Related papers
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model [46.18191648883695]
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks.
FuzzCoder can predict mutation locations and strategies in input files to trigger abnormal behaviors of the program.
arXiv Detail & Related papers (2024-09-03T14:40:31Z)
- Selecting Initial Seeds for Better JVM Fuzzing [10.676082981363702]
JVM fuzzing presents unique characteristics, including large-scale and intricate code, and programs with both syntactic and semantic features.
It remains unclear whether existing seed selection methods are suitable for JVM fuzzing and whether utilizing program coverage features can enhance effectiveness.
This work takes the first look at initial seed selection in JVM fuzzing, confirming its importance to fuzzing effectiveness and efficiency.
arXiv Detail & Related papers (2024-08-16T04:10:59Z) - Fuzzing at Scale: The Untold Story of the Scheduler [0.48342038441006807]
We show that a well-designed strategy that determines which programs to fuzz and for how long can greatly impact the number of bugs found across the programs.
We develop several schedulers and leverage the most sophisticated one to fuzz simultaneously our newly compiled benchmark of around 5,000 Ubuntu programs, and detect 4908 bugs.
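The summary above says the key decision is which programs to fuzz and for how long. The paper's actual strategy is not given here; as a purely illustrative sketch (the function name, the bug-rate inputs, and the exploration floor are assumptions), a time budget could be split in proportion to each program's recent yield while reserving a minimum share for cold programs:

```python
def allocate_fuzz_time(total_hours, recent_bug_rates, explore_share=0.2):
    """Split a fuzzing time budget across programs.

    recent_bug_rates: {program: bugs found per hour in the last window}.
    A fixed explore_share is divided evenly so unproductive programs still
    get some time; the remainder is split in proportion to recent yield.
    Purely illustrative, not the paper's scheduler.
    """
    n = len(recent_bug_rates)
    base = total_hours * explore_share / n          # exploration floor
    exploit = total_hours * (1.0 - explore_share)   # yield-proportional pool
    total_rate = sum(recent_bug_rates.values())
    alloc = {}
    for prog, rate in recent_bug_rates.items():
        bonus = exploit * rate / total_rate if total_rate > 0 else exploit / n
        alloc[prog] = base + bonus
    return alloc

budget = allocate_fuzz_time(100.0, {"readelf": 3.0, "objdump": 1.0,
                                    "gif2png": 0.0})
# "readelf" gets the largest share; "gif2png" still gets the exploration floor
```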
arXiv Detail & Related papers (2024-06-26T04:28:02Z) - FOX: Coverage-guided Fuzzing as Online Stochastic Control [13.3158115776899]
Fuzzing is an effective technique for discovering software vulnerabilities by generating random test inputs executing them against the target program.
This paper addresses the limitations of existing coverage-guided fuzzers, focusing on the scheduler and mutator components.
We present FOX, a proof-of-concept implementation of our control-theoretic approach, and compare it to industry-standard fuzzers.
arXiv Detail & Related papers (2024-06-06T21:21:05Z) - Make out like a (Multi-Armed) Bandit: Improving the Odds of Fuzzer Seed Scheduling with T-Scheduler [8.447499888458633]
Fuzzing is a highly-scalable software testing technique that uncovers bugs in a target program by executing it with mutated inputs.
We propose T-Scheduler, a seed scheduler built on multi-armed bandit theory.
We evaluate T-Scheduler over 35 CPU-yr of fuzzing, comparing it to 11 state-of-the-art schedulers.
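The multi-armed bandit framing above can be illustrated with a textbook UCB1 policy over seeds. This is a generic sketch, not T-Scheduler's actual formulation, and the reward definition (fraction of executions reaching new coverage, normalized to [0, 1]) is an assumption.

```python
import math

class UCB1SeedScheduler:
    """Treat each seed as a bandit arm; reward is assumed normalized to
    [0, 1], e.g. the fraction of executions that found new coverage."""

    def __init__(self, seeds):
        self.counts = {s: 0 for s in seeds}    # times each seed was scheduled
        self.values = {s: 0.0 for s in seeds}  # running mean reward per seed
        self.total = 0                         # total scheduling rounds

    def select(self):
        # Schedule every seed at least once before applying the UCB rule.
        for seed, count in self.counts.items():
            if count == 0:
                return seed
        # Exploit the mean reward, explore via the confidence bonus.
        return max(self.counts, key=lambda s: self.values[s]
                   + math.sqrt(2.0 * math.log(self.total) / self.counts[s]))

    def update(self, seed, reward):
        self.total += 1
        self.counts[seed] += 1
        # Incremental update of the running mean reward.
        self.values[seed] += (reward - self.values[seed]) / self.counts[seed]

# Toy campaign: seed "a" keeps paying off, seed "b" never does.
sched = UCB1SeedScheduler(["a", "b"])
for _ in range(30):
    seed = sched.select()
    sched.update(seed, 1.0 if seed == "a" else 0.0)
```

Over the 30 rounds the scheduler concentrates on "a" while still occasionally revisiting "b", which is exactly the explore/exploit trade-off that makes bandit formulations attractive for seed scheduling.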
arXiv Detail & Related papers (2023-12-07T23:27:55Z)
- Fuzzing with Quantitative and Adaptive Hot-Bytes Identification [6.442499249981947]
American fuzzy lop, a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs.
We propose an approach designed based on the following principles.
Our evaluation results on 10 real-world programs and the LAVA-M dataset show that the tool achieves sustained increases in branch coverage and discovers more bugs than other fuzzers.
arXiv Detail & Related papers (2023-07-05T13:41:35Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.