MEUZZ: Smart Seed Scheduling for Hybrid Fuzzing
- URL: http://arxiv.org/abs/2002.08568v2
- Date: Wed, 22 Jul 2020 03:27:51 GMT
- Title: MEUZZ: Smart Seed Scheduling for Hybrid Fuzzing
- Authors: Yaohui Chen, Mansour Ahmadi, Reza Mirzazade farkhani, Boyu Wang, and
Long Lu
- Abstract summary: MEUZZ (Machine learning-Enhanced hybrid fUZZing system)
determines which new seeds are expected to produce better fuzzing yields based on the knowledge learned from past seed scheduling decisions.
Results: MEUZZ significantly outperforms the state-of-the-art grey-box and hybrid fuzzers.
- Score: 21.318110758739675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Seed scheduling is a prominent factor in determining the yields of hybrid
fuzzing. Existing hybrid fuzzers schedule seeds based on fixed heuristics that
aim to predict input utilities. However, such heuristics are not generalizable
as there exists no one-size-fits-all rule applicable to different programs.
They may work well on the programs from which they were derived, but not
others. To overcome this problem, we design a Machine learning-Enhanced hybrid
fUZZing system (MEUZZ), which employs supervised machine learning for adaptive
and generalizable seed scheduling. MEUZZ determines which new seeds are
expected to produce better fuzzing yields based on the knowledge learned from
past seed scheduling decisions made on the same or similar programs. MEUZZ's
learning is based on a series of features extracted via code reachability and
dynamic analysis, which incurs negligible runtime overhead (in microseconds).
Moreover, MEUZZ automatically infers the data labels by evaluating the fuzzing
performance of each selected seed. As a result, MEUZZ is generally applicable
to, and performs well on, various kinds of programs. Our evaluation shows MEUZZ
significantly outperforms the state-of-the-art grey-box and hybrid fuzzers,
achieving 27.1% more code coverage than QSYM. The learned models are reusable
and transferable, which boosts fuzzing performance by 7.1% on average and
improves 68% of the 56 cross-program fuzzing campaigns. MEUZZ discovered 47
deeply hidden and previously unknown bugs--with 21 confirmed and fixed by the
developers--when fuzzing 8 well-tested programs with the same configurations as
used in previous work.
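The abstract describes learning a utility model from past scheduling decisions (features in, observed fuzzing yield out) and ranking new seeds by predicted yield. A minimal sketch of that idea follows; it is not MEUZZ's actual implementation, and the feature names, the synthetic "coverage gain" labels, and the plain least-squares model are all illustrative assumptions.

```python
# Illustrative sketch of learning-based seed scheduling: fit a linear
# utility model on past seeds, then rank candidate seeds by predicted yield.

def _solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_utility_model(features, labels):
    """Least-squares fit of w in label ~ w[0] + w[1:] . features."""
    X = [[1.0] + f for f in features]          # prepend bias column
    d = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    Xty = [sum(X[k][i] * labels[k] for k in range(len(X))) for i in range(d)]
    return _solve(XtX, Xty)                    # normal equations

def rank_seeds(weights, candidates):
    """Order candidate seeds by predicted fuzzing yield, best first."""
    def predict(feats):
        return weights[0] + sum(w * x for w, x in zip(weights[1:], feats))
    return sorted(candidates, key=lambda s: predict(s["features"]), reverse=True)

# Past scheduling decisions: hypothetical features (new edges reached,
# execution-path depth) with observed coverage gain as the label.
history_feats = [[1.0, 2.0], [0.0, 1.0], [2.0, 1.0], [3.0, 3.0]]
history_gain = [2.0 * f[0] + 0.5 * f[1] for f in history_feats]  # synthetic
w = fit_utility_model(history_feats, history_gain)

new_seeds = [{"name": "s1", "features": [0.5, 1.0]},
             {"name": "s2", "features": [3.0, 2.0]}]
best = rank_seeds(w, new_seeds)[0]["name"]  # "s2": higher predicted yield
```

In MEUZZ the labels come from automatically evaluating how each selected seed actually performed, which is what makes the pipeline self-supervising; here they are synthetic for the demo.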
Related papers
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model [46.18191648883695]
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks.
FuzzCoder can predict mutation locations and strategies in input files to trigger abnormal behaviors of the program.
arXiv Detail & Related papers (2024-09-03T14:40:31Z)
- Selecting Initial Seeds for Better JVM Fuzzing [10.676082981363702]
JVM fuzzing presents unique characteristics, including large-scale and intricate code, and programs with both syntactic and semantic features.
It remains unclear whether existing seed selection methods are suitable for JVM fuzzing and whether utilizing program coverage features can enhance effectiveness.
This work takes the first look at initial seed selection in JVM fuzzing, confirming its importance to fuzzing effectiveness and efficiency.
arXiv Detail & Related papers (2024-08-16T04:10:59Z) - Fuzzing at Scale: The Untold Story of the Scheduler [0.48342038441006807]
We show that a well-designed strategy that determines which programs to fuzz and for how long can greatly impact the number of bugs found across the programs.
We develop several schedulers and leverage the most sophisticated one to fuzz simultaneously our newly compiled benchmark of around 5,000 Ubuntu programs, and detect 4908 bugs.
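The summary above says the key decision is which programs to fuzz and for how long. The paper's actual strategy is not given here; as a purely illustrative sketch (the function name, the bug-rate inputs, and the exploration floor are assumptions), a time budget could be split in proportion to each program's recent yield while reserving a minimum share for cold programs:

```python
def allocate_fuzz_time(total_hours, recent_bug_rates, explore_share=0.2):
    """Split a fuzzing time budget across programs.

    recent_bug_rates: {program: bugs found per hour in the last window}.
    A fixed explore_share is divided evenly so unproductive programs still
    get some time; the remainder is split in proportion to recent yield.
    Purely illustrative, not the paper's scheduler.
    """
    n = len(recent_bug_rates)
    base = total_hours * explore_share / n          # exploration floor
    exploit = total_hours * (1.0 - explore_share)   # yield-proportional pool
    total_rate = sum(recent_bug_rates.values())
    alloc = {}
    for prog, rate in recent_bug_rates.items():
        bonus = exploit * rate / total_rate if total_rate > 0 else exploit / n
        alloc[prog] = base + bonus
    return alloc

budget = allocate_fuzz_time(100.0, {"readelf": 3.0, "objdump": 1.0,
                                    "gif2png": 0.0})
# "readelf" gets the largest share; "gif2png" still gets the exploration floor
```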
arXiv Detail & Related papers (2024-06-26T04:28:02Z) - FOX: Coverage-guided Fuzzing as Online Stochastic Control [13.3158115776899]
Fuzzing is an effective technique for discovering software vulnerabilities by generating random test inputs executing them against the target program.
This paper addresses the limitations of existing coverage-guided fuzzers, focusing on the scheduler and mutator components.
We present FOX, a proof-of-concept implementation of our control-theoretic approach, and compare it to industry-standard fuzzers.
arXiv Detail & Related papers (2024-06-06T21:21:05Z) - Make out like a (Multi-Armed) Bandit: Improving the Odds of Fuzzer Seed Scheduling with T-Scheduler [8.447499888458633]
Fuzzing is a highly-scalable software testing technique that uncovers bugs in a target program by executing it with mutated inputs.
We propose T-Scheduler, a seed scheduler built on multi-armed bandit theory.
We evaluate T-Scheduler over 35 CPU-yr of fuzzing, comparing it to 11 state-of-the-art schedulers.
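The multi-armed bandit framing above can be illustrated with a textbook UCB1 policy over seeds. This is a generic sketch, not T-Scheduler's actual formulation, and the reward definition (fraction of executions reaching new coverage, normalized to [0, 1]) is an assumption.

```python
import math

class UCB1SeedScheduler:
    """Treat each seed as a bandit arm; reward is assumed normalized to
    [0, 1], e.g. the fraction of executions that found new coverage."""

    def __init__(self, seeds):
        self.counts = {s: 0 for s in seeds}    # times each seed was scheduled
        self.values = {s: 0.0 for s in seeds}  # running mean reward per seed
        self.total = 0                         # total scheduling rounds

    def select(self):
        # Schedule every seed at least once before applying the UCB rule.
        for seed, count in self.counts.items():
            if count == 0:
                return seed
        # Exploit the mean reward, explore via the confidence bonus.
        return max(self.counts, key=lambda s: self.values[s]
                   + math.sqrt(2.0 * math.log(self.total) / self.counts[s]))

    def update(self, seed, reward):
        self.total += 1
        self.counts[seed] += 1
        # Incremental update of the running mean reward.
        self.values[seed] += (reward - self.values[seed]) / self.counts[seed]

# Toy campaign: seed "a" keeps paying off, seed "b" never does.
sched = UCB1SeedScheduler(["a", "b"])
for _ in range(30):
    seed = sched.select()
    sched.update(seed, 1.0 if seed == "a" else 0.0)
```

Over the 30 rounds the scheduler concentrates on "a" while still occasionally revisiting "b", which is exactly the explore/exploit trade-off that makes bandit formulations attractive for seed scheduling.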
arXiv Detail & Related papers (2023-12-07T23:27:55Z)
- Fuzzing with Quantitative and Adaptive Hot-Bytes Identification [6.442499249981947]
American fuzzy lop, a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs.
We propose an approach designed based on the following principles.
Our evaluation results on 10 real-world programs and the LAVA-M dataset show that the tool achieves sustained increases in branch coverage and discovers more bugs than other fuzzers.
arXiv Detail & Related papers (2023-07-05T13:41:35Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.