Systematic Assessment of Fuzzers using Mutation Analysis
- URL: http://arxiv.org/abs/2212.03075v3
- Date: Tue, 25 Jul 2023 06:30:27 GMT
- Title: Systematic Assessment of Fuzzers using Mutation Analysis
- Authors: Philipp Görz and Björn Mathis and Keno Hassler and Emre Güler
and Thorsten Holz and Andreas Zeller and Rahul Gopinath
- Abstract summary: In software testing, the gold standard for evaluating test quality is mutation analysis.
Mutation analysis subsumes various coverage measures and provides a large and diverse set of faults.
We apply modern mutation analysis techniques that pool multiple mutations and allow us -- for the first time -- to evaluate and compare fuzzers with mutation analysis.
- Score: 20.91546707828316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fuzzing is an important method to discover vulnerabilities in programs.
Despite considerable progress in this area in the past years, measuring and
comparing the effectiveness of fuzzers is still an open research question. In
software testing, the gold standard for evaluating test quality is mutation
analysis, which evaluates a test's ability to detect synthetic bugs: If a set
of tests fails to detect such mutations, it is expected to also fail to detect
real bugs. Mutation analysis subsumes various coverage measures and provides a
large and diverse set of faults that can be arbitrarily hard to trigger and
detect, thus preventing the problems of saturation and overfitting.
Unfortunately, the cost of traditional mutation analysis is exorbitant for
fuzzing, as mutations need independent evaluation.
In this paper, we apply modern mutation analysis techniques that pool
multiple mutations and allow us -- for the first time -- to evaluate and
compare fuzzers with mutation analysis. We introduce an evaluation bench for
fuzzers and apply it to a number of popular fuzzers and subjects. In a
comprehensive evaluation, we show how we can use it to assess fuzzer
performance and measure the impact of improved techniques. The required CPU
time remains manageable: 4.09 CPU years are needed to analyze a fuzzer on seven
subjects and a total of 141,278 mutations. We find that today's fuzzers can
detect only a small percentage of mutations, which should be seen as a
challenge for future research -- notably in improving (1) the detection of
failures beyond generic crashes and (2) the triggering of mutations (and thus faults).
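The core idea of mutation analysis described in the abstract can be illustrated with a minimal sketch. It is not the paper's evaluation bench: the function `original_max`, the three mutants, and the input sets are hypothetical, and the oracle simply compares outputs against the original program, whereas a fuzzer typically only observes crashes or sanitizer reports.

```python
from typing import Callable, Dict, List, Tuple

BinaryFn = Callable[[int, int], int]


def original_max(a: int, b: int) -> int:
    """Hypothetical program under test."""
    return a if a > b else b


# Hypothetical mutants: each one is a small synthetic change to original_max.
def mutant_flip(a: int, b: int) -> int:
    return a if a < b else b   # '>' replaced by '<'


def mutant_const(a: int, b: int) -> int:
    return a                   # branch removed, always returns a


def mutant_ge(a: int, b: int) -> int:
    return a if a >= b else b  # '>' replaced by '>=' (equivalent mutant)


MUTANTS: Dict[str, BinaryFn] = {
    "flip": mutant_flip,
    "const": mutant_const,
    "ge": mutant_ge,
}


def detects(inputs: List[Tuple[int, int]], reference: BinaryFn, mutant: BinaryFn) -> bool:
    """A mutant is detected (killed) if any input makes it differ from the reference."""
    return any(reference(a, b) != mutant(a, b) for a, b in inputs)


def mutation_score(inputs: List[Tuple[int, int]], reference: BinaryFn,
                   mutants: Dict[str, BinaryFn]) -> float:
    """Fraction of mutants detected by the given inputs."""
    killed = sum(detects(inputs, reference, m) for m in mutants.values())
    return killed / len(mutants)


weak_inputs = [(2, 1)]
strong_inputs = [(2, 1), (1, 2)]

print("weak score:  ", mutation_score(weak_inputs, original_max, MUTANTS))
print("strong score:", mutation_score(strong_inputs, original_max, MUTANTS))
```

In this sketch the single-input set detects one of three mutants, and adding a second input raises that to two of three. The `ge` mutant behaves identically to the original on every input, so no test set can detect it. Each mutant is also evaluated independently here, which is the cost the abstract calls exorbitant for fuzzing and which pooling multiple mutations is meant to reduce.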
Related papers
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
- An Exploratory Study on Using Large Language Models for Mutation Testing [32.91472707292504]
Large Language Models (LLMs) have shown great potential in code-related tasks but their utility in mutation testing remains unexplored.
This paper investigates the performance of LLMs in generating effective mutations with respect to their usability, fault detection potential, and relationship with real bugs.
We find that compared to existing approaches, LLMs generate more diverse mutations that are behaviorally closer to real bugs.
arXiv Detail & Related papers (2024-06-14T08:49:41Z)
- An Empirical Evaluation of Manually Created Equivalent Mutants [54.02049952279685]
Fewer than 10% of manually created mutants are equivalent.
Surprisingly, our findings indicate that a significant portion of developers struggle to accurately identify equivalent mutants.
arXiv Detail & Related papers (2024-04-14T13:04:10Z)
- Mutation Analysis with Execution Taints [2.574469668220994]
Evaluating each mutant separately means a large amount of redundant computation.
We propose execution taints, a novel technique that repurposes dynamic data-flow taints for mutation analysis.
arXiv Detail & Related papers (2024-03-02T09:20:46Z)
- Contextual Predictive Mutation Testing [17.832774161583036]
We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and test method.
Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking/verifying live mutants.
We validate our input representation, and aggregation approaches for lifting predictions from the test matrix level to the test suite level, finding similar improvements in performance.
arXiv Detail & Related papers (2023-09-05T17:00:15Z)
- Fuzzing for CPS Mutation Testing [3.512722797771289]
We propose a mutation testing approach that leverages fuzz testing, which has proved effective with C and C++ software.
Our empirical evaluation shows that mutation testing based on fuzz testing kills a significantly higher proportion of live mutants than symbolic execution.
arXiv Detail & Related papers (2023-08-15T16:35:31Z)
- MuRS: Mutant Ranking and Suppression using Identifier Templates [4.9205581820379765]
Google's mutation testing service integrates diff-based mutation testing into the code review process.
Google's mutation testing service implements a number of suppression rules, which target not-useful mutants.
This paper proposes and evaluates MuRS, an automated approach that groups mutants by patterns in the source code under test.
arXiv Detail & Related papers (2023-06-15T13:43:52Z)
- Statistical and Computational Phase Transitions in Group Testing [73.55361918807883]
We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease.
We consider two different simple random procedures for assigning individuals to tests.
arXiv Detail & Related papers (2022-06-15T16:38:50Z)
- SLA$^2$P: Self-supervised Anomaly Detection with Adversarial Perturbation [77.71161225100927]
Anomaly detection is a fundamental yet challenging problem in machine learning.
We propose a novel and powerful framework, dubbed SLA$^2$P, for unsupervised anomaly detection.
arXiv Detail & Related papers (2021-11-25T03:53:43Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)