DeepMetis: Augmenting a Deep Learning Test Set to Increase its Mutation Score
- URL: http://arxiv.org/abs/2109.07514v1
- Date: Wed, 15 Sep 2021 18:20:50 GMT
- Title: DeepMetis: Augmenting a Deep Learning Test Set to Increase its Mutation Score
- Authors: Vincenzo Riccio, Nargiz Humbatova, Gunel Jahangirova, Paolo Tonella
- Abstract summary: DeepMetis is effective at augmenting the given test set, increasing its capability to detect mutants by 63% on average.
A leave-one-out experiment shows that the augmented test set is capable of exposing unseen mutants.
- Score: 4.444652484439581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning (DL) components are routinely integrated into software systems
that need to perform complex tasks such as image or natural language
processing. The adequacy of the test data used to test such systems can be
assessed by their ability to expose artificially injected faults (mutations)
that simulate real DL faults. In this paper, we describe an approach to
automatically generate new test inputs that can be used to augment the existing
test set so that its capability to detect DL mutations increases. Our tool
DeepMetis implements a search based input generation strategy. To account for
the non-determinism of the training and the mutation processes, our fitness
function involves multiple instances of the DL model under test. Experimental
results show that DeepMetis is effective at augmenting the given test set,
increasing its capability to detect mutants by 63% on average. A leave-one-out
experiment shows that the augmented test set is capable of exposing unseen
mutants, which simulate the occurrence of yet undetected faults.
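The abstract names the two ingredients that make this work: a search-based input generation strategy, and a fitness function evaluated over multiple trained instances of both the original and the mutated model to smooth out training non-determinism. Below is a minimal sketch of that loop, assuming models that expose a `predict` method and inputs normalized to [0, 1]; the perturbation operator and the exact fitness shape are illustrative assumptions, not DeepMetis's actual implementation.

```python
import numpy as np

# Hypothetical sketch of a DeepMetis-style search loop. `originals` and
# `mutants` are lists of independently trained model instances, so the
# fitness averages over training non-determinism as the abstract describes.
def fitness(x, label, originals, mutants):
    """Reward inputs the original models still classify correctly
    while as many mutant instances as possible misclassify them."""
    valid = np.mean([m.predict(x) == label for m in originals])
    kills = np.mean([m.predict(x) != label for m in mutants])
    return valid + kills  # maximize both terms

def augment(seed, label, originals, mutants, iters=1000, sigma=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    best, best_f = seed, fitness(seed, label, originals, mutants)
    for _ in range(iters):
        # Assumed perturbation operator: small Gaussian noise, clipped to [0, 1].
        cand = np.clip(best + rng.normal(0.0, sigma, size=seed.shape), 0.0, 1.0)
        f = fitness(cand, label, originals, mutants)
        if f > best_f:
            best, best_f = cand, f
    return best  # candidate to add to the augmented test set
```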
Related papers
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
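The summary does not spell out which training-dynamics statistics drive the difficulty levels; the sketch below follows the common dataset-cartography recipe (per-epoch gold-label probabilities, split by confidence) as an assumption about what such a characterization looks like.

```python
import numpy as np

# probs[e, i] = probability the model assigned to example i's gold label
# at epoch e, logged during training. The tercile split is an assumption.
def characterize(probs: np.ndarray):
    confidence = probs.mean(axis=0)   # high -> consistently learned, "easy"
    easy = confidence >= np.quantile(confidence, 2 / 3)
    hard = confidence <= np.quantile(confidence, 1 / 3)
    medium = ~(easy | hard)
    return easy, medium, hard
```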
- muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults [19.32186653723838]
We first describe a taxonomy of real RL faults obtained by repository mining.
Then, we present the mutation operators derived from such real faults and implemented in the tool muPRL.
We discuss the experimental results, showing that muPRL is effective at discriminating strong from weak test generators.
arXiv Detail & Related papers (2024-08-27T15:45:13Z)
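To make "mutation operators derived from real RL faults" concrete, here is a hypothetical environment-level operator in that spirit: a wrapper that intermittently drops the reward signal, simulating a faulty reward implementation. It is an illustration only, not claimed to be one of muPRL's actual operators.

```python
import random

# Hypothetical RL mutation operator (classic Gym-style 4-tuple step API assumed).
class RewardDropMutant:
    def __init__(self, env, p=0.1, seed=42):
        self.env, self.p, self.rng = env, p, random.Random(seed)

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if self.rng.random() < self.p:
            reward = 0.0  # injected fault: reward intermittently lost
        return obs, reward, done, info
```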
- Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing [63.20133320524577]
Large Language Models (LLMs) have demonstrated great potential as generalist assistants.
It is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts.
In this paper, we observe that directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs.
arXiv Detail & Related papers (2024-07-11T17:52:03Z)
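A minimal sketch of the parameter-editing idea: estimate a direction in activation space that separates undesired from desired behavior, then remove that component from the weights of one chosen linear layer. The layer choice, the mean-difference direction, and the projection update are illustrative assumptions, not the paper's exact procedure.

```python
import torch

@torch.no_grad()
def edit_linear(layer: torch.nn.Linear, h_bad, h_good, alpha=1.0):
    # h_bad, h_good: (n, in_features) hidden states feeding this layer,
    # collected on undesired vs. desired behavior prompts (assumption).
    d = h_bad.mean(0) - h_good.mean(0)
    d = d / d.norm()
    # Edit a small subset of parameters: strip from each weight row its
    # component along the undesired-behavior direction.
    layer.weight -= alpha * (layer.weight @ d).unsqueeze(1) * d.unsqueeze(0)
```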
- An Empirical Evaluation of Manually Created Equivalent Mutants [54.02049952279685]
Less than 10% of manually created mutants are equivalent.
Surprisingly, our findings indicate that a significant portion of developers struggle to accurately identify equivalent mutants.
arXiv Detail & Related papers (2024-04-14T13:04:10Z)
- Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on a quantized 8-bit ViT, outperforms gradient-based TENT on a full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z)
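FOA's defining constraint is adaptation without backpropagation. The sketch below keeps that forward-only property but substitutes plain random search for the paper's derivative-free optimizer, tuning an assumed per-channel input shift against a prediction-entropy objective; it illustrates the loop shape, not FOA itself.

```python
import torch

@torch.no_grad()
def adapt_forward_only(model, x, steps=30, sigma=0.01):
    # x: (N, C, H, W) test batch; tune one offset per channel (assumption).
    def entropy(shift):
        p = model(x + shift.view(1, -1, 1, 1)).softmax(dim=-1)
        return -(p * p.clamp_min(1e-12).log()).sum(dim=-1).mean()

    best = torch.zeros(x.shape[1])
    best_h = entropy(best)
    for _ in range(steps):
        cand = best + sigma * torch.randn_like(best)  # forward passes only
        h = entropy(cand)
        if h < best_h:
            best, best_h = cand, h
    return x + best.view(1, -1, 1, 1)
```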
- Contextual Predictive Mutation Testing [17.832774161583036]
We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and the test method.
Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking/verifying live mutants.
We validate our input representation and our approach for aggregating predictions from the test matrix level to the test suite level, finding similar improvements in performance.
arXiv Detail & Related papers (2023-09-05T17:00:15Z)
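The key representation is a single sequence pair, mutated source method plus test method, scored by a code-pretrained classifier. The checkpoint name and freshly initialized two-label head below are assumptions for illustration, not MutationBERT's released artifacts.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
clf = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)  # head would need fine-tuning

def p_killed(mutated_method: str, test_method: str) -> float:
    # Encode the mutated method and the test method as one sequence pair.
    batch = tok(mutated_method, test_method, truncation=True,
                return_tensors="pt")
    with torch.no_grad():
        logits = clf(**batch).logits
    return logits.softmax(dim=-1)[0, 1].item()  # P(test kills the mutant)
```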
- Mutation Testing of Deep Reinforcement Learning Based on Real Faults [11.584571002297217]
This paper builds on the existing approach of Mutation Testing (MT) to extend it to Reinforcement Learning (RL) systems.
We show that the design choice of the mutation-killing definition can affect both whether a mutation is killed and the test cases that are generated.
arXiv Detail & Related papers (2023-01-13T16:45:56Z)
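One family of killing definitions compares outcome distributions over multiple independently trained agents, so that training non-determinism is not mistaken for a killed mutant. The specific statistical test below is an assumption, just one point in the design space such papers explore.

```python
from scipy.stats import ttest_ind

def is_killed(orig_returns, mutant_returns, alpha=0.05):
    # Returns of n original vs. n mutated agents on the same test scenario;
    # the mutant counts as killed if the distributions differ significantly.
    _, p = ttest_ind(orig_returns, mutant_returns, equal_var=False)
    return p < alpha
```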
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
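The Fisher regularizer can be written as a quadratic penalty that keeps parameters important to the source model near their pre-adaptation values. The diagonal-Fisher weighting and EWC-style form below are standard; treating them as this paper's exact formulation is an assumption.

```python
import torch

def fisher_penalty(model, anchor_params, fisher, lam=2000.0):
    # anchor_params: parameter values before adaptation; fisher: precomputed
    # diagonal Fisher importance per parameter tensor (both dicts by name).
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - anchor_params[name]) ** 2).sum()
    return lam * loss  # added to the test-time adaptation objective
```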
- Machine Learning Testing in an ADAS Case Study Using Simulation-Integrated Bio-Inspired Search-Based Testing [7.5828169434922]
Deeper generates failure-revealing test scenarios for testing a deep neural network-based lane-keeping system.
In the newly proposed version, we utilize a new set of bio-inspired search algorithms: genetic algorithm (GA), $(\mu+\lambda)$ and $(\mu,\lambda)$ evolution strategies (ES), and particle swarm optimization (PSO).
Our evaluation shows the newly proposed test generators in Deeper represent a considerable improvement on the previous version.
arXiv Detail & Related papers (2022-03-22T20:27:40Z)
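For reference, here is a minimal maximizing $(\mu+\lambda)$ evolution strategy of the kind the Deeper generators build on; the flat real-valued scenario encoding and the fitness callback are placeholders, not Deeper's actual road-scenario representation.

```python
import numpy as np

def es_mu_plus_lambda(fitness, dim, mu=5, lam=20, sigma=0.1, gens=50, rng=None):
    rng = rng or np.random.default_rng(0)
    pop = rng.normal(size=(mu, dim))                 # initial parents
    for _ in range(gens):
        parents = pop[rng.integers(0, mu, size=lam)]
        children = parents + sigma * rng.normal(size=(lam, dim))
        pool = np.vstack([pop, children])            # "+" selection keeps parents
        scores = np.array([fitness(ind) for ind in pool])
        pop = pool[np.argsort(scores)[-mu:]]         # survive the mu fittest
    return max(pop, key=fitness)
```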
- DeepMutation: A Neural Mutation Tool [26.482720255691646]
DeepMutation is a tool wrapping our deep learning model into a fully automated tool chain.
It can generate, inject, and test mutants learned from real faults.
arXiv Detail & Related papers (2020-02-12T01:57:41Z)