DeepMetis: Augmenting a Deep Learning Test Set to Increase its Mutation Score
- URL: http://arxiv.org/abs/2109.07514v1
- Date: Wed, 15 Sep 2021 18:20:50 GMT
- Title: DeepMetis: Augmenting a Deep Learning Test Set to Increase its Mutation Score
- Authors: Vincenzo Riccio, Nargiz Humbatova, Gunel Jahangirova, Paolo Tonella
- Abstract summary: DeepMetis is effective at augmenting the given test set, increasing its capability to detect mutants by 63% on average.
A leave-one-out experiment shows that the augmented test set is capable of exposing unseen mutants.
- Score: 4.444652484439581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning (DL) components are routinely integrated into software systems
that need to perform complex tasks such as image or natural language
processing. The adequacy of the test data used to test such systems can be
assessed by their ability to expose artificially injected faults (mutations)
that simulate real DL faults. In this paper, we describe an approach to
automatically generate new test inputs that can be used to augment the existing
test set so that its capability to detect DL mutations increases. Our tool
DeepMetis implements a search based input generation strategy. To account for
the non-determinism of the training and the mutation processes, our fitness
function involves multiple instances of the DL model under test. Experimental
results show that DeepMetis is effective at augmenting the given test set,
increasing its capability to detect mutants by 63% on average. A leave-one-out
experiment shows that the augmented test set is capable of exposing unseen
mutants, which simulate the occurrence of yet undetected faults.
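The abstract names the two ingredients that make this work: a search-based input generation strategy, and a fitness function evaluated over multiple trained instances of both the original and the mutated model to smooth out training non-determinism. Below is a minimal sketch of that loop, assuming models that expose a `predict` method and inputs normalized to [0, 1]; the perturbation operator and the exact fitness shape are illustrative assumptions, not DeepMetis's actual implementation.

```python
import numpy as np

# Hypothetical sketch of a DeepMetis-style search loop. `originals` and
# `mutants` are lists of independently trained model instances, so the
# fitness averages over training non-determinism as the abstract describes.
def fitness(x, label, originals, mutants):
    """Reward inputs the original models still classify correctly
    while as many mutant instances as possible misclassify them."""
    valid = np.mean([m.predict(x) == label for m in originals])
    kills = np.mean([m.predict(x) != label for m in mutants])
    return valid + kills  # maximize both terms

def augment(seed, label, originals, mutants, iters=1000, sigma=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    best, best_f = seed, fitness(seed, label, originals, mutants)
    for _ in range(iters):
        # Assumed perturbation operator: small Gaussian noise, clipped to [0, 1].
        cand = np.clip(best + rng.normal(0.0, sigma, size=seed.shape), 0.0, 1.0)
        f = fitness(cand, label, originals, mutants)
        if f > best_f:
            best, best_f = cand, f
    return best  # candidate to add to the augmented test set
```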
Related papers
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
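The summary does not spell out which training-dynamics statistics drive the difficulty levels; the sketch below follows the common dataset-cartography recipe (per-epoch gold-label probabilities, split by confidence) as an assumption about what such a characterization looks like.

```python
import numpy as np

# probs[e, i] = probability the model assigned to example i's gold label
# at epoch e, logged during training. The tercile split is an assumption.
def characterize(probs: np.ndarray):
    confidence = probs.mean(axis=0)   # high -> consistently learned, "easy"
    easy = confidence >= np.quantile(confidence, 2 / 3)
    hard = confidence <= np.quantile(confidence, 1 / 3)
    medium = ~(easy | hard)
    return easy, medium, hard
```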
- muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults [19.32186653723838]
We first describe a taxonomy of real RL faults obtained by repository mining.
Then, we present the mutation operators derived from such real faults and implemented in the tool muPRL.
We discuss the experimental results, showing that muPRL is effective at discriminating strong from weak test generators.
arXiv Detail & Related papers (2024-08-27T15:45:13Z)
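To make "mutation operators derived from real RL faults" concrete, here is a hypothetical environment-level operator in that spirit: a wrapper that intermittently drops the reward signal, simulating a faulty reward implementation. It is an illustration only, not claimed to be one of muPRL's actual operators.

```python
import random

# Hypothetical RL mutation operator (classic Gym-style 4-tuple step API assumed).
class RewardDropMutant:
    def __init__(self, env, p=0.1, seed=42):
        self.env, self.p, self.rng = env, p, random.Random(seed)

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if self.rng.random() < self.p:
            reward = 0.0  # injected fault: reward intermittently lost
        return obs, reward, done, info
```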
- Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing [63.20133320524577]
Large Language Models (LLMs) have demonstrated great potential as generalist assistants.
It is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts.
In this paper, we observe that directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs.
arXiv Detail & Related papers (2024-07-11T17:52:03Z)
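A minimal sketch of the parameter-editing idea: estimate a direction in activation space that separates undesired from desired behavior, then remove that component from the weights of one chosen linear layer. The layer choice, the mean-difference direction, and the projection update are illustrative assumptions, not the paper's exact procedure.

```python
import torch

@torch.no_grad()
def edit_linear(layer: torch.nn.Linear, h_bad, h_good, alpha=1.0):
    # h_bad, h_good: (n, in_features) hidden states feeding this layer,
    # collected on undesired vs. desired behavior prompts (assumption).
    d = h_bad.mean(0) - h_good.mean(0)
    d = d / d.norm()
    # Edit a small subset of parameters: strip from each weight row its
    # component along the undesired-behavior direction.
    layer.weight -= alpha * (layer.weight @ d).unsqueeze(1) * d.unsqueeze(0)
```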
- An Empirical Evaluation of Manually Created Equivalent Mutants [54.02049952279685]
Less than 10% of manually created mutants are equivalent.
Surprisingly, our findings indicate that a significant portion of developers struggle to accurately identify equivalent mutants.
arXiv Detail & Related papers (2024-04-14T13:04:10Z)
- Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on a quantized 8-bit ViT, outperforms gradient-based TENT on a full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z)
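FOA's defining constraint is adaptation without backpropagation. The sketch below keeps that forward-only property but substitutes plain random search for the paper's derivative-free optimizer, tuning an assumed per-channel input shift against a prediction-entropy objective; it illustrates the loop shape, not FOA itself.

```python
import torch

@torch.no_grad()
def adapt_forward_only(model, x, steps=30, sigma=0.01):
    # x: (N, C, H, W) test batch; tune one offset per channel (assumption).
    def entropy(shift):
        p = model(x + shift.view(1, -1, 1, 1)).softmax(dim=-1)
        return -(p * p.clamp_min(1e-12).log()).sum(dim=-1).mean()

    best = torch.zeros(x.shape[1])
    best_h = entropy(best)
    for _ in range(steps):
        cand = best + sigma * torch.randn_like(best)  # forward passes only
        h = entropy(cand)
        if h < best_h:
            best, best_h = cand, h
    return x + best.view(1, -1, 1, 1)
```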
- Contextual Predictive Mutation Testing [17.832774161583036]
We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and the test method.
Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking/verifying live mutants.
We validate our input representation and our approach for aggregating predictions from the test matrix level to the test suite level, finding similar improvements in performance.
arXiv Detail & Related papers (2023-09-05T17:00:15Z)
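The key representation is a single sequence pair, mutated source method plus test method, scored by a code-pretrained classifier. The checkpoint name and freshly initialized two-label head below are assumptions for illustration, not MutationBERT's released artifacts.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
clf = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)  # head would need fine-tuning

def p_killed(mutated_method: str, test_method: str) -> float:
    # Encode the mutated method and the test method as one sequence pair.
    batch = tok(mutated_method, test_method, truncation=True,
                return_tensors="pt")
    with torch.no_grad():
        logits = clf(**batch).logits
    return logits.softmax(dim=-1)[0, 1].item()  # P(test kills the mutant)
```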
- Mutation Testing of Deep Reinforcement Learning Based on Real Faults [11.584571002297217]
This paper builds on the existing approach of Mutation Testing (MT) to extend it to Reinforcement Learning (RL) systems.
We show that the design choice of the mutation-killing definition can affect both whether a mutation is killed and the test cases that are generated.
arXiv Detail & Related papers (2023-01-13T16:45:56Z)
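One family of killing definitions compares outcome distributions over multiple independently trained agents, so that training non-determinism is not mistaken for a killed mutant. The specific statistical test below is an assumption, just one point in the design space such papers explore.

```python
from scipy.stats import ttest_ind

def is_killed(orig_returns, mutant_returns, alpha=0.05):
    # Returns of n original vs. n mutated agents on the same test scenario;
    # the mutant counts as killed if the distributions differ significantly.
    _, p = ttest_ind(orig_returns, mutant_returns, equal_var=False)
    return p < alpha
```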
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
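The Fisher regularizer can be written as a quadratic penalty that keeps parameters important to the source model near their pre-adaptation values. The diagonal-Fisher weighting and EWC-style form below are standard; treating them as this paper's exact formulation is an assumption.

```python
import torch

def fisher_penalty(model, anchor_params, fisher, lam=2000.0):
    # anchor_params: parameter values before adaptation; fisher: precomputed
    # diagonal Fisher importance per parameter tensor (both dicts by name).
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - anchor_params[name]) ** 2).sum()
    return lam * loss  # added to the test-time adaptation objective
```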
- Machine Learning Testing in an ADAS Case Study Using Simulation-Integrated Bio-Inspired Search-Based Testing [7.5828169434922]
Deeper generates failure-revealing test scenarios for testing a deep neural network-based lane-keeping system.
In the newly proposed version, we utilize a new set of bio-inspired search algorithms: genetic algorithm (GA), $(\mu+\lambda)$ and $(\mu,\lambda)$ evolution strategies (ES), and particle swarm optimization (PSO).
Our evaluation shows the newly proposed test generators in Deeper represent a considerable improvement on the previous version.
arXiv Detail & Related papers (2022-03-22T20:27:40Z)
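For reference, here is a minimal maximizing $(\mu+\lambda)$ evolution strategy of the kind the Deeper generators build on; the flat real-valued scenario encoding and the fitness callback are placeholders, not Deeper's actual road-scenario representation.

```python
import numpy as np

def es_mu_plus_lambda(fitness, dim, mu=5, lam=20, sigma=0.1, gens=50, rng=None):
    rng = rng or np.random.default_rng(0)
    pop = rng.normal(size=(mu, dim))                 # initial parents
    for _ in range(gens):
        parents = pop[rng.integers(0, mu, size=lam)]
        children = parents + sigma * rng.normal(size=(lam, dim))
        pool = np.vstack([pop, children])            # "+" selection keeps parents
        scores = np.array([fitness(ind) for ind in pool])
        pop = pool[np.argsort(scores)[-mu:]]         # survive the mu fittest
    return max(pop, key=fitness)
```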
- DeepMutation: A Neural Mutation Tool [26.482720255691646]
DeepMutation is a tool wrapping our deep learning model into a fully automated tool chain.
It can generate, inject, and test mutants learned from real faults.
arXiv Detail & Related papers (2020-02-12T01:57:41Z)