A Probabilistic Framework for Mutation Testing in Deep Neural Networks
- URL: http://arxiv.org/abs/2208.06018v1
- Date: Thu, 11 Aug 2022 19:45:14 GMT
- Title: A Probabilistic Framework for Mutation Testing in Deep Neural Networks
- Authors: Florian Tambon, Foutse Khomh, Giuliano Antoniol
- Abstract summary: We propose a Probabilistic Mutation Testing (PMT) approach that alleviates the inconsistency problem.
PMT effectively allows a more consistent and informed decision on mutations through evaluation.
- Score: 12.033944769247958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: Mutation Testing (MT) is an important tool in traditional Software
Engineering (SE) white-box testing. It aims to artificially inject faults into a
system to evaluate a test suite's capability to detect them, assuming that the
test suite's defect-finding capability will then translate to real faults. While
MT has long been used in SE, it has only recently started gaining the attention
of the Deep Learning (DL) community, with researchers adapting it to improve the
testability of DL models and the trustworthiness of DL systems.
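As a concrete illustration of this loop, here is a minimal sketch in Python,
assuming a model represented as a dict of weight lists; the `weight_shuffle`
operator and the accuracy-drop kill criterion are generic examples from the DL
mutation-testing literature, not the specific operators studied in this paper.

```python
import copy
import random

def weight_shuffle(weights):
    """Illustrative mutation operator: shuffle the weights of one randomly
    chosen layer of an already-trained model. Weight-level operators like
    this appear in the DL mutation-testing literature; the exact operator
    set varies by framework."""
    mutant = copy.deepcopy(weights)
    layer = random.choice(list(mutant))
    random.shuffle(mutant[layer])
    return mutant

def is_killed(original_acc, mutant_acc, threshold=0.05):
    """Illustrative kill criterion: the test suite detects the mutant if
    its accuracy drops by more than `threshold` below the original's."""
    return (original_acc - mutant_acc) > threshold

# Hypothetical usage with a model stored as {layer_name: [weights...]}:
weights = {"dense_1": [0.1, -0.3, 0.7], "dense_2": [0.5, 0.2]}
mutant = weight_shuffle(weights)
print(is_killed(original_acc=0.91, mutant_acc=0.83))  # True: drop > 0.05
```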
Objective: While several techniques have been proposed for MT, most of them
neglect the stochasticity inherent to DL that results from the training phase.
Even the latest MT approaches in DL, which tackle MT through a statistical
approach, can give inconsistent results: because their statistic is computed
over a fixed set of sampled training instances, the verdict can differ from one
instance set to another, when it should be consistent for any such set.
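This inconsistency is easy to reproduce with a toy simulation. Below,
per-instance accuracies for the original and mutated programs are drawn from
invented distributions (all numbers are illustrative); when the true accuracy
drop sits near the decision threshold, the same statistic computed on
different fixed samples of trained instances can return opposite verdicts.

```python
import random
random.seed(0)

# Simulated populations of per-instance test accuracies (training is
# stochastic, so every retrained instance differs). Purely illustrative.
original = [random.gauss(0.900, 0.02) for _ in range(10_000)]
mutant   = [random.gauss(0.885, 0.02) for _ in range(10_000)]

def killed(orig_sample, mut_sample, threshold=0.01):
    """Toy kill statistic: mean accuracy drop across sampled instances."""
    drop = (sum(orig_sample) / len(orig_sample)
            - sum(mut_sample) / len(mut_sample))
    return drop > threshold

for trial in range(3):
    o = random.sample(original, 20)
    m = random.sample(mutant, 20)
    print(f"sample set {trial}: killed = {killed(o, m)}")
# Near the decision boundary, the verdict may flip between sample sets,
# even though nothing about the mutation itself changed.
```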
Methods: In this work, we propose a Probabilistic Mutation Testing (PMT)
approach that alleviates the inconsistency problem and allows for a more
consistent decision on whether a mutant is killed or not.
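The abstract does not detail the estimator, but the spirit of a probabilistic
decision can be sketched as follows: rather than a binary verdict from one
fixed sample, estimate the probability that a freshly trained mutant instance
is killed, and only commit to a decision when a confidence interval for that
probability clears the threshold. The normal-approximation interval and the
three-way decision rule below are assumptions for illustration, not the
paper's exact procedure.

```python
import math

def pmt_decision(kill_outcomes, p_threshold=0.5, z=1.96):
    """Sketch of a probabilistic kill decision.

    kill_outcomes: list of 0/1 results, one per independently trained
    mutant instance (1 = the test suite killed that instance).
    Returns 'killed', 'not killed', or 'undecided' depending on whether
    a normal-approximation confidence interval for the kill probability
    clears the threshold. Illustrative only."""
    n = len(kill_outcomes)
    p_hat = sum(kill_outcomes) / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - half_width > p_threshold:
        return "killed"
    if p_hat + half_width < p_threshold:
        return "not killed"
    return "undecided"  # sample more training instances

print(pmt_decision([1, 1, 0, 1, 1, 1, 0, 1, 1, 1]))  # 'killed' (p_hat = 0.8)
```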
Results: We show that PMT effectively allows a more consistent and informed
decision on mutations, through an evaluation using three models and eight
mutation operators employed in previously proposed MT methods. We also analyze
the trade-off between the approximation error and the cost of our method,
showing that a relatively small error can be achieved at a manageable cost.
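The error/cost trade-off has a familiar Monte Carlo shape: estimates of a
probability from n independently trained instances converge at rate O(1/√n),
so halving the tolerated error roughly quadruples the (expensive) training
cost. A Hoeffding-style back-of-the-envelope sketch, offered as a generic
illustration rather than the paper's own analysis:

```python
import math

def instances_needed(epsilon, delta=0.05):
    """Hoeffding bound: smallest n such that an empirical kill-probability
    estimate is within epsilon of the truth with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

for eps in (0.10, 0.05, 0.02):
    print(f"error ±{eps:.2f} -> {instances_needed(eps)} trained instances")
# error ±0.10 -> 185 instances; halving the error roughly quadruples the cost
```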
Conclusion: Our results show the limitations of current MT practices for DNNs
and the need to rethink them. We believe PMT is a first step in that direction,
as it effectively removes the inconsistency across test executions that
previous methods suffer from due to the stochasticity of DNN training.
Related papers
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z) - Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z) - Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Automating Behavioral Testing in Machine Translation [9.151054827967933]
We propose to use Large Language Models to generate source sentences tailored to test the behavior of Machine Translation models.
We can then verify whether the MT model exhibits the expected behavior through matching candidate sets.
Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort.
arXiv Detail & Related papers (2023-09-05T19:40:45Z) - Instance-based Learning with Prototype Reduction for Real-Time
Proportional Myocontrol: A Randomized User Study Demonstrating
Accuracy-preserving Data Reduction for Prosthetic Embedded Systems [0.0]
This work presents the design, implementation and validation of learning techniques based on the kNN scheme for gesture detection in prosthetic control.
The influence of parameterization and varying proportionality schemes is analyzed, utilizing an eight-channel-sEMG armband.
arXiv Detail & Related papers (2023-08-21T20:15:35Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - Test-Time Adaptation with Perturbation Consistency Learning [32.58879780726279]
We propose a simple test-time adaptation method to promote the model to make stable predictions for samples with distribution shifts.
Our method can achieve higher or comparable performance with less inference time over strong PLM backbones.
arXiv Detail & Related papers (2023-04-25T12:29:22Z) - Mutation Testing of Deep Reinforcement Learning Based on Real Faults [11.584571002297217]
This paper builds on the existing approach of Mutation Testing (MT) to extend it to Reinforcement Learning (RL) systems.
We show that the design choice of the mutation killing definition can affect whether or not a mutation is killed as well as the generated test cases.
arXiv Detail & Related papers (2023-01-13T16:45:56Z) - Better Uncertainty Quantification for Machine Translation Evaluation [17.36759906285316]
We train the COMET metric with new heteroscedastic regression, divergence minimization, and direct uncertainty prediction objectives.
Experiments show improved results on WMT20 and WMT21 metrics task datasets and a substantial reduction in computational costs.
arXiv Detail & Related papers (2022-04-13T17:49:25Z) - Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.