Robust Black-box Testing of Deep Neural Networks using Co-Domain Coverage
- URL: http://arxiv.org/abs/2408.06766v1
- Date: Tue, 13 Aug 2024 09:42:57 GMT
- Title: Robust Black-box Testing of Deep Neural Networks using Co-Domain Coverage
- Authors: Aishwarya Gupta, Indranil Saha, Piyush Rai
- Abstract summary: Rigorous testing of machine learning models is necessary for trustworthy deployments.
We present a novel black-box approach for generating test-suites for robust testing of deep neural networks (DNNs).
- Score: 18.355332126489756
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Rigorous testing of machine learning models is necessary for trustworthy deployments. We present a novel black-box approach for generating test-suites for robust testing of deep neural networks (DNNs). Most existing methods create test inputs based on maximizing some "coverage" criterion/metric such as a fraction of neurons activated by the test inputs. Such approaches, however, can only analyze each neuron's behavior or each layer's output in isolation and are unable to capture their collective effect on the DNN's output, resulting in test suites that often do not capture the various failure modes of the DNN adequately. These approaches also require white-box access, i.e., access to the DNN's internals (node activations). We present a novel black-box coverage criterion called Co-Domain Coverage (CDC), which is defined as a function of the model's output and thus takes into account its end-to-end behavior. Subsequently, we develop a new fuzz testing procedure named CoDoFuzz, which uses CDC to guide the fuzzing process to generate a test suite for a DNN. We extensively compare the test suite generated by CoDoFuzz with those generated using several state-of-the-art coverage-based fuzz testing methods for the DNNs trained on six publicly available datasets. Experimental results establish the efficiency and efficacy of CoDoFuzz in generating the largest number of misclassified inputs and the inputs for which the model lacks confidence in its decision.
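The paper itself ships no code, but the coverage-guided loop it describes is easy to sketch. Below is a minimal, hypothetical Python illustration: `model` is any black-box callable returning softmax probabilities, and the co-domain is approximated by bucketing the predicted class together with a binned confidence value. This bucketing is an assumption for illustration, not the paper's exact CDC definition, and the uniform-noise mutation is a placeholder for CoDoFuzz's actual fuzzing strategy.

```python
import numpy as np

def output_cell(probs, n_bins=10):
    # Discretize the model's output: (predicted class, binned confidence).
    # An illustrative stand-in for the paper's CDC definition.
    pred = int(np.argmax(probs))
    conf_bin = min(int(probs[pred] * n_bins), n_bins - 1)
    return (pred, conf_bin)

def codomain_guided_fuzz(model, seeds, n_iters=1000, eps=0.05, seed=0):
    # Keep a mutant only if it lands in a not-yet-covered output cell.
    rng = np.random.default_rng(seed)
    covered = {output_cell(model(x)) for x in seeds}
    queue, suite = list(seeds), []
    for _ in range(n_iters):
        x = queue[rng.integers(len(queue))]
        mutant = np.clip(x + rng.uniform(-eps, eps, x.shape), 0.0, 1.0)
        cell = output_cell(model(mutant))
        if cell not in covered:        # new end-to-end behavior observed
            covered.add(cell)
            suite.append(mutant)
            queue.append(mutant)       # promising seed for further mutation
    return suite, covered
```

Inputs that land in misclassification or low-confidence cells are exactly the ones the abstract reports CoDoFuzz surfacing most effectively.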
Related papers
- GIST: Generated Inputs Sets Transferability in Deep Learning [12.147546375400749]
GIST (Generated Inputs Sets Transferability) is a novel approach for the efficient transfer of test sets.
arXiv Detail & Related papers (2023-11-01T19:35:18Z)
- DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks [0.6249768559720121]
DeepGD is a black-box, multi-objective test selection approach for deep neural networks (DNNs).
It reduces labeling cost by prioritizing the selection of test inputs with high fault-revealing power from large unlabeled datasets.
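DeepGD itself searches with a genetic algorithm (NSGA-II) over uncertainty and feature-diversity objectives; the greedy sketch below is a simplified stand-in that combines a Gini-based uncertainty score with distance-to-selected-set diversity. The `features` (embeddings of the unlabeled inputs) and `probs` (the model's softmax outputs) arrays are assumed to be precomputed.

```python
import numpy as np

def gini_uncertainty(probs):
    # Gini impurity of the softmax output: higher means less confident.
    return 1.0 - np.sum(probs ** 2, axis=-1)

def select_tests(features, probs, budget):
    # Greedy stand-in for DeepGD's multi-objective search: trade off
    # fault-revealing power (uncertainty) against feature diversity.
    unc = gini_uncertainty(probs)
    selected = [int(np.argmax(unc))]
    while len(selected) < budget:
        dists = np.min(
            np.linalg.norm(
                features[:, None, :] - features[selected][None, :, :], axis=-1
            ),
            axis=1,
        )
        score = unc + dists / (dists.max() + 1e-9)
        score[selected] = -np.inf      # never reselect an input
        selected.append(int(np.argmax(score)))
    return selected
```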
arXiv Detail & Related papers (2023-03-08T20:33:09Z)
- The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural Networks [94.63547069706459]
The #DNN-Verification problem involves counting the number of input configurations of a DNN that result in a violation of a safety property.
We propose a novel approach that returns the exact count of violations.
We present experimental results on a set of safety-critical benchmarks.
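The paper's contribution is counting violations exactly (and approximately) at scale; the brute-force sketch below only illustrates what is being counted by enumerating a discretized input grid, which is feasible only for toy input spaces. `toy_model` and the safety property are hypothetical placeholders.

```python
import itertools
import numpy as np

def count_unsafe_inputs(model, grid_axes, safety_property):
    # Exhaustively enumerate a discretized input space and count the
    # configurations that violate the property. Purely illustrative:
    # the paper's methods avoid this exponential enumeration.
    unsafe = 0
    for point in itertools.product(*grid_axes):
        x = np.array(point)
        if not safety_property(x, model(x)):
            unsafe += 1
    return unsafe

# Toy example: count inputs in [0,1]^2 whose output exceeds a threshold.
toy_model = lambda x: float(np.sum(x ** 2))
axes = [np.linspace(0.0, 1.0, 50)] * 2
print(count_unsafe_inputs(toy_model, axes, lambda x, y: y < 1.5))
```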
arXiv Detail & Related papers (2023-01-17T18:32:01Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical early-exiting dynamic neural network (EDNN) has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads, together with the network backbone, are trained on every batch of training data.
Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting and propose multiple training techniques to optimize the model effectively.
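A minimal sketch of the additive, gradient-boosting view at inference time: each exit predicts with the sum of all head outputs so far. The confidence-based early-exit rule and the toy `blocks`/`heads` callables are assumptions for illustration, not the paper's training procedure.

```python
import numpy as np

def boosted_ednn_predict(x, blocks, heads, threshold=0.9):
    # Additive early-exit inference: exit k predicts from the SUM of head
    # outputs 1..k (the gradient-boosting formulation), and inference
    # stops once the running prediction is confident enough.
    h, logits = x, None
    for block, head in zip(blocks, heads):
        h = block(h)
        logits = head(h) if logits is None else logits + head(h)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        if probs.max() >= threshold:   # confident enough: exit early
            break
    return probs
```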
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
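A hedged PyTorch sketch of the idea: augmented views of the single test sample are pulled toward a shared prototype assignment before prediction. The noise augmentation, loss form, and hyperparameters here are placeholders; TTAPS's actual SwAV-derived objective and image augmentations differ in detail.

```python
import torch
import torch.nn.functional as F

def adapt_on_single_sample(encoder, prototypes, x, n_views=4, steps=1, lr=1e-3):
    # x: a single test input of shape (1, d). Augment it, embed the views,
    # and nudge the encoder so all views agree on one prototype assignment.
    opt = torch.optim.SGD(encoder.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        views = x.repeat(n_views, 1) + 0.05 * torch.randn(n_views, x.shape[-1])
        z = F.normalize(encoder(views), dim=1)            # view embeddings
        scores = z @ F.normalize(prototypes, dim=1).T     # prototype logits
        target = scores.softmax(dim=1).mean(dim=0).detach()  # consensus assignment
        loss = F.cross_entropy(scores / 0.1, target.repeat(n_views, 1))
        loss.backward()
        opt.step()
    return encoder
```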
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
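Interval bound propagation through a single affine + ReLU layer is short enough to show concretely; the sketch below is the standard interval-arithmetic rule, not the paper's INN-specific analysis.

```python
import numpy as np

def ibp_affine_relu(lo, hi, W, b):
    # Propagate the box [lo, hi] through y = relu(W @ x + b): the lower
    # bound takes lo where W >= 0 and hi where W < 0, and vice versa.
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    y_lo = W_pos @ lo + W_neg @ hi + b
    y_hi = W_pos @ hi + W_neg @ lo + b
    return np.maximum(y_lo, 0.0), np.maximum(y_hi, 0.0)

# Chaining such layers over an eps-ball around x certifies robustness:
# if the true class's lower bound beats every other class's upper bound.
```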
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Black-Box Testing of Deep Neural Networks through Test Case Diversity [1.4700751484033807]
We investigate black-box input diversity metrics as an alternative to white-box coverage criteria.
Our experiments show that the diversity of image features embedded in test input sets is a more reliable indicator of a test set's fault-revealing power than white-box coverage criteria.
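One concrete diversity metric of this kind is geometric diversity: the log-determinant of the Gram matrix of feature embeddings (e.g., from a pretrained CNN). The sketch below assumes the `features` matrix has already been extracted.

```python
import numpy as np

def geometric_diversity(features, eps=1e-6):
    # Log-volume spanned by normalized feature embeddings: a more
    # spread-out (diverse) test set yields a larger value.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    gram = f @ f.T
    _, logdet = np.linalg.slogdet(gram + eps * np.eye(len(f)))
    return logdet
```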
arXiv Detail & Related papers (2021-12-20T20:12:53Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce DAAIN, a novel technique to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU, making it compute-efficient and deployable without requiring specialized accelerators.
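A compressed sketch of the monitoring idea: fit a density model on a layer's activations over clean training data, then flag test-time inputs whose activations score below a threshold. DAAIN fits a normalizing flow; the diagonal Gaussian below is a deliberate simplification to keep the sketch short.

```python
import numpy as np

class ActivationMonitor:
    # Flags inputs whose activation pattern is unlikely under a density
    # model fit on clean training activations (Gaussian stand-in for the
    # paper's normalizing flow).
    def fit(self, train_acts, quantile=0.01):
        self.mu = train_acts.mean(axis=0)
        self.var = train_acts.var(axis=0) + 1e-6
        scores = self._log_density(train_acts)
        self.threshold = np.quantile(scores, quantile)  # reject lowest 1%
        return self

    def _log_density(self, acts):
        return -0.5 * np.sum(
            (acts - self.mu) ** 2 / self.var + np.log(2 * np.pi * self.var),
            axis=1,
        )

    def is_suspicious(self, acts):
        # True for likely OOD or adversarial inputs.
        return self._log_density(acts) < self.threshold
```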
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- Distribution-Aware Testing of Neural Networks Using Generative Models [5.618419134365903]
The reliability of software that includes a deep neural network (DNN) as a component is critically important.
We show that three recent testing techniques generate a significant number of invalid test inputs.
We propose a technique to incorporate the valid input space of the DNN model under test in the test generation process.
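A minimal sketch of the idea, assuming a trained VAE with hypothetical `encoder`/`decoder` callables: perturb seeds in latent space and decode, so generated tests stay close to the valid input distribution instead of drifting off-manifold the way raw-pixel perturbations can.

```python
import numpy as np

def distribution_aware_tests(encoder, decoder, seeds, n_per_seed=5,
                             sigma=0.3, seed=0):
    # Perturb each seed in the generative model's latent space; decoded
    # outputs remain near the valid input manifold by construction.
    rng = np.random.default_rng(seed)
    tests = []
    for x in seeds:
        z = encoder(x)
        for _ in range(n_per_seed):
            z_new = z + sigma * rng.standard_normal(z.shape)
            tests.append(decoder(z_new))
    return tests
```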
arXiv Detail & Related papers (2021-02-26T17:18:21Z)
- Computing the Testing Error without a Testing Set [33.068870286618655]
We derive an algorithm to estimate the performance gap between training and testing that does not require any testing dataset.
This allows us to compute the DNN's testing error on unseen samples, even when we do not have access to them.
arXiv Detail & Related papers (2020-05-01T15:35:50Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
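The Dorfman baseline the summary refers to is easy to make concrete: with prevalence p and group size g, a noiseless two-stage scheme costs 1/g + 1 - (1-p)^g tests per person. The paper's contribution is the harder noisy, adaptive setting; the sketch below only reproduces the classical calculation.

```python
def dorfman_tests_per_person(p, g):
    # One pooled test per group of g, plus g retests if the pool is
    # positive (probability 1 - (1-p)^g), assuming a noiseless test.
    return 1.0 / g + (1.0 - (1.0 - p) ** g)

p = 0.02                               # 2% prevalence
best_g = min(range(2, 51), key=lambda g: dorfman_tests_per_person(p, g))
print(best_g, dorfman_tests_per_person(p, best_g))  # ~8 per group, ~0.27 tests/person
```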
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.