Rethinking Diversity in Deep Neural Network Testing
- URL: http://arxiv.org/abs/2305.15698v2
- Date: Mon, 26 Feb 2024 21:17:41 GMT
- Title: Rethinking Diversity in Deep Neural Network Testing
- Authors: Zi Wang, Jihye Choi, Ke Wang, Somesh Jha
- Abstract summary: We propose a shift in perspective for testing deep neural networks (DNNs).
We advocate for the consideration of DNN testing as directed testing problems rather than diversity-based testing tasks.
Our evaluation demonstrates that diversity metrics are particularly weak indicators for identifying buggy inputs resulting from small input perturbations.
- Score: 25.641743200458382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by the success of traditional software testing, researchers have
proposed numerous diversity measures for testing deep neural networks (DNNs). In this
study, we propose a shift in perspective, advocating for the consideration of
DNN testing as directed testing problems rather than diversity-based testing
tasks. We note that the objective of testing DNNs is specific and well-defined:
identifying inputs that lead to misclassifications. Consequently, a more
precise testing approach is to prioritize inputs with a higher potential to
induce misclassifications, as opposed to emphasizing inputs that enhance
"diversity."
We derive six directed metrics for DNN testing. Furthermore, we conduct a
careful analysis of the appropriate scope for each metric, as applying metrics
beyond their intended scope could significantly diminish their effectiveness.
Our evaluation demonstrates that (1) diversity metrics are particularly weak
indicators for identifying buggy inputs resulting from small input
perturbations, and (2) our directed metrics consistently outperform diversity
metrics in revealing erroneous behaviors of DNNs across all scenarios.
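To make the contrast concrete, here is a minimal, self-contained sketch of the two selection strategies under discussion. The margin-based priority below is a generic misclassification-oriented score, and the nearest-neighbor distance is a generic diversity heuristic; neither is one of the six directed metrics derived in the paper, and the function names, the random "model outputs", and the selection budget are all hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def directed_score(logits):
    """Misclassification-oriented priority: a small top-1/top-2 margin => high score."""
    probs = np.sort(softmax(logits), axis=1)
    margin = probs[:, -1] - probs[:, -2]   # gap between the two largest class probabilities
    return -margin                          # smaller margin => higher selection priority

def diversity_score(features, selected_idx):
    """Naive diversity heuristic: distance to the nearest already-selected input."""
    selected = features[selected_idx]                                   # (k, d)
    dists = np.linalg.norm(features[:, None, :] - selected[None, :, :], axis=2)
    return dists.min(axis=1)                                            # farthest-first flavor

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(100, 10))   # hypothetical model outputs for 100 candidate inputs
    feats = rng.normal(size=(100, 32))    # hypothetical feature embeddings of the same inputs
    budget = 10
    by_directed = np.argsort(-directed_score(logits))[:budget]
    by_diversity = np.argsort(-diversity_score(feats, selected_idx=[0]))[:budget]
    print("directed selection :", by_directed)
    print("diversity selection:", by_diversity)
```

A directed metric of this flavor ranks candidates by how close the model already is to flipping its prediction, whereas the diversity heuristic ignores the model's decision boundary entirely, which is consistent with the paper's observation that diversity is a weak signal for perturbation-induced bugs.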
Related papers
- Unveiling and Mitigating Generalized Biases of DNNs through the Intrinsic Dimensions of Perceptual Manifolds [46.47992213722412]
Building fair deep neural networks (DNNs) is a crucial step towards achieving trustworthy artificial intelligence.
We propose Intrinsic Dimension Regularization (IDR), which enhances the fairness and performance of models.
In various image recognition benchmarks, IDR significantly mitigates model bias while improving performance.
arXiv Detail & Related papers (2024-04-22T04:16:40Z) - DeepKnowledge: Generalisation-Driven Deep Learning Testing [2.526146573337397]
DeepKnowledge is a systematic testing methodology for DNN-based systems.
It aims to enhance robustness and reduce the residual risk of 'black box' models.
We report improvements of up to 10 percentage points over state-of-the-art coverage criteria for detecting adversarial attacks.
arXiv Detail & Related papers (2024-03-25T13:46:09Z) - Uncertainty in Graph Neural Networks: A Survey [50.63474656037679]
Graph Neural Networks (GNNs) have been extensively used in various real-world applications.
However, the predictive uncertainty of GNNs stemming from diverse sources can lead to unstable and erroneous predictions.
This survey aims to provide a comprehensive overview of GNNs from the perspective of uncertainty.
arXiv Detail & Related papers (2024-03-11T21:54:52Z) - Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of
System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z) - Uncertainty in Natural Language Processing: Sources, Quantification, and
Applications [56.130945359053776]
We provide a comprehensive review of uncertainty-relevant works in the NLP field.
We first categorize the sources of uncertainty in natural language into three types: input, system, and output.
We discuss the challenges of uncertainty estimation in NLP and outline potential future directions.
arXiv Detail & Related papers (2023-06-05T06:46:53Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep
Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms so that the network focuses more on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural
Networks [94.63547069706459]
The #DNN-Verification problem involves counting the number of input configurations of a DNN that result in a violation of a safety property.
We propose a novel approach that returns the exact count of violations.
We present experimental results on a set of safety-critical benchmarks.
arXiv Detail & Related papers (2023-01-17T18:32:01Z) - gRoMA: a Tool for Measuring the Global Robustness of Deep Neural
Networks [3.2228025627337864]
Deep neural networks (DNNs) are at the forefront of cutting-edge technology, and have been achieving remarkable performance in a variety of complex tasks.
Their integration into safety-critical systems, such as in the aerospace or automotive domains, poses a significant challenge due to the threat of adversarial inputs.
Here, we present gRoMA, an innovative and scalable tool that implements a probabilistic approach to measure the global categorial robustness of a DNN.
arXiv Detail & Related papers (2023-01-05T20:45:23Z) - Generating and Detecting True Ambiguity: A Forgotten Danger in DNN
Supervision Testing [8.210473195536077]
We propose a novel way to generate ambiguous inputs to test Deep Neural Networks (DNNs).
In particular, we propose AmbiGuess to generate ambiguous samples for image classification problems.
We find that the supervisors best suited to detect true ambiguity perform worse on invalid, out-of-distribution, and adversarial inputs, and vice versa.
arXiv Detail & Related papers (2022-07-21T14:21:34Z) - Black-Box Testing of Deep Neural Networks through Test Case Diversity [1.4700751484033807]
We investigate black-box input diversity metrics as an alternative to white-box coverage criteria.
Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria.
arXiv Detail & Related papers (2021-12-20T20:12:53Z) - NADS: Neural Architecture Distribution Search for Uncertainty Awareness [79.18710225716791]
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with test data drawn from a distribution different from the training data.
Existing OoD detection approaches are prone to errors and sometimes even assign higher likelihoods to OoD samples.
We propose Neural Architecture Distribution Search (NADS) to identify common building blocks among all uncertainty-aware architectures.
arXiv Detail & Related papers (2020-06-11T17:39:07Z)