Rethinking Diversity in Deep Neural Network Testing
- URL: http://arxiv.org/abs/2305.15698v2
- Date: Mon, 26 Feb 2024 21:17:41 GMT
- Title: Rethinking Diversity in Deep Neural Network Testing
- Authors: Zi Wang, Jihye Choi, Ke Wang, Somesh Jha
- Abstract summary: We propose a shift in perspective for testing deep neural networks (DNNs).
We advocate for the consideration of DNN testing as directed testing problems rather than diversity-based testing tasks.
Our evaluation demonstrates that diversity metrics are particularly weak indicators for identifying buggy inputs resulting from small input perturbations.
- Score: 25.641743200458382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by the success of traditional software testing, researchers have
proposed numerous diversity measures for testing deep neural networks (DNNs). In this
study, we propose a shift in perspective, advocating for the consideration of
DNN testing as directed testing problems rather than diversity-based testing
tasks. We note that the objective of testing DNNs is specific and well-defined:
identifying inputs that lead to misclassifications. Consequently, a more
precise testing approach is to prioritize inputs with a higher potential to
induce misclassifications, as opposed to emphasizing inputs that enhance
"diversity."
We derive six directed metrics for DNN testing. Furthermore, we conduct a
careful analysis of the appropriate scope for each metric, as applying metrics
beyond their intended scope could significantly diminish their effectiveness.
Our evaluation demonstrates that (1) diversity metrics are particularly weak
indicators for identifying buggy inputs resulting from small input
perturbations, and (2) our directed metrics consistently outperform diversity
metrics in revealing erroneous behaviors of DNNs across all scenarios.
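To make the contrast concrete, here is a minimal, self-contained sketch of the two selection strategies under discussion. The margin-based priority below is a generic misclassification-oriented score, and the nearest-neighbor distance is a generic diversity heuristic; neither is one of the six directed metrics derived in the paper, and the function names, the random "model outputs", and the selection budget are all hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def directed_score(logits):
    """Misclassification-oriented priority: a small top-1/top-2 margin => high score."""
    probs = np.sort(softmax(logits), axis=1)
    margin = probs[:, -1] - probs[:, -2]   # gap between the two largest class probabilities
    return -margin                          # smaller margin => higher selection priority

def diversity_score(features, selected_idx):
    """Naive diversity heuristic: distance to the nearest already-selected input."""
    selected = features[selected_idx]                                   # (k, d)
    dists = np.linalg.norm(features[:, None, :] - selected[None, :, :], axis=2)
    return dists.min(axis=1)                                            # farthest-first flavor

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(100, 10))   # hypothetical model outputs for 100 candidate inputs
    feats = rng.normal(size=(100, 32))    # hypothetical feature embeddings of the same inputs
    budget = 10
    by_directed = np.argsort(-directed_score(logits))[:budget]
    by_diversity = np.argsort(-diversity_score(feats, selected_idx=[0]))[:budget]
    print("directed selection :", by_directed)
    print("diversity selection:", by_diversity)
```

A directed metric of this flavor ranks candidates by how close the model already is to flipping its prediction, whereas the diversity heuristic ignores the model's decision boundary entirely, which is consistent with the paper's observation that diversity is a weak signal for perturbation-induced bugs.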
Related papers
- Unveiling and Mitigating Generalized Biases of DNNs through the Intrinsic Dimensions of Perceptual Manifolds [46.47992213722412]
Building fair deep neural networks (DNNs) is a crucial step towards achieving trustworthy artificial intelligence.
We propose Intrinsic Dimension Regularization (IDR), which enhances the fairness and performance of models.
In various image recognition benchmarks, IDR significantly mitigates model bias while improving performance.
arXiv Detail & Related papers (2024-04-22T04:16:40Z) - DeepKnowledge: Generalisation-Driven Deep Learning Testing [2.526146573337397]
DeepKnowledge is a systematic testing methodology for DNN-based systems.
It aims to enhance robustness and reduce the residual risk of 'black box' models.
We report improvements of up to 10 percentage points over state-of-the-art coverage criteria for detecting adversarial attacks.
arXiv Detail & Related papers (2024-03-25T13:46:09Z) - Uncertainty in Graph Neural Networks: A Survey [50.63474656037679]
Graph Neural Networks (GNNs) have been extensively used in various real-world applications.
However, the predictive uncertainty of GNNs stemming from diverse sources can lead to unstable and erroneous predictions.
This survey aims to provide a comprehensive overview of GNNs from the perspective of uncertainty.
arXiv Detail & Related papers (2024-03-11T21:54:52Z) - Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of
System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z) - Uncertainty in Natural Language Processing: Sources, Quantification, and
Applications [56.130945359053776]
We provide a comprehensive review of uncertainty-relevant works in the NLP field.
We first categorize the sources of uncertainty in natural language into three types: input, system, and output.
We discuss the challenges of uncertainty estimation in NLP and outline potential future directions.
arXiv Detail & Related papers (2023-06-05T06:46:53Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep
Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms so that the network focuses more on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural
Networks [94.63547069706459]
The #DNN-Verification problem involves counting the number of input configurations of a DNN that result in a violation of a safety property.
We propose a novel approach that returns the exact count of violations.
We present experimental results on a set of safety-critical benchmarks.
arXiv Detail & Related papers (2023-01-17T18:32:01Z) - gRoMA: a Tool for Measuring the Global Robustness of Deep Neural
Networks [3.2228025627337864]
Deep neural networks (DNNs) are at the forefront of cutting-edge technology, and have been achieving remarkable performance in a variety of complex tasks.
Their integration into safety-critical systems, such as in the aerospace or automotive domains, poses a significant challenge due to the threat of adversarial inputs.
Here, we present gRoMA, an innovative and scalable tool that implements a probabilistic approach to measure the global categorial robustness of a DNN.
arXiv Detail & Related papers (2023-01-05T20:45:23Z) - Generating and Detecting True Ambiguity: A Forgotten Danger in DNN
Supervision Testing [8.210473195536077]
We propose a novel way to generate ambiguous inputs to test Deep Neural Networks (DNNs).
In particular, we propose AmbiGuess to generate ambiguous samples for image classification problems.
We find that the supervisors best suited to detect true ambiguity perform worse on invalid, out-of-distribution, and adversarial inputs, and vice versa.
arXiv Detail & Related papers (2022-07-21T14:21:34Z) - Black-Box Testing of Deep Neural Networks through Test Case Diversity [1.4700751484033807]
We investigate black-box input diversity metrics as an alternative to white-box coverage criteria.
Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria.
arXiv Detail & Related papers (2021-12-20T20:12:53Z) - NADS: Neural Architecture Distribution Search for Uncertainty Awareness [79.18710225716791]
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with test data drawn from a distribution different from the training data.
Existing OoD detection approaches are prone to errors and sometimes even assign higher likelihoods to OoD samples.
We propose Neural Architecture Distribution Search (NADS) to identify common building blocks among all uncertainty-aware architectures.
arXiv Detail & Related papers (2020-06-11T17:39:07Z)