DeepKnowledge: Generalisation-Driven Deep Learning Testing
- URL: http://arxiv.org/abs/2403.16768v1
- Date: Mon, 25 Mar 2024 13:46:09 GMT
- Title: DeepKnowledge: Generalisation-Driven Deep Learning Testing
- Authors: Sondess Missaoui, Simos Gerasimou, Nikolaos Matragkas
- Abstract summary: DeepKnowledge is a systematic testing methodology for DNN-based systems.
It aims to enhance robustness and reduce the residual risk of 'black box' models.
We report improvements of up to 10 percentage points over state-of-the-art coverage criteria for detecting adversarial attacks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their unprecedented success, DNNs are notoriously fragile to small shifts in data distribution, demanding effective testing techniques that can assess their dependability. Yet, despite recent advances in DNN testing, there is a lack of systematic approaches for assessing a DNN's capability to generalise and operate comparably on data beyond its training distribution. We address this gap with DeepKnowledge, a systematic testing methodology for DNN-based systems founded on the theory of knowledge generalisation, which aims to enhance DNN robustness and reduce the residual risk of 'black box' models. Conforming to this theory, DeepKnowledge posits that core computational DNN units, termed Transfer Knowledge neurons, can generalise under domain shift. DeepKnowledge provides an objective confidence measurement for the testing activities of a DNN under data distribution shifts and uses this information to instrument a generalisation-informed test adequacy criterion that checks the transfer knowledge capacity of a test set. Our empirical evaluation of several DNNs across multiple datasets and state-of-the-art adversarial generation techniques demonstrates the usefulness and effectiveness of DeepKnowledge and its ability to support the engineering of more dependable DNNs. We report improvements of up to 10 percentage points over state-of-the-art coverage criteria for detecting adversarial attacks on several benchmarks, including MNIST, SVHN, and CIFAR.
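The abstract releases no code, but its core mechanic, a coverage criterion over previously identified Transfer Knowledge (TK) neurons, can be sketched. The snippet below is a minimal, hypothetical illustration: the names `tk_coverage` and `tk_neurons` and the fixed activation threshold are assumptions for exposition, not DeepKnowledge's actual implementation.

```python
# Hypothetical sketch of a TK-coverage-style adequacy criterion.
# Names (tk_coverage, tk_neurons) and the threshold are illustrative,
# not taken from the DeepKnowledge implementation.
import numpy as np

def tk_coverage(activations: np.ndarray,
                tk_neurons: np.ndarray,
                threshold: float = 0.5) -> float:
    """Fraction of transfer-knowledge neurons that at least one test
    input activates above `threshold`.

    activations: (n_test_inputs, n_neurons) activation matrix.
    tk_neurons:  indices of neurons flagged as transfer-knowledge units.
    """
    covered = (activations[:, tk_neurons] > threshold).any(axis=0)
    return float(covered.mean())

# Toy usage: 100 test inputs, 512 neurons, 20 of them TK neurons.
rng = np.random.default_rng(0)
acts = rng.random((100, 512))
tk_idx = rng.choice(512, size=20, replace=False)
print(f"TK coverage: {tk_coverage(acts, tk_idx):.2%}")
```

Under a criterion of this shape, a test set with higher TK coverage would be judged more adequate at exercising the units credited with generalisation under domain shift.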
Related papers
- Online GNN Evaluation Under Test-time Graph Distribution Shifts [92.4376834462224]
A new research problem, online GNN evaluation, aims to provide valuable insights into the well-trained GNNs's ability to generalize to real-world unlabeled graphs.
We develop an effective learning behavior discrepancy score, dubbed LeBeD, to estimate the test-time generalization errors of well-trained GNN models.
arXiv Detail & Related papers (2024-03-15T01:28:08Z)
- Uncertainty in Graph Neural Networks: A Survey [50.63474656037679]
Graph Neural Networks (GNNs) have been extensively used in various real-world applications.
However, the predictive uncertainty of GNNs stemming from diverse sources can lead to unstable and erroneous predictions.
This survey aims to provide a comprehensive overview of GNNs from the perspective of uncertainty.
arXiv Detail & Related papers (2024-03-11T21:54:52Z)
- Bayesian Neural Networks with Domain Knowledge Priors [52.80929437592308]
We propose a framework for integrating general forms of domain knowledge into a BNN prior.
We show that BNNs using our proposed domain knowledge priors outperform those with standard priors.
arXiv Detail & Related papers (2024-02-20T22:34:53Z)
- Enumerating Safe Regions in Deep Neural Networks with Provable Probabilistic Guarantees [86.1362094580439]
We introduce the AllDNN-Verification problem: given a safety property and a DNN, enumerate the set of all the regions of the property input domain which are safe.
Due to the #P-hardness of the problem, we propose an efficient approximation method called epsilon-ProVe.
Our approach exploits a controllable underestimation of the output reachable sets obtained via statistical prediction of tolerance limits.
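As a rough illustration of "statistical prediction of tolerance limits": distribution-free tolerance bounds can be obtained from order statistics of sampled outputs. The sketch below is a generic version of that device, assuming uniform sampling over a box-shaped input domain and a toy model `f`; it is not epsilon-ProVe itself, whose controlled underestimation would use interior order statistics rather than the raw min/max.

```python
# Generic sketch of distribution-free tolerance limits, the statistical
# device the epsilon-ProVe summary refers to; the model `f` and the
# sampling scheme are illustrative assumptions, not the paper's code.
import numpy as np

def tolerance_bounds(f, low, high, n=1000, rng=None):
    """Estimate bounds on f's output over the box [low, high] from n
    random samples. By the classic order-statistic result, the interval
    [min, max] of n i.i.d. draws covers at least a proportion p of the
    output distribution with confidence 1 - n*p**(n-1) + (n-1)*p**n."""
    rng = rng or np.random.default_rng(0)
    x = rng.uniform(low, high, size=(n, len(low)))
    y = np.apply_along_axis(f, 1, x)
    return y.min(), y.max()

# Toy "network": a fixed random affine map followed by tanh.
W = np.random.default_rng(1).normal(size=3)
f = lambda x: np.tanh(x @ W)
lo, hi = tolerance_bounds(f, low=np.zeros(3), high=np.ones(3))
print(f"estimated reachable output range: [{lo:.3f}, {hi:.3f}]")
```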
arXiv Detail & Related papers (2023-08-18T22:30:35Z)
- TEASMA: A Practical Methodology for Test Adequacy Assessment of Deep Neural Networks [4.528286105252983]
TEASMA is a comprehensive and practical methodology designed to accurately assess the adequacy of test sets for Deep Neural Networks.
We evaluate TEASMA with four state-of-the-art test adequacy metrics: Distance-based Surprise Coverage (DSC), Likelihood-based Surprise Coverage (LSC), Input Distribution Coverage (IDC), and Mutation Score (MS).
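DSC and LSC build on the surprise adequacy family of metrics, where a test input is scored by how unlikely its activations are under the training distribution. Below is a minimal sketch of the likelihood-based variant, assuming a kernel density estimate over penultimate-layer activations; the layer choice and the bucketing of scores into coverage are simplifications, not TEASMA's exact procedure.

```python
# Hedged sketch of likelihood-based surprise adequacy (the idea behind
# LSC): a test input is "surprising" if its activations are unlikely
# under a density fitted to the training activations.
import numpy as np
from scipy.stats import gaussian_kde

def likelihood_surprise(train_acts, test_acts):
    """LSA(x) = -log density of x's activations under a KDE fitted on
    training activations; higher means more surprising."""
    kde = gaussian_kde(train_acts.T)          # expects (dims, n_samples)
    return -np.log(kde(test_acts.T) + 1e-12)  # epsilon guards log(0)

rng = np.random.default_rng(0)
train = rng.normal(0, 1, size=(500, 8))  # in-distribution activations
test = rng.normal(3, 1, size=(10, 8))    # shifted, should be surprising
print(likelihood_surprise(train, test))
```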
arXiv Detail & Related papers (2023-08-02T17:56:05Z)
- A Systematic Literature Review on Hardware Reliability Assessment Methods for Deep Neural Networks [1.189955933770711]
The reliability of Deep Neural Networks (DNNs) is an essential subject of research.
In recent years, several studies have been published that assess the reliability of DNNs.
In this work, we conduct a Systematic Literature Review (SLR) on the reliability assessment methods of DNNs.
arXiv Detail & Related papers (2023-05-09T20:08:30Z)
- DeepVigor: Vulnerability Value Ranges and Factors for DNNs' Reliability Assessment [1.189955933770711]
Deep Neural Networks (DNNs) and their accelerators are being deployed more frequently in safety-critical applications.
We propose a novel, accurate, fine-grained, metric-oriented, and accelerator-agnostic method called DeepVigor.
arXiv Detail & Related papers (2023-03-13T08:55:10Z)
- On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Network [26.656444835709905]
We study the internal properties of Deep Neural Networks (DNNs) that affect model robustness under adversarial attacks.
We propose the novel concept of the Populated Region Set (PRS), the set of decision regions in which training samples are populated more frequently.
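The abstract does not define PRS precisely; one common way to identify decision regions in ReLU networks is by their binary activation patterns, so a speculative sketch might count how many training samples fall into each pattern and keep the frequently populated ones. All names and the `min_count` threshold below are hypothetical, not the paper's definitions.

```python
# Hypothetical sketch of a populated-region-set computation, assuming
# a ReLU layer's on/off activation pattern serves as the signature of
# a decision region; `min_count` and the single penultimate layer are
# illustrative choices, not the paper's.
from collections import Counter
import numpy as np

def populated_region_set(penult_acts: np.ndarray, min_count: int = 5):
    """Return the set of activation patterns (regions) that at least
    `min_count` training samples fall into."""
    patterns = [tuple((row > 0).astype(int)) for row in penult_acts]
    counts = Counter(patterns)
    return {p for p, c in counts.items() if c >= min_count}

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 6))  # toy penultimate-layer activations
prs = populated_region_set(acts)
print(f"{len(prs)} populated regions out of {2**6} possible patterns")
```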
arXiv Detail & Related papers (2022-07-07T16:06:34Z)
- Detecting OODs as datapoints with High Uncertainty [12.040347694782007]
Deep neural networks (DNNs) are known to produce incorrect predictions with very high confidence on out-of-distribution inputs (OODs).
This limitation is one of the key challenges in the adoption of DNNs in high-assurance systems such as autonomous driving, air traffic management, and medical diagnosis.
Several techniques have been developed to detect inputs where the model's prediction cannot be trusted.
We demonstrate the difference in the detection ability of these techniques and propose an ensemble approach for detecting OODs as datapoints with high uncertainty (epistemic or aleatoric).
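A standard way to realise "high uncertainty (epistemic or aleatoric)" with an ensemble is the entropy decomposition: the predictive entropy of the averaged distribution is the total uncertainty, the expected per-member entropy approximates the aleatoric part, and their difference (the mutual information) the epistemic part. A minimal sketch, with illustrative thresholds rather than the paper's calibrated ones:

```python
# Minimal sketch of ensemble-based OOD flagging via uncertainty,
# using the standard entropy / mutual-information decomposition;
# thresholds and model details are illustrative assumptions.
import numpy as np

def uncertainty_scores(member_probs: np.ndarray):
    """member_probs: (n_members, n_inputs, n_classes) softmax outputs.
    Returns (total, aleatoric, epistemic) uncertainty per input."""
    eps = 1e-12
    mean_p = member_probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)   # H[E[p]]
    aleatoric = -(member_probs
                  * np.log(member_probs + eps)).sum(-1).mean(0)  # E[H[p]]
    return total, aleatoric, total - aleatoric              # MI = epistemic

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(5, 4))  # 5 members, 4 inputs
tot, alea, epi = uncertainty_scores(probs)
is_ood = (epi > 0.5) | (alea > 2.0)              # illustrative thresholds
print(is_ood)
```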
arXiv Detail & Related papers (2021-08-13T20:07:42Z)
- S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills binary networks from real-valued networks via the final prediction distribution.
Our proposed method boosts the simple contrastive learning baseline by an absolute gain of 5.515% on binary neural networks (BNNs).
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
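Distilling "on the final prediction distribution" typically means minimising a KL divergence between the real-valued teacher's softmax outputs and the binary student's. A minimal sketch of that loss, leaving out S2-BNN's architectures, binarisation scheme, and training schedule:

```python
# Minimal sketch of distilling a teacher's prediction distribution
# into a student via KL divergence; architectures, temperature, and
# the binarisation scheme of S2-BNN are omitted / assumed.
import numpy as np

def kl_distill_loss(teacher_logits, student_logits):
    """KL(teacher || student) over the softmax prediction distributions,
    averaged over the batch."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p, q = softmax(teacher_logits), softmax(student_logits)
    return (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(-1).mean()

rng = np.random.default_rng(0)
t = rng.normal(size=(32, 10))  # teacher logits (real-valued network)
s = rng.normal(size=(32, 10))  # student logits (binary network)
print(f"distillation loss: {kl_distill_loss(t, s):.4f}")
```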
arXiv Detail & Related papers (2021-02-17T18:59:28Z)
- Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey [77.99182201815763]
Deep Neural Networks (DNNs) achieve state-of-the-art results in many different problem settings.
DNNs are often treated as black box systems, which complicates their evaluation and validation.
One promising approach, inspired by the success of convolutional neural networks (CNNs) in computer vision tasks, is to incorporate knowledge about symmetric geometrical transformations.
arXiv Detail & Related papers (2020-06-30T14:56:05Z)