Distribution Awareness for AI System Testing
- URL: http://arxiv.org/abs/2105.02540v1
- Date: Thu, 6 May 2021 09:24:06 GMT
- Title: Distribution Awareness for AI System Testing
- Authors: David Berend
- Abstract summary: We propose a new OOD-guided testing technique that aims to generate new, unseen test cases relevant to the underlying DL system's task.
Our results show that this technique is able to filter up to 55.44% of error test cases on CIFAR-10 and is 10.05% more effective in enhancing robustness.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Deep Learning (DL) is continuously adopted in many safety-critical
applications, concerns about its quality and reliability are growing. As in the
traditional software development process, testing DL software to uncover its
defects at an early stage is an effective way to reduce risks after deployment.
Although recent progress has been made in designing novel testing techniques for
DL software, the distribution of the generated test data is not taken into
consideration, making it hard to judge whether the identified errors are indeed
meaningful errors for the DL application. We therefore propose a new OOD-guided
testing technique that aims to generate new, unseen test cases relevant to the
underlying DL system's task. Our results show that this technique is able to
filter up to 55.44% of error test cases on CIFAR-10 and is 10.05% more effective
in enhancing robustness.
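To make the distribution-aware filtering step concrete, below is a minimal Python sketch of the idea described in the abstract: generated test inputs are scored with a simple OOD proxy (maximum softmax probability), and only inputs judged in-distribution are kept for error analysis. The Keras-style `model.predict` interface, the max-softmax score, and the 0.5 threshold are illustrative assumptions, not the detector or pipeline actually used in the paper.

```python
import numpy as np

def ood_scores(model, inputs):
    """Score inputs with a simple OOD proxy: maximum softmax probability.

    Lower scores suggest an input lies further from the training
    distribution. (Assumption: max-softmax is only an illustrative
    baseline; the paper may rely on a different detector.)
    """
    probs = np.asarray(model.predict(inputs))  # shape: (n_samples, n_classes)
    return probs.max(axis=1)

def filter_generated_tests(model, generated_inputs, threshold=0.5):
    """Keep only generated test cases judged in-distribution.

    Inputs scoring below `threshold` are treated as irrelevant to the
    DL system's task and dropped before errors are counted, mirroring
    the idea of filtering out OOD error test cases.
    """
    scores = ood_scores(model, generated_inputs)
    return [x for x, s in zip(generated_inputs, scores) if s >= threshold]
```

In a distribution-aware testing loop, only the retained inputs would be checked against the test oracle, so reported error counts reflect failures that are plausible for the task; in practice the threshold would be calibrated on held-out in-distribution data rather than fixed at 0.5.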
Related papers
- StagedVulBERT: Multi-Granular Vulnerability Detection with a Novel Pre-trained Code Model [13.67394549308693]
This study introduces StagedVulBERT, a novel vulnerability detection framework.
Its CodeBERT-HLS component is designed to capture semantics at both the token and statement levels simultaneously.
In coarse-grained vulnerability detection, StagedVulBERT achieves an F1 score of 92.26%, marking a 6.58% improvement over the best-performing methods.
arXiv Detail & Related papers (2024-10-08T07:46:35Z)
- Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% on our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which explains the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
- SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems.
We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub.
The direct impact has been a reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z)
- A high performance fingerprint liveness detection method based on quality related features [66.41574316136379]
The system is tested on a highly challenging database comprising over 10,500 real and fake images.
The proposed solution proves to be robust to the multi-scenario dataset, and presents an overall rate of 90% correctly classified samples.
arXiv Detail & Related papers (2021-11-02T21:09:39Z)
- Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints [21.241478970181912]
We show how ensembling and Bayesian treatments of machine learning methods for static malware detection allow for improved identification of model errors.
In particular, we improve the true positive rate (TPR) at an actual realized FPR of 1e-5 from an expected 0.69 for previous methods to 0.80 for the best-performing model class on the Sophos industry-scale dataset.
arXiv Detail & Related papers (2021-08-09T14:30:23Z)
- Detecting Operational Adversarial Examples for Reliable Deep Learning [12.175315224450678]
We present the novel notion of "operational AEs", which are AEs that have a relatively high chance of being seen in future operation.
An initial design of a new DL testing method to efficiently detect "operational AEs" is provided.
arXiv Detail & Related papers (2021-04-13T08:31:42Z)
- Reinforcement Learning for Test Case Prioritization [0.24366811507669126]
This paper extends recent studies on applying Reinforcement Learning to optimize testing strategies.
We test its ability to adapt to new environments by evaluating it on novel data extracted from a financial institution.
We also study the impact of using a Decision Tree (DT) approximator as a model for memory representation.
arXiv Detail & Related papers (2020-12-18T11:08:20Z)
- Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection [76.39067237772286]
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z)
- NADS: Neural Architecture Distribution Search for Uncertainty Awareness [79.18710225716791]
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with test data drawn from a distribution different from the training data.
Existing OoD detection approaches are prone to errors and sometimes even assign higher likelihoods to OoD samples.
We propose Neural Architecture Distribution Search (NADS) to identify common building blocks among all uncertainty-aware architectures.
arXiv Detail & Related papers (2020-06-11T17:39:07Z)
- Towards Characterizing Adversarial Defects of Deep Learning Software from the Lens of Uncertainty [30.97582874240214]
Adversarial examples (AEs) represent a typical and important type of defect that needs to be urgently addressed.
The intrinsic uncertainty of deep learning decisions can be a fundamental reason for their incorrect behavior.
We identify and categorize the uncertainty patterns of benign examples (BEs) and AEs, and find that while BEs and AEs generated by existing methods do follow common uncertainty patterns, some other uncertainty patterns are largely missed.
arXiv Detail & Related papers (2020-04-24T07:29:47Z)