Distribution Awareness for AI System Testing
- URL: http://arxiv.org/abs/2105.02540v1
- Date: Thu, 6 May 2021 09:24:06 GMT
- Title: Distribution Awareness for AI System Testing
- Authors: David Berend
- Abstract summary: We propose a new OOD-guided testing technique which aims to generate new unseen test cases relevant to the underlying DL system task.
Our results show that this technique is able to filter up to 55.44% of error test cases on CIFAR-10 and is 10.05% more effective in enhancing robustness.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Deep Learning (DL) is continuously adopted in many safety critical
applications, its quality and reliability have started to raise concerns. Similar to
the traditional software development process, testing the DL software to
uncover its defects at an early stage is an effective way to reduce risks after
deployment. Although recent progress has been made in designing novel testing
techniques for DL software, the distribution of generated test data is not
taken into consideration. It is therefore hard to judge whether the identified
errors are indeed meaningful errors for the DL application. To address this, we
propose a new OOD-guided testing technique which aims to generate new unseen
test cases relevant to the underlying DL system task. Our results show that
this technique is able to filter up to 55.44% of error test cases on CIFAR-10
and is 10.05% more effective in enhancing robustness.
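The OOD-guided filtering step described above can be illustrated with a minimal sketch. The maximum-softmax-probability (MSP) score used here is a common stand-in for an OOD detector; the threshold value and the `filter_in_distribution` helper are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    """Maximum softmax probability: low values suggest the input is OOD."""
    return max(softmax(logits))

def filter_in_distribution(test_cases, threshold=0.6):
    """Keep only generated test cases the model scores as in-distribution,
    so reported errors stay relevant to the DL system's task."""
    return [name for name, logits in test_cases if msp_score(logits) >= threshold]

cases = [
    ("mutated_cat.png", [4.0, 0.1, 0.2]),    # confident prediction -> keep
    ("noise_blob.png", [0.30, 0.20, 0.25]),  # diffuse prediction -> filter out
]
kept = filter_in_distribution(cases)
```

In practice the score and threshold would come from a trained OOD detector calibrated on the training distribution; MSP is only the simplest baseline.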
Related papers
- What You See Is What You Get: Attention-based Self-guided Automatic Unit Test Generation [3.8244417073114003]
We propose Attention-based Self-guided Automatic Unit Test GenERation (AUGER) approach.
AUGER contains two stages: defect detection and error triggering.
It improves F1-score and Precision in defect detection by 4.7% to 35.3%.
It can trigger 23 to 84 more errors than state-of-the-art (SOTA) approaches in unit test generation.
arXiv Detail & Related papers (2024-12-01T14:28:48Z)
- StagedVulBERT: Multi-Granular Vulnerability Detection with a Novel Pre-trained Code Model [13.67394549308693]
This study introduces StagedVulBERT, a novel vulnerability detection framework.
Its CodeBERT-HLS component is designed to capture semantics at both the token and statement levels simultaneously.
In coarse-grained vulnerability detection, StagedVulBERT achieves an F1 score of 92.26%, marking a 6.58% improvement over the best-performing methods.
arXiv Detail & Related papers (2024-10-08T07:46:35Z)
- Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z)
- Free Lunch for Generating Effective Outlier Supervision [46.37464572099351]
We propose an ultra-effective method to generate near-realistic outlier supervision.
Our proposed BayesAug significantly reduces the false positive rate by over 12.50% compared with previous schemes.
arXiv Detail & Related papers (2023-01-17T01:46:45Z)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
- SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems.
We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub.
The direct impact of this has been a reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z)
- A high performance fingerprint liveness detection method based on quality related features [66.41574316136379]
The system is tested on a highly challenging database comprising over 10,500 real and fake images.
The proposed solution proves robust on the multi-scenario dataset, correctly classifying 90% of samples overall.
arXiv Detail & Related papers (2021-11-02T21:09:39Z)
- Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints [21.241478970181912]
We show how ensembling and Bayesian treatments of machine learning methods for static malware detection allow for improved identification of model errors.
In particular, we improve the true positive rate (TPR) at an actual realized FPR of 1e-5 from an expected 0.69 for previous methods to 0.80 on the best performing model class on the Sophos industry scale dataset.
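Evaluating a detector at a fixed false positive budget, as this entry does, can be sketched minimally. The threshold-selection helper and the toy score lists below are illustrative assumptions, not the paper's method or data.

```python
def tpr_at_fpr(benign_scores, malicious_scores, fpr_budget):
    """Choose the score threshold that admits at most `fpr_budget`
    false positives, then measure the true positive rate there."""
    ranked = sorted(benign_scores, reverse=True)
    k = int(fpr_budget * len(ranked))           # allowed false positives
    threshold = ranked[k] if k < len(ranked) else ranked[-1] - 1.0
    hits = sum(1 for s in malicious_scores if s > threshold)
    return hits / len(malicious_scores)

benign = [0.95, 0.90] + [0.10] * 8              # 10 benign samples
malicious = [0.80, 0.70, 0.60, 0.05]            # 4 malicious samples
rate = tpr_at_fpr(benign, malicious, fpr_budget=0.2)
```

At an industrial FPR of 1e-5, as in the paper, the benign set must contain millions of samples for the threshold estimate to be meaningful; the tiny lists here only show the mechanics.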
arXiv Detail & Related papers (2021-08-09T14:30:23Z)
- Reinforcement Learning for Test Case Prioritization [0.24366811507669126]
This paper extends recent studies on applying Reinforcement Learning to optimize testing strategies.
We evaluate its ability to adapt to new environments by applying it to novel data extracted from a financial institution.
We also studied the impact of using Decision Tree (DT) Approximator as a model for memory representation.
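Learning-driven test prioritization of this kind can be illustrated with a toy value-update sketch: each test's estimated failure value moves toward its latest outcome, and likely-failing tests run first. The update rule and test names are hypothetical, not the paper's RL formulation.

```python
def update_value(value, failed, lr=0.3):
    """Move a test's estimated failure value toward the latest outcome
    (1.0 if it failed, 0.0 if it passed)."""
    return value + lr * ((1.0 if failed else 0.0) - value)

def prioritize(values):
    """Schedule tests with the highest estimated failure value first."""
    return sorted(values, key=values.get, reverse=True)

values = {"test_login": 0.0, "test_report": 0.0}
values["test_login"] = update_value(values["test_login"], failed=True)
values["test_report"] = update_value(values["test_report"], failed=False)
order = prioritize(values)
```

The paper's Decision Tree approximator would replace this flat per-test table with a model that generalizes across test-case features.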
arXiv Detail & Related papers (2020-12-18T11:08:20Z)
- NADS: Neural Architecture Distribution Search for Uncertainty Awareness [79.18710225716791]
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with testing data coming from a distribution different from training data.
Existing OoD detection approaches are prone to errors and sometimes even assign higher likelihoods to OoD samples.
We propose Neural Architecture Distribution Search (NADS) to identify common building blocks among all uncertainty-aware architectures.
arXiv Detail & Related papers (2020-06-11T17:39:07Z)
- Towards Characterizing Adversarial Defects of Deep Learning Software from the Lens of Uncertainty [30.97582874240214]
Adversarial examples (AEs) represent a typical and important type of defect that needs to be urgently addressed.
The intrinsic uncertainty of deep learning decisions can be a fundamental reason for their incorrect behavior.
We identify and categorize the uncertainty patterns of benign examples (BEs) and AEs, and find that while BEs and AEs generated by existing methods do follow common uncertainty patterns, some other uncertainty patterns are largely missed.
arXiv Detail & Related papers (2020-04-24T07:29:47Z)
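The uncertainty patterns discussed in the last entry can be illustrated with predictive entropy, one common uncertainty measure: benign examples tend to yield confident, low-entropy predictions, while many adversarial examples yield diffuse, high-entropy ones. The example probability vectors are hypothetical.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a softmax output; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

benign_pred = [0.97, 0.01, 0.02]  # confident: a typical benign pattern
adv_pred = [0.40, 0.35, 0.25]     # diffuse: a pattern AEs often exhibit
h_benign = predictive_entropy(benign_pred)
h_adv = predictive_entropy(adv_pred)
```

Note the paper's finding is precisely that this separation is incomplete: some AEs mimic benign low-entropy patterns, which is why a single uncertainty statistic is not a reliable defense on its own.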
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.