Labeling-Free Comparison Testing of Deep Learning Models
- URL: http://arxiv.org/abs/2204.03994v1
- Date: Fri, 8 Apr 2022 10:55:45 GMT
- Title: Labeling-Free Comparison Testing of Deep Learning Models
- Authors: Yuejun Guo, Qiang Hu, Maxime Cordy, Xiaofei Xie, Mike Papadakis, Yves Le Traon
- Abstract summary: We propose a labeling-free comparison testing approach to overcome the limitations of labeling effort and sampling randomness.
Our approach outperforms the baseline methods by up to 0.74 and 0.53 on Spearman's correlation and Kendall's $\tau$, respectively, regardless of the dataset and distribution shift.
- Score: 28.47632100019289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various deep neural networks (DNNs) have been developed and reported
for their tremendous success in multiple domains. Given a specific task,
developers can collect many DNNs from public sources for efficient reuse and
avoid redundant work from scratch. However, testing the performance (e.g.,
accuracy and robustness) of multiple DNNs and recommending which model should
be used is challenging, given the scarcity of labeled data and the demand for
domain expertise. Existing testing approaches are mainly selection-based: after
sampling, a small portion of the test data is labeled to discriminate between
DNNs. Because sampling is random, the resulting performance ranking is not
deterministic. In this paper, we propose a labeling-free comparison testing
approach to overcome the limitations of labeling effort and sampling
randomness. The main idea is to learn a Bayesian model that infers the models'
specialty based only on their predicted labels. To evaluate the effectiveness
of our approach, we undertook exhaustive experiments on 9 benchmark datasets
spanning the image, text, and source code domains, and 165 DNNs. In addition
to accuracy, we consider robustness against synthetic and natural distribution
shifts. The experimental results demonstrate that the performance of existing
approaches degrades under distribution shifts. Our approach outperforms the
baseline methods by up to 0.74 and 0.53 on Spearman's correlation and
Kendall's $\tau$, respectively, regardless of the dataset and distribution
shift. Additionally, we investigated the impact of model quality (accuracy and
robustness) and diversity (standard deviation of the quality) on the testing
effectiveness and observed a higher chance of a good result when the quality
is over 50% and the diversity is larger than 18%.
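The abstract says only that a Bayesian model infers each DNN's specialty from predicted labels; it does not spell that model out. As a rough illustration of estimating model quality without ground truth, the sketch below swaps in a different, well-known technique: a one-coin Dawid-Skene model fitted with EM, as used for aggregating crowdsourced labels. It is not the authors' method. It assumes every DNN outputs the hidden true label with its own probability and a uniformly random wrong class otherwise. The DNNs are then ranked by estimated accuracy, and the ranking is scored against labeled ground truth with Spearman's correlation (rank-difference formula, assuming no ties) and Kendall's $\tau$ (the tau-a variant), the two metrics used in the abstract. All function names and the synthetic data generator are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def estimate_accuracies(preds: Sequence[Sequence[int]], num_classes: int,
                        iters: int = 50) -> List[float]:
    """One-coin Dawid-Skene EM (a stand-in, NOT the paper's Bayesian model).

    preds[m][i] is DNN m's predicted label for input i; no ground truth is used.
    Assumption: DNN m outputs the hidden true label with probability acc[m]
    and a uniformly random wrong class otherwise.
    """
    n_models, n_inputs = len(preds), len(preds[0])
    acc = [0.7] * n_models  # start above chance (1 / num_classes) to break symmetry
    for _ in range(iters):
        # E-step: posterior over each input's hidden true label, in log space.
        log_wrong = [math.log((1 - a) / (num_classes - 1)) for a in acc]
        bonus = [math.log(a) - lw for a, lw in zip(acc, log_wrong)]
        base = sum(log_wrong)
        posteriors = []
        for i in range(n_inputs):
            scores = [base] * num_classes
            for m in range(n_models):
                # the class DNN m voted for gets log(acc) instead of log_wrong
                scores[preds[m][i]] += bonus[m]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            posteriors.append([w / total for w in weights])
        # M-step: each DNN's accuracy is its expected agreement with the posterior.
        acc = [sum(posteriors[i][preds[m][i]] for i in range(n_inputs)) / n_inputs
               for m in range(n_models)]
        acc = [min(max(a, 1e-6), 1 - 1e-6) for a in acc]  # keep the logs finite
    return acc


def ranks(values: Sequence[float]) -> List[int]:
    """Rank position of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    result = [0] * len(values)
    for pos, j in enumerate(order):
        result[j] = pos
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho via the rank-difference formula (valid only without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) pairs divided by all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def simulate_predictions(true_accs: Sequence[float], n_inputs: int,
                         num_classes: int, seed: int = 0
                         ) -> Tuple[List[int], List[List[int]]]:
    """Hypothetical synthetic data: each DNN is correct at its own rate."""
    rng = random.Random(seed)
    labels = [rng.randrange(num_classes) for _ in range(n_inputs)]
    preds = []
    for a in true_accs:
        row = []
        for y in labels:
            if rng.random() < a:
                row.append(y)
            else:  # wrong prediction: a uniformly chosen other class
                row.append(rng.choice([k for k in range(num_classes) if k != y]))
        preds.append(row)
    return labels, preds


if __name__ == "__main__":
    labels, preds = simulate_predictions([0.6, 0.9, 0.7, 0.8], 1000, 10)
    est = estimate_accuracies(preds, num_classes=10)  # uses predictions only
    truth = [sum(p == y for p, y in zip(row, labels)) / len(labels) for row in preds]
    print("Spearman:", spearman(est, truth), "Kendall:", kendall_tau(est, truth))
```

Labels enter only the final evaluation step; the ranking itself comes from predictions alone. The one-coin assumption ignores class-specific confusions, which a full Dawid-Skene confusion matrix per model would capture at the cost of more parameters.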
Related papers
- Uncertainty Measurement of Deep Learning System based on the Convex Hull of Training Sets [0.13265175299265505]
We propose To-hull Uncertainty and Closure Ratio, which measure the uncertainty of a trained model based on the convex hull of the training data.
These measures capture the positional relation between the convex hull of the learned data and an unseen sample, and infer how far the sample extrapolates beyond the convex hull.
Date: 2024-05-25T06:25:24Z
- Continual Test-time Domain Adaptation via Dynamic Sample Selection [38.82346845855512]
This paper proposes a Dynamic Sample Selection (DSS) method for Continual Test-time Domain Adaptation (CTDA).
We apply joint positive and negative learning on both high- and low-quality samples to reduce the risk of using wrong information.
Our approach is also evaluated in the 3D point cloud domain, showcasing its versatility and potential for broader applicability.
Date: 2023-10-05T06:35:21Z
- Efficient Testing of Deep Neural Networks via Decision Boundary Analysis [28.868479656437145]
We propose a novel technique, named Aries, that can estimate the performance of DNNs on new unlabeled data.
The accuracy estimated by Aries is only 0.03% to 2.60% (on average 0.61%) off the true accuracy.
Date: 2022-07-22T08:39:10Z
- ScatterSample: Diversified Label Sampling for Data Efficient Graph Neural Network Learning [22.278779277115234]
In some applications of graph neural networks (GNNs), labeling new instances is expensive.
We develop a data-efficient active sampling framework, ScatterSample, to train GNNs under an active learning setting.
Our experiments on five datasets show that ScatterSample significantly outperforms the other GNN active learning baselines.
Date: 2022-06-09T04:05:02Z
- Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models [54.182955830194445]
Existing models either fail or suffer a dramatic drop in performance under the so-called "near-distribution" setting.
We propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data.
Our method improves near-distribution novelty detection by 6% and surpasses the state of the art by 1% to 5% across nine novelty detection benchmarks.
Date: 2022-05-28T02:02:53Z
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples for which the confidence exceeds that threshold.
Date: 2022-01-11T23:01:12Z
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
Date: 2021-02-22T07:02:37Z
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
Date: 2020-06-26T13:50:19Z
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
Date: 2020-03-10T03:10:41Z
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training of these RBF-based models with a novel loss function and centroid updating scheme, and match the accuracy of softmax models.
Date: 2020-03-04T12:27:36Z
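The ATC entry in the list above summarizes its method in one sentence. A minimal sketch of that idea follows, assuming the confidence score is the maximum softmax probability (the ATC paper also considers other scores, such as negative entropy) and that confidences contain no ties. On labeled source data, the threshold is chosen so that the fraction of examples above it equals the source accuracy; the fraction of unlabeled target examples above the same threshold is then reported as the predicted target accuracy. The function names are hypothetical and not taken from the authors' code.

```python
import math
from typing import Sequence


def max_softmax(logits: Sequence[float]) -> float:
    """Assumed confidence score: the largest softmax probability (stable form)."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    return max(exps) / sum(exps)


def learn_atc_threshold(source_conf: Sequence[float],
                        source_correct: Sequence[bool]) -> float:
    """Threshold t such that the fraction of labeled source examples with
    confidence > t equals the source accuracy (assumes no tied confidences)."""
    n = len(source_conf)
    k = sum(source_correct)  # accuracy * n: how many examples must lie above t
    if k == n:
        return -math.inf  # every example must count as "confident"
    confs = sorted(source_conf)
    return confs[n - k - 1]  # exactly k values are strictly greater than this


def predict_target_accuracy(target_conf: Sequence[float], threshold: float) -> float:
    """ATC estimate: fraction of unlabeled target examples above the threshold."""
    return sum(c > threshold for c in target_conf) / len(target_conf)


if __name__ == "__main__":
    src_conf = [0.2, 0.4, 0.6, 0.8, 0.9]
    src_correct = [False, False, True, True, True]  # source accuracy 0.6
    t = learn_atc_threshold(src_conf, src_correct)
    print(t, predict_target_accuracy([0.3, 0.5, 0.95, 0.1], t))
```

Here the threshold lands at 0.4, so three of the five source examples lie above it (matching 60% accuracy) and two of the four target examples do, giving a predicted target accuracy of 0.5. No target labels are needed at any point.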
This list is automatically generated from the titles and abstracts of the papers in this site.