UQ-ARMED: Uncertainty quantification of adversarially-regularized mixed
effects deep learning for clustered non-iid data
- URL: http://arxiv.org/abs/2211.15888v1
- Date: Tue, 29 Nov 2022 02:50:48 GMT
- Title: UQ-ARMED: Uncertainty quantification of adversarially-regularized mixed
effects deep learning for clustered non-iid data
- Authors: Alex Treacher, Kevin Nguyen, Dylan Owens, Daniel Heitjan, Albert
Montillo
- Abstract summary: This work demonstrates the ability to produce readily interpretable statistical metrics for model fit, fixed effects covariance coefficients, and prediction confidence.
In our experiment on AD prognosis, not only do the UQ methods provide these benefits, but several of them maintain the high performance of the original ARMED method.
- Score: 0.6719751155411076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work demonstrates the ability to produce readily interpretable
statistical metrics for model fit, fixed effects covariance coefficients, and
prediction confidence. Importantly, this work compares four suitable and
commonly applied epistemic UQ approaches, BNN, SWAG, MC dropout, and ensemble
approaches, in their ability to calculate these statistical metrics for the
ARMED MEDL models. In our experiment on AD prognosis, not only do the UQ
methods provide these benefits, but several of them maintain the high
performance of the original ARMED method; some even provide a modest (though
not statistically significant) performance improvement. The ensemble models,
especially the ensemble method with 90% subsampling, performed well across all
metrics we tested: (1) high performance comparable to the non-UQ ARMED model,
(2) proper deweighting of the confound probes, assigning them statistically
insignificant p-values, and (3) relatively high calibration of the output
prediction confidence. Based on these results, the ensemble approaches,
especially with 90% subsampling, provided the best all-round performance for
prediction and uncertainty estimation, and achieved our goals of providing
statistical significance for model fit, statistical significance for covariate
coefficients, and confidence in prediction, while maintaining the baseline
performance of MEDL using ARMED.
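As a rough illustration of the best-performing configuration above, the sketch below trains an ensemble on 90% subsamples and derives (a) a mean prediction, (b) an epistemic confidence estimate from member disagreement, and (c) a p-value per coefficient via a one-sample t-test across members. This is a minimal stand-in, not the authors' code: a plain logistic model replaces the ARMED network, the data are synthetic, and the t-test is one plausible way to obtain the coefficient p-values the abstract describes.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the clustered AD-prognosis features.
X = rng.normal(size=(500, 5))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

n_members, frac = 25, 0.90            # ensemble with 90% subsampling
coefs, probs = [], []
for _ in range(n_members):
    idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    coefs.append(m.coef_.ravel())
    probs.append(m.predict_proba(X)[:, 1])
coefs, probs = np.array(coefs), np.array(probs)

mean_prob = probs.mean(axis=0)        # prediction
epistemic_sd = probs.std(axis=0)      # confidence from member disagreement

# Coefficient significance: one-sample t-test of members against zero.
t, p = stats.ttest_1samp(coefs, popmean=0.0, axis=0)
for j in range(coefs.shape[1]):
    print(f"coef {j}: {coefs[:, j].mean():+.3f}  p={p[j]:.3g}")
```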
Related papers
- Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL).
It exploits the dependency between a set of predictors with continuous predictive scores, ranks the predictors without labeled data, and combines them into a weighted ensemble score.
The efficacy of the proposed methods is rigorously assessed through both simulation studies and a real-world application to risk gene discovery.
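SUEL's estimator itself is in the paper; as generic intuition for how predictors can be ranked and combined without labels, a classic spectral trick uses the leading eigenvector of the inter-predictor correlation matrix as reliability weights, since more reliable predictors correlate more strongly with the shared signal. The sketch below is that generic idea on synthetic scores, not SUEL.

```python
import numpy as np

rng = np.random.default_rng(1)

# Six hypothetical predictors scoring 1000 unlabeled items: each is a
# noisy view of a shared latent signal, with increasing noise levels.
latent = rng.normal(size=1000)
noise = np.array([0.3, 0.5, 0.8, 1.0, 1.5, 3.0])
scores = latent[None, :] + noise[:, None] * rng.normal(size=(6, 1000))

# Off-diagonal correlations encode each predictor's unknown reliability;
# the leading eigenvector recovers relative weights without any labels.
C = np.corrcoef(scores)
np.fill_diagonal(C, 0.0)
_, eigvecs = np.linalg.eigh(C)
w = np.abs(eigvecs[:, -1])
w /= w.sum()

print("estimated weights:", np.round(w, 3))  # low-noise predictors rank first
combined = w @ scores                        # weighted ensemble score
```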
arXiv Detail & Related papers (2024-08-14T20:14:42Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
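For intuition, the basic PPI mean estimator takes the model's average prediction on unlabeled data and corrects it with a rectifier, the average prediction error on the labeled data; the stratified variant applies this within strata and recombines by stratum proportions. A minimal sketch of that idea (generic PPI with hypothetical strata, not the paper's exact estimator; it assumes every stratum appears in the labeled set):

```python
import numpy as np

def ppi_mean(yhat_unlab, y_lab, yhat_lab):
    # Prediction-powered mean: model mean on unlabeled data plus a
    # bias correction ("rectifier") estimated on labeled data.
    return yhat_unlab.mean() + (y_lab - yhat_lab).mean()

def strat_ppi_mean(strata_u, yhat_u, strata_l, y_l, yhat_l):
    # Apply PPI within each stratum, recombine by stratum proportions.
    est = 0.0
    for s in np.unique(strata_u):
        mu, ml = strata_u == s, strata_l == s
        est += mu.mean() * ppi_mean(yhat_u[mu], y_l[ml], yhat_l[ml])
    return est
```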
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction [2.2336243882030025]
Machine learning models are vulnerable to membership inference attacks (MIAs).
This work proposes a defense technique, HAMP, that can achieve both strong membership privacy and high accuracy, without requiring extra data.
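HAMP's full recipe is in the paper; its guiding idea, training the model toward less confident predictions so that members and non-members look alike, can be illustrated with a cross-entropy loss carrying an entropy bonus. The sketch below is that generic idea only, with a hypothetical alpha weight, not HAMP's actual loss.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def low_confidence_loss(logits, y, alpha=0.5):
    """Cross-entropy minus an entropy bonus: the alpha term rewards
    higher-entropy (less confident) outputs, shrinking the confidence
    gap between training members and non-members that MIAs exploit."""
    p = softmax(logits)
    ce = -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1).mean()
    return ce - alpha * entropy
```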
arXiv Detail & Related papers (2023-07-04T09:50:33Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold.
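ATC is easy to state in code: on labeled source data, choose the threshold t so that the fraction of examples with confidence above t matches the source accuracy; the predicted target accuracy is then the above-threshold fraction on unlabeled target data. A minimal sketch, with max-softmax assumed as the confidence score (the paper also considers negative entropy):

```python
import numpy as np

def learn_threshold(src_conf, src_correct):
    # Pick t so that mean(src_conf > t) equals source accuracy.
    return np.quantile(src_conf, 1.0 - src_correct.mean())

def predict_target_accuracy(t, tgt_conf):
    # ATC estimate: fraction of unlabeled target points above threshold.
    return float((tgt_conf > t).mean())
```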
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance [5.650647159993238]
Fine-tuning of large pre-trained image and language models on small customized datasets has become increasingly popular.
We show that the statistical problems with covariance estimation drive the poor performance of H-score.
We propose a correction and recommend measuring correlation performance against relative accuracy in such settings.
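For reference, H-score, the transferability metric in question, is tr(cov(f)^{-1} cov(E[f|y])) over penultimate-layer features f, and the failure mode identified above is a noisy plug-in estimate of cov(f). The sketch below computes H-score with a Ledoit-Wolf shrinkage covariance swapped in as one shrinkage-based stabilization; this is a hedged reading of the correction, not the paper's exact estimator.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def h_score(features, labels, shrink=True):
    """H-score: tr(cov(f)^{-1} cov(E[f|y])). Shrinkage stabilizes the
    feature-covariance estimate when the sample size is small relative
    to the feature dimension."""
    f = features - features.mean(axis=0)
    cov_f = LedoitWolf().fit(f).covariance_ if shrink else np.cov(f, rowvar=False)
    g = np.zeros_like(f)                 # rows replaced by class means
    for c in np.unique(labels):
        mask = labels == c
        g[mask] = f[mask].mean(axis=0)
    cov_g = np.cov(g, rowvar=False)      # between-class covariance
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_g))
```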
arXiv Detail & Related papers (2021-10-13T17:24:12Z)
- Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks.
First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU.
Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
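Of the two reliability angles, calibration is the easier to make concrete: bin the predictor's confidence scores and compare average confidence with observed accuracy in each bin (expected calibration error). A generic sketch of that standard metric, not the paper's exact protocol:

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    # Mass-weighted |average confidence - observed accuracy| per bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece
```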
arXiv Detail & Related papers (2021-02-10T15:23:20Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
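The estimator family being benchmarked can be sketched directly: a doubly-robust (AIPW) estimate of the ACE in which the outcome and propensity nuisance models are fit on one fold and evaluated on the held-out fold, with roles swapped and results averaged. A minimal two-fold sketch with gradient-boosting learners standing in for the ML nuisance models (illustrative defaults, not the study's configuration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def aipw_crossfit(X, a, y, n_splits=2, seed=0):
    """Cross-fit AIPW estimate of the average causal effect of binary
    treatment a on outcome y, given confounders X."""
    folds = np.random.default_rng(seed).integers(0, n_splits, size=len(y))
    psi = np.empty(len(y))
    for k in range(n_splits):
        tr, te = folds != k, folds == k
        e_hat = GradientBoostingClassifier().fit(X[tr], a[tr])
        m1 = GradientBoostingRegressor().fit(X[tr][a[tr] == 1], y[tr][a[tr] == 1])
        m0 = GradientBoostingRegressor().fit(X[tr][a[tr] == 0], y[tr][a[tr] == 0])
        e = np.clip(e_hat.predict_proba(X[te])[:, 1], 0.01, 0.99)
        mu1, mu0 = m1.predict(X[te]), m0.predict(X[te])
        # AIPW influence terms: outcome-model prediction plus an
        # inverse-propensity-weighted residual correction.
        psi[te] = (mu1 + a[te] * (y[te] - mu1) / e
                   - mu0 - (1 - a[te]) * (y[te] - mu0) / (1 - e))
    return psi.mean()
```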
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
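The transductive step referenced above is simple to write down: refine each class prototype with a confidence-weighted mean of query embeddings, where the paper's contribution is meta-learning those confidence weights. The sketch below uses a plain softmax-over-negative-distance confidence as a stand-in for the meta-learned one.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def refine_prototypes(protos, queries, temp=1.0):
    """One transductive refinement: soft-assign each query to prototypes
    by negative squared distance, then recompute prototypes as
    confidence-weighted means of the old prototype plus the queries."""
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)  # (q, c)
    w = softmax(-d / temp, axis=1)            # per-query confidence weights
    new = protos + w.T @ queries              # add weighted query mass
    return new / (1.0 + w.sum(axis=0))[:, None]
```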
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.