Assumption-lean and Data-adaptive Post-Prediction Inference
- URL: http://arxiv.org/abs/2311.14220v3
- Date: Tue, 6 Feb 2024 21:23:09 GMT
- Title: Assumption-lean and Data-adaptive Post-Prediction Inference
- Authors: Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, and Qiongshi Lu
- Abstract summary: We introduce an assumption-lean and data-adaptive Post-Prediction Inference (POP-Inf) procedure.
Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML-prediction.
We demonstrate the superiority and applicability of our method through simulations and large-scale genomic data.
- Score: 1.5050365268347254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A primary challenge facing modern scientific research is the limited
availability of gold-standard data which can be both costly and labor-intensive
to obtain. With the rapid development of machine learning (ML), scientists have
relied on ML algorithms to predict these gold-standard outcomes with easily
obtained covariates. However, these predicted outcomes are often used directly
in subsequent statistical analyses, ignoring imprecision and heterogeneity
introduced by the prediction procedure. This will likely result in false
positive findings and invalid scientific conclusions. In this work, we
introduce an assumption-lean and data-adaptive Post-Prediction Inference
(POP-Inf) procedure that allows valid and powerful inference based on
ML-predicted outcomes. Its "assumption-lean" property guarantees reliable
statistical inference without assumptions on the ML-prediction, for a wide
range of statistical quantities. Its "data-adaptive" feature guarantees an
efficiency gain over existing post-prediction inference methods, regardless of
the accuracy of ML-prediction. We demonstrate the superiority and applicability
of our method through simulations and large-scale genomic data.
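The idea in the abstract, correcting ML-predicted outcomes with a small labeled sample and choosing the correction weight from the data, can be illustrated for the simplest target, a population mean. The sketch below is a generic weighted prediction-corrected mean estimator written for illustration only; it is not the authors' POP-Inf implementation, and the function name `weighted_pp_mean`, its arguments, and the simulated data are all assumptions.

```python
import numpy as np


def weighted_pp_mean(y_lab, f_lab, f_unlab):
    """Illustrative (hypothetical) weighted prediction-corrected mean.

    y_lab   : gold-standard outcomes on the labeled sample (size n)
    f_lab   : ML predictions on the same labeled sample (size n)
    f_unlab : ML predictions on the unlabeled sample (size N)

    Estimator: mean(y_lab) + w * (mean(f_unlab) - mean(f_lab)),
    with w chosen to minimise the estimated variance.
    """
    y_lab = np.asarray(y_lab, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y_lab), len(f_unlab)

    var_y = np.var(y_lab, ddof=1)
    var_f = np.var(f_lab, ddof=1)
    cov_yf = np.cov(y_lab, f_lab)[0, 1]

    # Variance-minimising weight; w = 0 falls back to the labeled-only mean.
    w = cov_yf / (var_f * (1.0 + n / N)) if var_f > 0 else 0.0

    est = y_lab.mean() + w * (f_unlab.mean() - f_lab.mean())
    var = var_y / n + w**2 * var_f * (1.0 / n + 1.0 / N) - 2.0 * w * cov_yf / n
    return est, var, w


# Simulated data (an assumption for the demo): noisy, biased predictions.
rng = np.random.default_rng(0)
n, N = 200, 5000
y_all = rng.normal(loc=1.0, scale=1.0, size=n + N)
f_all = 0.8 * y_all + 0.3 + rng.normal(scale=0.5, size=n + N)

est, var, w = weighted_pp_mean(y_all[:n], f_all[:n], f_all[n:])
half_width = 1.96 * np.sqrt(var)  # normal-approximation 95% interval
print(f"estimate={est:.3f}, weight={w:.3f}, "
      f"CI=({est - half_width:.3f}, {est + half_width:.3f})")
```

Because `w = 0` recovers the labeled-only sample mean, the estimated variance at the chosen weight is never larger than that of the labeled-only estimate in this sketch, whatever the quality of the predictions.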
Related papers
- Task-Agnostic Machine Learning-Assisted Inference [0.0]
We propose a novel statistical framework for task-agnostic ML-assisted inference.
It delivers valid and efficient inference that is robust to arbitrary choices of ML models.
We showcase the validity, versatility, and superiority of our method compared to existing approaches.
arXiv Detail & Related papers (2024-05-30T13:19:49Z)
- Variance of ML-based software fault predictors: are we really improving fault prediction? [0.3222802562733786]
We experimentally analyze the variance of a state-of-the-art fault prediction approach.
We observed a maximum variance of 10.10% in terms of the per-class accuracy metric.
arXiv Detail & Related papers (2023-10-26T09:31:32Z)
- Variational Inference with Coverage Guarantees in Simulation-Based Inference [18.818573945984873]
We propose Conformalized Amortized Neural Variational Inference (CANVI).
CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor.
We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation.
arXiv Detail & Related papers (2023-05-23T17:24:04Z)
- Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.
The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients.
Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
arXiv Detail & Related papers (2023-01-23T18:59:28Z)
- Correcting Model Bias with Sparse Implicit Processes [0.9187159782788579]
We show that Sparse Implicit Processes (SIP) is capable of correcting model bias when the data generating mechanism differs strongly from the one implied by the model.
We use synthetic datasets to show that SIP is capable of providing predictive distributions that reflect the data better than the exact predictions of the initial, but wrongly assumed model.
arXiv Detail & Related papers (2022-07-21T18:00:01Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.