Related papers: Assumption-lean and Data-adaptive Post-Prediction Inference

URL: http://arxiv.org/abs/2311.14220v3
Date: Tue, 6 Feb 2024 21:23:09 GMT
Title: Assumption-lean and Data-adaptive Post-Prediction Inference
Authors: Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, and Qiongshi Lu
Abstract summary: We introduce an assumption-lean and data-adaptive Post-Prediction Inference (POP-Inf) procedure. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML-prediction. We demonstrate the superiority and applicability of our method through simulations and large-scale genomic data.
Score: 1.5050365268347254
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A primary challenge facing modern scientific research is the limited availability of gold-standard data, which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes with easily obtained covariates. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring imprecision and heterogeneity introduced by the prediction procedure. This will likely result in false positive findings and invalid scientific conclusions. In this work, we introduce an assumption-lean and data-adaptive Post-Prediction Inference (POP-Inf) procedure that allows valid and powerful inference based on ML-predicted outcomes. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML-prediction, for a wide range of statistical quantities. Its "data-adaptive" feature guarantees an efficiency gain over existing post-prediction inference methods, regardless of the accuracy of ML-prediction. We demonstrate the superiority and applicability of our method through simulations and large-scale genomic data.
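
To make the problem in the abstract concrete, the following is a minimal sketch of how using predicted outcomes directly can bias an estimate, and how a small labeled sample can correct it. It is not the POP-Inf procedure. It illustrates only the basic bias-correction idea for a population mean, in the spirit of prediction-based inference methods. The simulated data, the bias of 0.5, and the function names `naive_mean`, `corrected_mean`, and `simulate` are all assumptions made for illustration.

```python
import random
import statistics


def naive_mean(preds_unlabeled):
    """Treat ML predictions as if they were observed outcomes."""
    return statistics.fmean(preds_unlabeled)


def corrected_mean(y_labeled, preds_labeled, preds_unlabeled):
    """Mean of predictions on unlabeled data, plus the average residual
    (observed outcome minus prediction) measured on the labeled data."""
    residuals = [y - f for y, f in zip(y_labeled, preds_labeled)]
    return statistics.fmean(preds_unlabeled) + statistics.fmean(residuals)


def simulate(seed=0, n_labeled=500, n_unlabeled=20000, bias=0.5):
    """Simulated example (assumed setup): true outcome mean is 2.0."""
    rng = random.Random(seed)

    def draw(n):
        y = [rng.gauss(2.0, 1.0) for _ in range(n)]
        # Hypothetical ML prediction: noisy and systematically too high.
        f = [yi + bias + rng.gauss(0.0, 0.5) for yi in y]
        return y, f

    y_lab, f_lab = draw(n_labeled)
    _, f_unlab = draw(n_unlabeled)  # outcomes are treated as unobserved here
    return naive_mean(f_unlab), corrected_mean(y_lab, f_lab, f_unlab)


naive, corrected = simulate()
print(f"naive: {naive:.3f}  corrected: {corrected:.3f}  (true mean is 2.0)")
```

In this sketch the naive estimate inherits the systematic error of the predictions, while the corrected estimate removes it using the labeled residuals. Valid confidence intervals and the efficiency guarantees described in the abstract require the full methods in the papers and are not shown here.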

Related papers

A Moment-Based Generalization to Post-Prediction Inference [2.089112028396727]
Artificial intelligence (AI) and machine learning (ML) are increasingly used to generate data for downstream analyses. Naively treating these predictions as true observations can lead to biased results and incorrect inference. Wang et al. proposed a method, post-prediction inference, which calibrates inference by modeling the relationship between AI/ML-predicted and observed outcomes.
arXiv Detail & Related papers (2025-07-12T02:33:45Z)
Prediction-Powered Adaptive Shrinkage Estimation [0.9208007322096532]
Prediction-Powered Adaptive Shrinkage (PAS) is a method that bridges prediction-powered inference (PPI) with empirical Bayes shrinkage to improve the estimation of multiple means. PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
arXiv Detail & Related papers (2025-02-20T00:24:05Z)
Another look at inference after prediction [0.0]
Prediction-based (PB) inference has emerged to accommodate statistical analysis using a large volume of predictions. We show that a simple modification can be applied to guarantee improvements in efficiency beyond yielding valid inferences.
arXiv Detail & Related papers (2024-11-29T18:12:50Z)
Mechanism learning: Reverse causal inference in the presence of multiple unknown confounding through front-door causal bootstrapping [0.8901073744693314]
A major limitation of machine learning (ML) prediction models is that they recover associational, rather than causal, predictive relationships between variables. This paper proposes mechanism learning, a simple method which uses front-door causal bootstrapping to deconfound observational data. We test our method on fully synthetic, semi-synthetic and real-world datasets, demonstrating that it can discover reliable, unbiased, causal ML predictors.
arXiv Detail & Related papers (2024-10-26T03:34:55Z)
Task-Agnostic Machine-Learning-Assisted Inference [0.0]
We introduce a novel statistical framework named PSPS for task-agnostic ML-assisted inference. PSPS provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routines.
arXiv Detail & Related papers (2024-05-30T13:19:49Z)
Clustering and Uncertainty Analysis to Improve the Machine Learning-based Predictions of SAFARI-1 Control Follower Assembly Axial Neutron Flux Profiles [2.517043342442487]
The goal of this work is to develop accurate Machine Learning (ML) models for predicting the assembly axial neutron flux profiles in the SAFARI-1 research reactor. The data-driven nature of ML models makes them susceptible to uncertainties which are introduced by sources such as noise in training data. The aim of this work is to improve the ML models for the control assemblies by a combination of supervised and unsupervised ML algorithms.
arXiv Detail & Related papers (2023-12-20T20:22:13Z)
Variance of ML-based software fault predictors: are we really improving fault prediction? [0.3222802562733786]
We experimentally analyze the variance of a state-of-the-art fault prediction approach. We observed a maximum variance of 10.10% in terms of the per-class accuracy metric.
arXiv Detail & Related papers (2023-10-26T09:31:32Z)
Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients. Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
arXiv Detail & Related papers (2023-01-23T18:59:28Z)
Correcting Model Bias with Sparse Implicit Processes [0.9187159782788579]
We show that Sparse Implicit Processes (SIP) is capable of correcting model bias when the data generating mechanism differs strongly from the one implied by the model. We use synthetic datasets to show that SIP is capable of providing predictive distributions that reflect the data better than the exact predictions of the initial, but wrongly assumed model.
arXiv Detail & Related papers (2022-07-21T18:00:01Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which the model's confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance guided stochastic gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation. We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets. Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design. A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift. Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.