Assumption-Lean and Data-Adaptive Post-Prediction Inference
- URL: http://arxiv.org/abs/2311.14220v4
- Date: Mon, 16 Sep 2024 17:47:54 GMT
- Title: Assumption-Lean and Data-Adaptive Post-Prediction Inference
- Authors: Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, Qiongshi Lu
- Abstract summary: We introduce PoSt-Prediction Adaptive inference (PSPA) that allows valid and powerful inference based on ML-predicted data.
We demonstrate the statistical superiority and broad applicability of our method through simulations and real-data applications.
- Score: 1.5050365268347254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A primary challenge facing modern scientific research is the limited availability of gold-standard data which can be costly, labor-intensive, or invasive to obtain. With the rapid development of machine learning (ML), scientists can now employ ML algorithms to predict gold-standard outcomes with variables that are easier to obtain. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring imprecision and heterogeneity introduced by the prediction procedure. This will likely result in false positive findings and invalid scientific conclusions. In this work, we introduce PoSt-Prediction Adaptive inference (PSPA) that allows valid and powerful inference based on ML-predicted data. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML prediction. Its "data-adaptive" feature guarantees an efficiency gain over existing methods, regardless of the accuracy of ML prediction. We demonstrate the statistical superiority and broad applicability of our method through simulations and real-data applications.
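To make the idea concrete, the sketch below (Python, illustrative only, not the authors' exact PSPA estimator) shows the kind of debiased, weight-adjusted mean estimate that post-prediction inference methods build on: a large set of ML predictions supplies precision, a small gold-standard labeled set removes the prediction bias, and a data-driven weight keeps the estimator at least as efficient as using the labeled data alone. The variance-minimizing choice of the weight and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def post_prediction_mean(y_lab, yhat_lab, yhat_unlab):
    """Hedged sketch of a data-adaptive, debiased post-prediction mean estimate.

    y_lab      : 1-D array of gold-standard outcomes on the small labeled set
    yhat_lab   : 1-D array of ML predictions on the same labeled set
    yhat_unlab : 1-D array of ML predictions on the large unlabeled set
    Returns a point estimate and a normal-approximation 95% confidence interval.
    """
    n, N = len(y_lab), len(yhat_unlab)

    # Data-adaptive weight: shrink toward 0 when predictions are uninformative,
    # so the estimator never does worse than the labeled-data-only mean.
    # (Illustrative choice: a variance-minimizing control-variate coefficient.)
    cov = np.cov(y_lab, yhat_lab, ddof=1)
    omega = cov[0, 1] / cov[1, 1] * (N / (N + n)) if cov[1, 1] > 0 else 0.0

    # Debiased estimate: weighted prediction-based mean on the unlabeled set
    # plus a bias-correction term estimated from the labeled set.
    theta = omega * yhat_unlab.mean() + (y_lab - omega * yhat_lab).mean()

    # Variance of the two independent pieces (labeled correction + unlabeled mean).
    var = (np.var(y_lab - omega * yhat_lab, ddof=1) / n
           + omega**2 * np.var(yhat_unlab, ddof=1) / N)
    half = 1.96 * np.sqrt(var)
    return theta, (theta - half, theta + half)
```

When the predictions track the outcome well, the weight stays close to Cov(Y, Yhat)/Var(Yhat) and most of the precision comes from the cheap unlabeled predictions; when they are uninformative, the weight shrinks toward zero and the estimate falls back to the labeled-sample mean, which is the data-adaptive behavior described in the abstract.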
Related papers
- Mechanism learning: Reverse causal inference in the presence of multiple unknown confounding through front-door causal bootstrapping [0.8901073744693314]
A major limitation of machine learning (ML) prediction models is that they recover associational, rather than causal, predictive relationships between variables.
This paper proposes mechanism learning, a simple method which uses front-door causal bootstrapping to deconfound observational data.
We test our method on fully synthetic, semi-synthetic and real-world datasets, demonstrating that it can discover reliable, unbiased, causal ML predictors.
arXiv Detail & Related papers (2024-10-26T03:34:55Z)
- Task-Agnostic Machine-Learning-Assisted Inference [0.0]
We introduce a novel statistical framework named PSPS for task-agnostic ML-assisted inference.
PSPS provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routine.
arXiv Detail & Related papers (2024-05-30T13:19:49Z)
- Clustering and Uncertainty Analysis to Improve the Machine Learning-based Predictions of SAFARI-1 Control Follower Assembly Axial Neutron Flux Profiles [2.517043342442487]
The goal of this work is to develop accurate Machine Learning (ML) models for predicting the assembly axial neutron flux profiles in the SAFARI-1 research reactor.
The data-driven nature of ML models makes them susceptible to uncertainties introduced by sources such as noise in the training data.
This work improves the ML models for the control assemblies by combining supervised and unsupervised ML algorithms.
arXiv Detail & Related papers (2023-12-20T20:22:13Z)
- Variance of ML-based software fault predictors: are we really improving fault prediction? [0.3222802562733786]
We experimentally analyze the variance of a state-of-the-art fault prediction approach.
We observed a maximum variance of 10.10% in terms of the per-class accuracy metric.
arXiv Detail & Related papers (2023-10-26T09:31:32Z)
- Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.
The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients.
Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
arXiv Detail & Related papers (2023-01-23T18:59:28Z)
- Correcting Model Bias with Sparse Implicit Processes [0.9187159782788579]
We show that Sparse Implicit Processes (SIP) is capable of correcting model bias when the data generating mechanism differs strongly from the one implied by the model.
We use synthetic datasets to show that SIP is capable of providing predictive distributions that reflect the data better than the exact predictions of the initial, but wrongly assumed model.
arXiv Detail & Related papers (2022-07-21T18:00:01Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (a short sketch of this rule appears after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance guided stochastic gradient descent (IGSGD) method to train models that perform inference directly on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
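As referenced in the out-of-distribution entry above, here is a minimal sketch of the ATC rule as commonly described: choose a confidence threshold on labeled source data so that the share of source examples above it matches the source accuracy, then report the share of unlabeled target examples above that threshold as the predicted target accuracy. Function and variable names are illustrative, not the authors' API.

```python
import numpy as np

def atc_predict_accuracy(src_conf, src_correct, tgt_conf):
    """Average Thresholded Confidence (ATC), sketched.

    src_conf    : confidence scores on labeled source/validation data
    src_correct : 0/1 indicators of whether each source prediction was correct
    tgt_conf    : confidence scores on unlabeled target data
    """
    src_acc = src_correct.mean()

    # Choose the threshold so that the fraction of source examples with
    # confidence above it equals the source accuracy.
    threshold = np.quantile(src_conf, 1.0 - src_acc)

    # Predicted target accuracy: fraction of unlabeled target examples
    # whose confidence exceeds the learned threshold.
    return (tgt_conf > threshold).mean()
```

In practice the confidence score could be the softmax maximum or the negative entropy of the model's predicted probabilities.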
This list is automatically generated from the titles and abstracts of the papers on this site.