Prediction-Powered Adaptive Shrinkage Estimation
- URL: http://arxiv.org/abs/2502.14166v1
- Date: Thu, 20 Feb 2025 00:24:05 GMT
- Title: Prediction-Powered Adaptive Shrinkage Estimation
- Authors: Sida Li, Nikolaos Ignatiadis,
- Abstract summary: Prediction-Powered Adaptive Shrinkage (PAS) is a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means.
PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
- Score: 0.9208007322096532
- License:
- Abstract: Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI's benefits for individual statistical tasks, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means. PAS debiases noisy ML predictions within each task and then borrows strength across tasks by using those same predictions as a reference point for shrinkage. The amount of shrinkage is determined by minimizing an unbiased estimate of risk, and we prove that this tuning strategy is asymptotically optimal. Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
Related papers
- FAB-PPI: Frequentist, Assisted by Bayes, Prediction-Powered Inference [0.0]
Prediction-powered inference (PPI) enables valid statistical inference by combining experimental data with machine learning predictions.
We propose to inform the PPI framework with prior knowledge on the quality of the predictions.
The resulting method, which we call frequentist, assisted by Bayes, PPI (FAB-PPI), improves over PPI when the observed prediction quality is likely under the prior.
arXiv Detail & Related papers (2025-02-04T14:46:08Z) - Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI [12.569286058146343]
We establish a formal connection between the decades-old surrogate outcome model in biostatistics and the emerging field of prediction-powered inference (PPI)
We develop recalibrated prediction-powered inference, a more efficient approach to statistical inference than existing PPI proposals.
We demonstrate significant gains in effective sample size over existing PPI proposals via three applications leveraging state-of-the-art machine learning/AI models.
arXiv Detail & Related papers (2025-01-16T18:30:33Z) - Adaptive Sampling to Reduce Epistemic Uncertainty Using Prediction Interval-Generation Neural Networks [0.0]
This paper presents an adaptive sampling approach designed to reduce epistemic uncertainty in predictive models.
Our primary contribution is the development of a metric that estimates potential epistemic uncertainty.
A batch sampling strategy based on Gaussian processes (GPs) is also proposed.
We test our approach on three unidimensional synthetic problems and a multi-dimensional dataset based on an agricultural field for selecting experimental fertilizer rates.
arXiv Detail & Related papers (2024-12-13T21:21:47Z) - Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI)
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - Bayesian Prediction-Powered Inference [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a framework for PPI based on Bayesian inference that allows researchers to develop new task-appropriate PPI methods easily.
arXiv Detail & Related papers (2024-05-09T18:08:58Z) - Assumption-Lean and Data-Adaptive Post-Prediction Inference [1.5050365268347254]
We introduce PoSt-Prediction Adaptive inference (PSPA) that allows valid and powerful inference based on ML-predicted data.
We demonstrate the statistical superiority and broad applicability of our method through simulations and real-data applications.
arXiv Detail & Related papers (2023-11-23T22:41:30Z) - PPI++: Efficient Prediction-Powered Inference [31.403415618169433]
We present PPI++: a methodology for estimation and inference based on a small labeled dataset and a typically much larger dataset of machine-learning predictions.
The methods automatically adapt to the quality of available predictions, yielding easy-to-compute confidence sets.
PPI++ builds on prediction-powered inference (PPI), which targets the same problem setting, improving its computational and statistical efficiency.
arXiv Detail & Related papers (2023-11-02T17:59:04Z) - Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z) - Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.
The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients.
Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
arXiv Detail & Related papers (2023-01-23T18:59:28Z) - Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic
Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.