Balance-Subsampled Stable Prediction
- URL: http://arxiv.org/abs/2006.04381v1
- Date: Mon, 8 Jun 2020 07:01:38 GMT
- Title: Balance-Subsampled Stable Prediction
- Authors: Kun Kuang, Hengtao Zhang, Fei Wu, Yueting Zhuang and Aijun Zhang
- Abstract summary: We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
- Score: 55.13512328954456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, it is commonly assumed that training and test data share
the same population distribution. However, this assumption is often violated in
practice because the sample selection bias may induce the distribution shift
from training data to test data. Such a model-agnostic distribution shift
usually leads to prediction instability across unknown test data. In this
paper, we propose a novel balance-subsampled stable prediction (BSSP) algorithm
based on the theory of fractional factorial design. It isolates the clear
effect of each predictor from the confounding variables. A design-theoretic
analysis shows that the proposed method can reduce the confounding effects
among predictors induced by the distribution shift, thereby improving both the
accuracy of parameter estimation and prediction stability. Numerical
experiments on both synthetic and real-world data sets demonstrate that our
BSSP algorithm significantly outperforms the baseline methods for stable
prediction across unknown test data.
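The listing includes no code; below is a minimal sketch of the balance-subsampling idea for binary predictors, in the spirit of a full factorial design: keep an equal number of samples for every combination of predictor levels so the predictors become uncorrelated before fitting a regression. All names and the toy data-generating setup are illustrative, not from the paper.

```python
import numpy as np
from itertools import product

def balance_subsample(X, y, rng=None):
    """Subsample rows so every combination of binary predictor levels
    appears equally often (a full factorial layout). Sketch only."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    groups = [np.where((X == np.array(c)).all(axis=1))[0]
              for c in product([0, 1], repeat=X.shape[1])]
    n_min = min(len(g) for g in groups)
    if n_min == 0:
        raise ValueError("unobserved level combination; a fractional "
                         "factorial design would be needed instead")
    keep = np.concatenate([rng.choice(g, n_min, replace=False) for g in groups])
    return X[keep], y[keep]

# Toy usage: x2 is spuriously correlated with the true cause x1.
rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 5000)
x2 = (rng.random(5000) < 0.2 + 0.6 * x1).astype(int)
y = 2.0 * x1 + rng.normal(size=5000)          # x2 has no causal effect
Xb, yb = balance_subsample(np.column_stack([x1, x2]), y, rng=1)
beta, *_ = np.linalg.lstsq(
    np.column_stack([np.ones(len(yb)), Xb]), yb, rcond=None)
print(beta)  # estimate of x2's effect stays near zero after balancing
```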
Related papers
- Invariant Probabilistic Prediction [45.90606906307022]
We show that arbitrary distribution shifts do not, in general, admit invariant and robust probabilistic predictions.
We propose a method to yield invariant probabilistic predictions, called IPP, and study the consistency of the underlying parameters.
arXiv Detail & Related papers (2023-09-18T18:50:24Z)
- Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z)
- Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.
The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients.
Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
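As a concrete illustration, the prediction-powered estimate of a mean debiases model predictions on a large unlabeled set with residuals from a small labeled set; the sketch below follows that recipe (our variable names, normal-approximation interval).

```python
import numpy as np
from scipy import stats

def ppi_mean_ci(y_lab, yhat_lab, yhat_unlab, alpha=0.05):
    """Prediction-powered (1 - alpha) CI for E[Y]: predictions on the
    unlabeled set, debiased by labeled-set prediction errors."""
    n, N = len(y_lab), len(yhat_unlab)
    rectifier = yhat_lab - y_lab                  # estimated model bias
    theta = yhat_unlab.mean() - rectifier.mean()  # debiased point estimate
    se = np.sqrt(yhat_unlab.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = stats.norm.ppf(1 - alpha / 2)
    return theta - z * se, theta + z * se
```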
arXiv Detail & Related papers (2023-01-23T18:59:28Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Causal Transfer Random Forest: Combining Logged Data and Randomized Experiments for Robust Prediction [8.736551469632758]
We describe a causal transfer random forest (CTRF) that combines existing training data with a small amount of data from a randomized experiment to train a model.
We evaluate the CTRF using both synthetic data experiments and real-world experiments in the Bing Ads platform.
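The summary leaves the combination mechanism unspecified; one plausible rendering (a sketch under our own assumptions, not the authors' code) grows the forest on plentiful logged data and then re-estimates each leaf value from the small randomized sample, so the tree structure comes from logged data while the predictions come from unconfounded data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def refit_leaves(forest, X_rand, y_rand):
    """Replace each leaf's value with the mean outcome of the
    randomized samples that fall into it. Sketch only."""
    leaves = forest.apply(X_rand)                 # (n_samples, n_trees)
    leaf_means = [
        {leaf: y_rand[leaves[:, t] == leaf].mean()
         for leaf in np.unique(leaves[:, t])}
        for t in range(leaves.shape[1])
    ]
    fallback = y_rand.mean()                      # for leaves never hit

    def predict(X):
        idx = forest.apply(X)
        per_tree = [np.array([vals.get(l, fallback) for l in idx[:, t]])
                    for t, vals in enumerate(leaf_means)]
        return np.mean(per_tree, axis=0)
    return predict

# forest = RandomForestRegressor(n_estimators=200).fit(X_logged, y_logged)
# predict = refit_leaves(forest, X_rand, y_rand)  # randomized leaf values
```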
arXiv Detail & Related papers (2020-10-17T03:54:37Z)
- Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets that (almost exactly) achieve the desired coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
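For orientation, the non-robust baseline is split conformal: take roughly the $(1-\alpha)$ empirical quantile of held-out residuals as the set's half-width. The paper replaces that quantile with a worst-case quantile over all test distributions in the $f$-divergence ball; the `inflate` knob below is only a crude stand-in for that correction (our sketch, see the paper for the exact form).

```python
import numpy as np

def conformal_half_width(resid_cal, alpha=0.1, inflate=0.0):
    """Split-conformal half-width from calibration residuals.
    `inflate` crudely mimics the robust (worst-case) quantile;
    the exact f-divergence correction is derived in the paper."""
    n = len(resid_cal)
    level = min(1.0, (1 - alpha + inflate) * (n + 1) / n)
    return np.quantile(np.abs(resid_cal), level)

# q = conformal_half_width(y_cal - model.predict(X_cal), alpha=0.1)
# set for a new x: [model.predict(x) - q, model.predict(x) + q]
```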
arXiv Detail & Related papers (2020-08-10T17:09:16Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtle spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional independence test based algorithm that separates causal variables using a seed variable as prior knowledge, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
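The abstract omits the test itself; as an illustrative building block (our assumption, not necessarily the authors' choice), a Gaussian partial-correlation test can check whether a candidate variable is independent of the outcome given the seed variable.

```python
import numpy as np
from scipy import stats

def partial_corr_indep(x, y, z, alpha=0.05):
    """Test X independent of Y given Z via partial correlation and
    Fisher's z-transform. True = independence not rejected. Sketch."""
    Z = np.column_stack([np.ones(len(z)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residualize on Z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    n, k = len(x), 1                                   # one conditioning var
    stat = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - k - 3)
    p = 2 * (1 - stats.norm.cdf(abs(stat)))
    return p > alpha
```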
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
- Stable Prediction with Model Misspecification and Agnostic Distribution Shift [41.26323389341987]
In machine learning algorithms, two main assumptions are required to guarantee performance.
One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified.
Under model misspecification, distribution shift between training and test data leads to inaccuracy of parameter estimation and instability of prediction across unknown test data.
We propose a novel Decorrelated Weighting Regression (DWR) algorithm that jointly optimizes a variable decorrelation regularizer and a weighted regression model.
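A minimal sketch of the idea (our two-stage rendering; the paper optimizes the two terms jointly): learn sample weights that shrink all pairwise feature covariances, then run weighted least squares under those weights.

```python
import numpy as np
from scipy.optimize import minimize

def decorrelation_loss(v, X):
    """Sum of squared weighted covariances over feature pairs;
    exp/normalize keeps the weights positive and summing to 1."""
    w = np.exp(v); w /= w.sum()
    Xc = X - w @ X                        # weighted centering
    C = (Xc * w[:, None]).T @ Xc          # weighted covariance matrix
    off = C - np.diag(np.diag(C))
    return (off ** 2).sum()

def dwr_fit(X, y):
    """Sketch: decorrelating weights first, weighted regression second."""
    v = minimize(decorrelation_loss, np.zeros(len(X)), args=(X,),
                 method="L-BFGS-B").x
    w = np.exp(v); w /= w.sum()
    sw = np.sqrt(w)[:, None]
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd * sw, y * sw.ravel(), rcond=None)
    return beta, w
```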
arXiv Detail & Related papers (2020-01-31T08:56:35Z)