Stable Prediction via Leveraging Seed Variable
- URL: http://arxiv.org/abs/2006.05076v1
- Date: Tue, 9 Jun 2020 06:56:31 GMT
- Title: Stable Prediction via Leveraging Seed Variable
- Authors: Kun Kuang, Bo Li, Peng Cui, Yue Liu, Jianrong Tao, Yueting Zhuang and
Fei Wu
- Abstract summary: Previous machine learning methods might exploit subtle spurious correlations in training data, induced by non-causal variables, for prediction.
We propose a conditional independence test based algorithm that separates causal variables using a seed variable as prior knowledge, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
- Score: 73.9770220107874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the problem of stable prediction across unknown
test data, where the test distribution is agnostic and might be totally
different from the training one. In such a case, previous machine learning
methods might exploit subtle spurious correlations in the training data, induced by
non-causal variables, for prediction. Those spurious correlations can change
across datasets, leading to unstable predictions. To address this problem, by
assuming that the relationships between the causal variables and the response
variable are invariant across data, we propose a conditional independence test
based algorithm that separates the causal variables using a seed variable as
prior knowledge, and adopts them for stable prediction. By assuming the independence
between causal and non-causal variables, we show, both theoretically and with
empirical experiments, that our algorithm can precisely separate causal and
non-causal variables for stable prediction across test data. Extensive
experiments on both synthetic and real-world datasets demonstrate that our
algorithm outperforms state-of-the-art methods for stable prediction.
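As a rough illustration of the separation step described in the abstract, the sketch below uses a single known seed causal variable and a partial-correlation-based independence test (a common linear-Gaussian stand-in for a conditional independence test) to split candidate variables into a causal-like group and a non-causal-like group. The helper names, the choice of test, and the threshold are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch, NOT the paper's algorithm: group variables with a seed
# causal variable using an independence test, relying on the stated
# assumption that causal and non-causal variables are independent.
import numpy as np
from scipy import stats


def partial_corr_pvalue(x, y, z=None):
    """p-value of corr(x, y), optionally after regressing out the columns of z
    (a linear-Gaussian approximation to a conditional independence test)."""
    if z is not None and z.size:
        bx, *_ = np.linalg.lstsq(z, x, rcond=None)
        by, *_ = np.linalg.lstsq(z, y, rcond=None)
        x, y = x - z @ bx, y - z @ by
    _, p = stats.pearsonr(x, y)
    return p


def separate_with_seed(X, seed_idx, alpha=0.05):
    """Columns of X that the test finds dependent on the seed column are
    placed in the causal-like set; all remaining columns are non-causal-like."""
    d = X.shape[1]
    causal = {seed_idx}
    for j in range(d):
        if j != seed_idx and partial_corr_pvalue(X[:, j], X[:, seed_idx]) < alpha:
            causal.add(j)
    non_causal = [j for j in range(d) if j not in causal]
    return sorted(causal), non_causal


# Toy check: column 0 is the seed, column 1 is driven by it, column 2 is noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
X = np.column_stack([x0, 0.8 * x0 + rng.normal(size=2000), rng.normal(size=2000)])
print(separate_with_seed(X, seed_idx=0))  # typically ([0, 1], [2])
```

In the paper's setting, a predictor trained only on the causal-like group is then expected to be more stable across shifted test distributions.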
Related papers
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure based on testing a hypothesis about the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows one to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Invariant Probabilistic Prediction [45.90606906307022]
We show that arbitrary distribution shifts do not, in general, admit invariant and robust probabilistic predictions.
We propose a method to yield invariant probabilistic predictions, called IPP, and study the consistency of the underlying parameters.
arXiv Detail & Related papers (2023-09-18T18:50:24Z)
- Testing Independence of Exchangeable Random Variables [19.973896010415977]
Given well-shuffled data, can we determine whether the data items are statistically (in)dependent?
We will show that this is possible and develop tests that can confidently reject the null hypothesis that data is independent and identically distributed.
One potential application is in Deep Learning, where data is often scraped from the whole internet and duplication abounds.
arXiv Detail & Related papers (2022-10-22T08:55:48Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests [87.60900567941428]
A 'spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter.
In machine learning, these have a know-it-when-you-see-it character.
We study stress testing using the tools of causal inference.
arXiv Detail & Related papers (2021-05-31T14:39:38Z)
- Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
- Causal Transfer Random Forest: Combining Logged Data and Randomized Experiments for Robust Prediction [8.736551469632758]
We describe a causal transfer random forest (CTRF) that combines existing training data with a small amount of data from a randomized experiment to train a model.
We evaluate the CTRF using both synthetic data experiments and real-world experiments in the Bing Ads platform.
arXiv Detail & Related papers (2020-10-17T03:54:37Z)
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
- Stable Prediction with Model Misspecification and Agnostic Distribution Shift [41.26323389341987]
In machine learning algorithms, two main assumptions are required to guarantee performance.
One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified.
Under model misspecification, distribution shift between training and test data leads to inaccuracy of parameter estimation and instability of prediction across unknown test data.
We propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model (a simplified sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-01-31T08:56:35Z)
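To make the decorrelation-plus-weighted-regression idea in the last entry above more concrete, here is a simplified sketch: it learns sample weights that shrink the pairwise weighted covariances among predictors, then fits a weighted least-squares model with those weights. This is not the authors' DWR algorithm, which optimizes the two terms jointly; the softmax parameterization, the optimizer, and the function names are illustrative assumptions.

```python
# Simplified sketch, NOT the DWR algorithm: learn decorrelating sample
# weights first, then fit a weighted least-squares regression with them.
import numpy as np
from scipy.optimize import minimize


def decorrelation_loss_and_grad(theta, X):
    """Sum of squared off-diagonal weighted covariances among the columns of X,
    with sample weights w = softmax(theta). Returns (loss, d loss / d theta);
    the weighted mean is treated as fixed when differentiating (an approximation)."""
    w = np.exp(theta - theta.max())
    w = w / w.sum()
    Xc = X - (w[:, None] * X).sum(axis=0)                 # weighted-mean centered
    C = Xc.T @ (w[:, None] * Xc)                          # weighted covariance matrix
    off = C - np.diag(np.diag(C))                         # off-diagonal entries only
    loss = float(np.sum(off ** 2))
    grad_w = np.einsum("ij,ik,jk->i", Xc, Xc, 2.0 * off)  # d loss / d w
    grad_theta = w * (grad_w - np.dot(w, grad_w))         # chain rule through softmax
    return loss, grad_theta


def decorrelated_weighted_regression(X, y):
    """Learn decorrelating sample weights, then solve weighted least squares."""
    n = X.shape[0]
    res = minimize(decorrelation_loss_and_grad, np.zeros(n), args=(X,),
                   jac=True, method="L-BFGS-B")
    w = np.exp(res.x - res.x.max())
    w = w / w.sum()
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
    return beta, w
```

The intent, following the entry above, is that a regression fit on re-weighted data with decorrelated predictors is less sensitive to whichever spurious associations happen to hold in the training sample.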