Benign-Overfitting in Conditional Average Treatment Effect Prediction
with Linear Regression
- URL: http://arxiv.org/abs/2202.05245v2
- Date: Fri, 11 Feb 2022 23:37:24 GMT
- Title: Benign-Overfitting in Conditional Average Treatment Effect Prediction
with Linear Regression
- Authors: Masahiro Kato and Masaaki Imaizumi
- Abstract summary: We study the theory of benign overfitting in the prediction of the conditional average treatment effect (CATE) with linear regression models.
We show that the T-learner fails to achieve consistency except under random assignment, while the risk of the IPW-learner converges to zero if the propensity score is known.
- Score: 14.493176427999028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the theory of benign overfitting in the prediction of the
conditional average treatment effect (CATE) with linear regression models.
With the development of machine learning for causal inference, a wide range of
large-scale models for causality has been gaining attention. One concern is
that such large-scale models may be prone to overfitting observations subject
to sample selection, and hence may be unsuitable for causal prediction. In
this study, to address this concern, we investigate the validity of causal
inference methods for overparameterized models by applying the recent theory
of benign overfitting (Bartlett et al., 2020). Specifically, we consider
samples whose distribution switches depending on an assignment rule, and study
the prediction of CATE with linear models whose dimension diverges to
infinity. We focus on two methods: the T-learner, which is based on the
difference between estimators constructed separately for each treatment group,
and the inverse probability weighting (IPW)-learner, which solves another
regression problem whose target is transformed using the propensity score. In
both methods, the estimators are interpolators that fit the samples perfectly.
As a result, we show that the T-learner fails to achieve consistency except
under random assignment, whereas the risk of the IPW-learner converges to zero
if the propensity score is known. This difference stems from the fact that the
T-learner is unable to preserve the eigenspaces of the covariances, which is
necessary for benign overfitting in the overparameterized setting. Our results
provide new insights into the use of causal inference methods, in particular
doubly robust estimators, in the overparameterized setting.
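To make the two learners concrete, here is a minimal NumPy sketch of both under an illustrative data-generating process (Gaussian covariates, a logistic propensity score, linear potential outcomes; all chosen here for illustration, not taken from the paper). Each learner uses a minimum-norm interpolator, which `np.linalg.lstsq` returns in the overparameterized regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 500                                 # overparameterized: d >> n
X = rng.normal(size=(n, d))
e = 1.0 / (1.0 + np.exp(-X[:, 0]))              # known propensity score e(x)
D = rng.binomial(1, e)                          # treatment assignment
beta1 = rng.normal(size=d) / np.sqrt(d)         # potential-outcome coefficients
beta0 = rng.normal(size=d) / np.sqrt(d)
Y = np.where(D == 1, X @ beta1, X @ beta0) + 0.1 * rng.normal(size=n)

# T-learner: fit a minimum-norm interpolator separately on each treatment
# group and predict the CATE by the difference of the two regressions.
b1 = np.linalg.lstsq(X[D == 1], Y[D == 1], rcond=None)[0]
b0 = np.linalg.lstsq(X[D == 0], Y[D == 0], rcond=None)[0]

# IPW-learner: a single minimum-norm regression of the IPW-transformed
# outcome Z on X; when e(x) is known, E[Z | X = x] equals the CATE.
Z = (D / e - (1 - D) / (1 - e)) * Y
b_ipw = np.linalg.lstsq(X, Z, rcond=None)[0]

x_test = rng.normal(size=(5, d))
print("T-learner:  ", x_test @ (b1 - b0))
print("IPW-learner:", x_test @ b_ipw)
print("true CATE:  ", x_test @ (beta1 - beta0))
```

The IPW transformation is what lets a single regression target the CATE directly, which is why the two learners can behave so differently under non-random assignment.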
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory on a variety of high-dimensional data.
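The paper derives exact characterizations; as a hedged stand-in, the Monte Carlo sketch below estimates the in-sample and (noiseless, excess) out-of-sample risks of ridge regression under one concrete row-correlation structure, an AR(1) correlation chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam, rho = 200, 100, 1e-1, 0.8

# AR(1) correlation across samples (rows): an illustrative stand-in for
# the "arbitrary correlations" the paper treats exactly.
C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(C)
beta = rng.normal(size=d) / np.sqrt(d)

def one_run():
    X = L @ rng.normal(size=(n, d))          # row-correlated design
    y = X @ beta + L @ rng.normal(size=n)    # correlated noise as well
    b = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    in_risk = np.mean((X @ b - y) ** 2)
    X_new = rng.normal(size=(n, d))          # independent test points
    out_risk = np.mean((X_new @ b - X_new @ beta) ** 2)   # excess risk
    return in_risk, out_risk

risks = np.mean([one_run() for _ in range(50)], axis=0)
print("in-sample risk %.3f, out-of-sample risk %.3f" % tuple(risks))
```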
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Awareness of uncertainty in classification using a multivariate model and multi-views [1.3048920509133808]
The proposed model regularizes uncertain predictions and is trained to produce both predictions and their uncertainty estimates.
Given the multi-view predictions together with their uncertainties and confidences, we propose several methods for computing the final predictions.
The proposed methodology was tested on the CIFAR-10 dataset with clean and noisy labels.
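The paper proposes several aggregation rules; the sketch below shows just one generic possibility, inverse-uncertainty weighting of per-view class probabilities (all numbers are made up for illustration, not taken from the paper).

```python
import numpy as np

# Hypothetical per-view class-probability predictions and uncertainty
# estimates for one sample; shapes: (n_views, n_classes) and (n_views,).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.6, 0.3, 0.1]])
uncert = np.array([0.1, 0.5, 0.2])   # e.g. predictive variance per view

# One simple rule in the spirit of the paper: weight each view by its
# inverse uncertainty, then renormalize the weights.
w = (1.0 / uncert) / np.sum(1.0 / uncert)
final = w @ probs
print(final, "-> predicted class:", final.argmax())
```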
arXiv Detail & Related papers (2024-04-16T06:40:51Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure by testing a hypothesis about the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
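One plausible reading of such a procedure, sketched below with hypothetical inputs: abstain when a one-sided test, which accounts for the variance predictor's own standard error, indicates the conditional variance exceeds a user-chosen budget `tau`. The decision rule and significance level are assumptions for illustration, not the paper's exact test.

```python
def predict_or_abstain(mean_hat, var_hat, var_se, tau, z=1.645):
    """Abstain when the estimated conditional variance is significantly
    above the budget tau.

    var_hat : estimated conditional variance at the query point
    var_se  : standard error of that estimate (the predictor's own uncertainty)
    """
    # One-sided test of H0: variance <= tau at level ~5% (z = 1.645);
    # reject H0 (and abstain) when var_hat exceeds tau by > z standard errors.
    if var_hat - z * var_se > tau:
        return None                  # abstain
    return mean_hat

print(predict_or_abstain(mean_hat=2.3, var_hat=0.4, var_se=0.1, tau=0.3))  # 2.3
print(predict_or_abstain(mean_hat=2.3, var_hat=0.9, var_se=0.1, tau=0.3))  # None
```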
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or complex mixtures of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple-hypothesis predictors for regression problems.
It is proved that this structured model can efficiently interpolate the underlying tessellation and approximate the multi-hypothesis target distribution.
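As a rough illustration of multiple-hypothesis regression (a generic winner-takes-all scheme, not the paper's structured construction), the sketch below fits two linear heads on shared RBF features so that each head captures one mode of a bimodal target.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, n_centers = 300, 2, 20
x = rng.uniform(-1, 1, size=n)
# Bimodal target: each input has two plausible outputs (multi-modal regression).
y = np.where(rng.random(n) < 0.5, np.sin(3 * x), -np.sin(3 * x)) + 0.05 * rng.normal(size=n)

centers = np.linspace(-1, 1, n_centers)
Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / 0.05)   # RBF features

# K linear heads on shared RBF features, refit with a winner-takes-all
# assignment: each sample is claimed by the head that currently fits it best.
W = rng.normal(size=(K, n_centers))
for _ in range(20):
    preds = Phi @ W.T                                   # (n, K)
    assign = np.abs(preds - y[:, None]).argmin(axis=1)  # best head per sample
    for k in range(K):
        m = assign == k
        if m.sum() > n_centers:                         # enough points to refit
            W[k] = np.linalg.lstsq(Phi[m], y[m], rcond=None)[0]

print((Phi @ W.T)[:3])   # two hypotheses per input
```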
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Exploring Local Explanations of Nonlinear Models Using Animated Linear
Projections [5.524804393257921]
We show how to use eXplainable AI (XAI) to shed light on how a model uses predictors to arrive at a prediction.
To understand how interactions between predictors affect the variable importance estimates, we can convert LVAs into linear projections.
The approach is illustrated with examples from categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) response models.
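A minimal sketch of that conversion step, assuming a finite-difference gradient as a stand-in for whatever local variable attribution (LVA) method is actually used; normalizing the attribution vector yields a 1-D linear projection of the data. The model `f` is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
f = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2   # hypothetical fitted model

def local_attribution(x0, eps=1e-4):
    # Finite-difference gradient as a stand-in for any LVA method.
    g = np.zeros_like(x0)
    for j in range(len(x0)):
        e = np.zeros_like(x0); e[j] = eps
        g[j] = (f((x0 + e)[None])[0] - f((x0 - e)[None])[0]) / (2 * eps)
    return g

a = local_attribution(X[0])
proj = a / np.linalg.norm(a)          # normalize the LVA into a projection
print("1-D projection of the data:", (X @ proj)[:5])
```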
arXiv Detail & Related papers (2022-05-11T09:11:02Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on its classification error.
Our results extend to a noisy model with constant-probability label flips.
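A small simulation in this model is easy to set up. The sketch below uses the minimum-norm interpolator of the noisy ±1 labels, which the "abundance of support vectors" phenomenon suggests often coincides with the max-margin classifier in this regime, and estimates its test error; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, flip_p = 100, 400, 0.05        # overparameterized regime; illustrative sizes
mu = np.zeros(d); mu[0] = 3.0        # mixture mean (an assumption for illustration)

y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.normal(size=(n, d))              # Gaussian mixture data
y_noisy = y * np.where(rng.random(n) < flip_p, -1.0, 1.0)  # constant-probability flips

# Minimum-norm interpolator of the noisy labels.
w = np.linalg.lstsq(X, y_noisy, rcond=None)[0]

y_te = rng.choice([-1.0, 1.0], size=2000)
X_te = y_te[:, None] * mu + rng.normal(size=(2000, d))
print("test error: %.3f" % np.mean(np.sign(X_te @ w) != y_te))
```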
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
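The sketch below samples from one narrow family of interpolators for illustration: the minimum-norm least-squares fit of the ±1 labels plus random null-space components, so every sampled classifier still fits the training labels exactly. The empirical spread of test errors then hints at the typical-versus-worst-case gap the paper quantifies; the perturbation radius is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 50, 200
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d)); y = np.sign(X @ w_star)
X_te = rng.normal(size=(5000, d)); y_te = np.sign(X_te @ w_star)

# Min-norm fit of the labels, plus random null-space perturbations: every
# sampled w still reproduces the training labels exactly (it interpolates).
w0 = np.linalg.lstsq(X, y, rcond=None)[0]
Vt = np.linalg.svd(X, full_matrices=True)[2]
N = Vt[n:].T                             # orthonormal basis of null(X)
scale = 0.5 * np.linalg.norm(w0)         # arbitrary perturbation radius

errs = []
for _ in range(200):
    c = rng.normal(size=d - n)
    w = w0 + scale * (N @ c) / np.linalg.norm(c)
    errs.append(np.mean(np.sign(X_te @ w) != y_te))
print("typical error %.3f, worst sampled %.3f" % (np.mean(errs), np.max(errs)))
```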
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
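A minimal sketch of a doubly-robust (AIPW) cross-fit estimator of the ACE, with random forests as stand-in nuisance learners and a synthetic data-generating process whose true ACE is 1; the nuisance models, the clipping constant, and the data are all illustrative choices, not the study's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def crossfit_aipw(X, D, Y, n_splits=2, seed=0):
    """Doubly-robust (AIPW) estimate of the ACE with cross-fitting:
    nuisance models are trained on one fold and evaluated on the other."""
    psi = np.zeros(len(Y))
    for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        e = RandomForestClassifier(random_state=seed).fit(X[tr], D[tr])
        m1 = RandomForestRegressor(random_state=seed).fit(X[tr][D[tr] == 1], Y[tr][D[tr] == 1])
        m0 = RandomForestRegressor(random_state=seed).fit(X[tr][D[tr] == 0], Y[tr][D[tr] == 0])
        e_hat = np.clip(e.predict_proba(X[te])[:, 1], 0.01, 0.99)
        mu1, mu0 = m1.predict(X[te]), m0.predict(X[te])
        psi[te] = (mu1 - mu0
                   + D[te] * (Y[te] - mu1) / e_hat
                   - (1 - D[te]) * (Y[te] - mu0) / (1 - e_hat))
    return psi.mean()

# Illustrative data with true ACE = 1.
rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 5))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = X[:, 1] + D * 1.0 + rng.normal(size=2000)
print(crossfit_aipw(X, D, Y))
```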
arXiv Detail & Related papers (2020-04-21T23:09:55Z) - Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
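As a toy illustration of combining several approximate proposals, the sketch below estimates a decision-relevant posterior expectation by multiple importance sampling with the balance heuristic. The target here is a standard normal so the answer (P(z > 1) ≈ 0.159) can be checked by hand; the proposals and the functional are made-up stand-ins, not the paper's learned proposals.

```python
import numpy as np

rng = np.random.default_rng(7)

def norm_logpdf(z, mu, sd):
    return -0.5 * ((z - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))

# Toy target posterior: standard normal, so the estimate is checkable.
log_p = lambda z: norm_logpdf(z, 0.0, 1.0)

# Two approximate proposals (stand-ins for learned proposals).
params = [(-1.0, 1.5), (1.0, 1.5)]
zs = np.concatenate([rng.normal(mu, sd, size=1000) for mu, sd in params])

# Balance heuristic: weight each draw by p(z) over the proposal mixture.
log_mix = np.logaddexp(norm_logpdf(zs, *params[0]),
                       norm_logpdf(zs, *params[1])) - np.log(2)
w = np.exp(log_p(zs) - log_mix)
w /= w.sum()

h = lambda z: (z > 1.0).astype(float)            # decision-relevant functional
print("estimate of P(z > 1):", float(w @ h(zs)))  # true value ~0.159
```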
arXiv Detail & Related papers (2020-02-17T19:23:36Z)