A New Central Limit Theorem for the Augmented IPW Estimator: Variance
Inflation, Cross-Fit Covariance and Beyond
- URL: http://arxiv.org/abs/2205.10198v1
- Date: Fri, 20 May 2022 14:17:53 GMT
- Title: A New Central Limit Theorem for the Augmented IPW Estimator: Variance
Inflation, Cross-Fit Covariance and Beyond
- Authors: Kuanhao Jiang, Rajarshi Mukherjee, Subhabrata Sen and Pragya Sur
- Abstract summary: Augmented inverse probability weighting (AIPW) with cross-fitting is a popular choice in practice.
We study this cross-fit AIPW estimator under well-specified outcome regression and propensity score models in a high-dimensional regime.
Our work utilizes a novel interplay between three distinct tools--approximate message passing theory, the theory of deterministic equivalents, and the leave-one-out approach.
- Score: 0.9172870611255595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimation of the average treatment effect (ATE) is a central problem in
causal inference. In recent times, inference for the ATE in the presence of
high-dimensional covariates has been extensively studied. Among the diverse
approaches that have been proposed, augmented inverse probability weighting
(AIPW) with cross-fitting has emerged as a popular choice in practice. In this
work, we study this cross-fit AIPW estimator under well-specified outcome
regression and propensity score models in a high-dimensional regime where the
number of features and samples are both large and comparable. Under assumptions
on the covariate distribution, we establish a new CLT for the suitably scaled
cross-fit AIPW that applies without any sparsity assumptions on the underlying
high-dimensional parameters. Our CLT uncovers two crucial phenomena among
others: (i) the AIPW exhibits a substantial variance inflation that can be
precisely quantified in terms of the signal-to-noise ratio and other problem
parameters, (ii) the asymptotic covariance between the pre-cross-fit estimates
is non-negligible even on the root-n scale. In fact, these cross-covariances
turn out to be negative in our setting. These findings are strikingly different
from their classical counterparts. On the technical front, our work utilizes a
novel interplay between three distinct tools--approximate message passing
theory, the theory of deterministic equivalents, and the leave-one-out
approach. We believe our proof techniques should be useful for analyzing other
two-stage estimators in this high-dimensional regime. Finally, we complement
our theoretical results with simulations that demonstrate both the finite
sample efficacy of our CLT and its robustness to our assumptions.
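For readers new to the estimator under study, here is a minimal sketch of a two-fold cross-fit AIPW estimate of the ATE in the paper's well-specified setting (logistic propensity score, linear outcome model). The simulated data and all function names are illustrative; this is not the authors' code.

```python
# Minimal two-fold cross-fit AIPW sketch. The logistic/linear working
# models echo the paper's well-specified setting; everything else
# (data-generating process, helper names) is invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_signal(train, test, X, A, Y):
    """Fit nuisance models on `train`, evaluate the AIPW signal on `test`."""
    ps = LogisticRegression(C=1e6, max_iter=1000).fit(X[train], A[train])
    mu1 = LinearRegression().fit(X[train][A[train] == 1], Y[train][A[train] == 1])
    mu0 = LinearRegression().fit(X[train][A[train] == 0], Y[train][A[train] == 0])
    e = ps.predict_proba(X[test])[:, 1]
    m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
    a, y = A[test], Y[test]
    # Doubly robust signal: outcome-model difference plus IPW residual correction.
    return m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e)

rng = np.random.default_rng(0)
n, p = 2000, 100                       # n and p large and comparable, as in the paper's regime
X = rng.normal(size=(n, p)) / np.sqrt(p)
beta, gamma = rng.normal(size=p), rng.normal(size=p)
A = rng.binomial(1, 1 / (1 + np.exp(-X @ gamma)))
Y = A * 1.0 + X @ beta + rng.normal(size=n)   # well-specified models, true ATE = 1
fold = np.arange(n) < n // 2
psi = np.concatenate([aipw_signal(fold, ~fold, X, A, Y),
                      aipw_signal(~fold, fold, X, A, Y)])
print(f"cross-fit AIPW estimate of the ATE: {psi.mean():.3f}")
```

Each fold's nuisances are fit on the complementary fold, so the averaged signal is exactly the cross-fit construction whose scaled fluctuations, variance inflation, and negative cross-fold covariances the new CLT characterizes.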
Related papers
- Nonparametric estimation of Hawkes processes with RKHSs [1.775610745277615]
This paper addresses nonparametric estimation of nonlinear Hawkes processes, where the interaction functions are assumed to lie in a reproducing kernel Hilbert space (RKHS).
Motivated by applications in neuroscience, the model allows complex interaction functions that can express exciting effects, inhibiting effects, or a combination of both.
Experiments show that the method outperforms related nonparametric estimation techniques and suits neuronal applications.
arXiv Detail & Related papers (2024-11-01T14:26:50Z)
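As background for the Hawkes entry above, the standard nonlinear multivariate Hawkes intensity takes the form below; this display is generic background, not notation lifted from the paper.

```latex
% Standard nonlinear multivariate Hawkes intensity: the h_{kl} are the
% interaction functions that the paper models as elements of an RKHS.
\lambda_k(t) = \psi\!\Big( \mu_k + \sum_{l=1}^{K} \int_0^{t^-} h_{kl}(t-s)\, \mathrm{d}N_l(s) \Big),
\qquad k = 1,\dots,K.
```

Positive values of h_{kl} encode excitation, negative values inhibition, and the link ψ keeps the intensity nonnegative, which is how the model accommodates both effects at once.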
- Estimation and Inference for Causal Functions with Multiway Clustered Data [6.988496457312806]
This paper proposes methods of estimation and uniform inference for a general class of causal functions.
The causal function is identified as a conditional expectation of an adjusted (Neyman-orthogonal) signal.
We apply the proposed methods to analyze the causal relationship between mistrust levels in Africa and the historical slave trade.
arXiv Detail & Related papers (2024-09-10T17:17:53Z)
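To make the "adjusted (Neyman-orthogonal) signal" of the entry above concrete: with the AIPW score as the canonical orthogonal example, the causal function is a conditional expectation of that score given conditioning variables V. An illustrative instance, not the paper's exact notation:

```latex
% Causal function as a conditional expectation of an adjusted signal;
% the AIPW score is the canonical Neyman-orthogonal example.
\theta(v) = \mathbb{E}\big[\psi(W;\eta_0)\,\big|\,V = v\big],
\qquad
\psi(W;\eta) = \mu_1(X) - \mu_0(X)
  + \frac{A\,\{Y - \mu_1(X)\}}{e(X)} - \frac{(1-A)\,\{Y - \mu_0(X)\}}{1 - e(X)}.
```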
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study, among other questions, whether the predicted covariance truly captures the randomness of the predicted mean.
Our results show that not only does the proposed TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z)
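The objective discussed in the entry above is the heteroscedastic Gaussian negative log-likelihood, in which the network predicts both a mean and a variance per input. The sketch below shows the univariate loss in plain NumPy; all names are illustrative, not code from the TIC-TAC paper.

```python
# Heteroscedastic Gaussian negative log-likelihood: a per-sample mean and
# (log-)variance are optimized jointly. Illustrative NumPy sketch only.
import numpy as np

def hetero_nll(y, mu, log_var):
    """Mean NLL of y under N(mu, exp(log_var)), up to an additive constant."""
    var = np.exp(log_var)            # predicting log-variance keeps var positive
    return np.mean(0.5 * (log_var + (y - mu) ** 2 / var))

y = np.array([1.0, 2.0, 0.5])
mu = np.array([0.9, 2.2, 0.4])
log_var = np.array([-1.0, 0.0, -2.0])
print(hetero_nll(y, mu, log_var))
```

The coupling in this loss is what makes joint optimization delicate: a poorly calibrated variance rescales the gradient of the mean, which is the sub-optimal convergence issue the paper targets.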
- A Practical Upper Bound for the Worst-Case Attribution Deviations [21.341303776931532]
Model attribution is a critical component of deep neural networks (DNNs), as it provides interpretability for complex models.
Recent studies draw attention to the security of attribution methods, as they are vulnerable to attribution attacks that generate similar images with dramatically different attributions.
Existing works have investigated empirically improving the robustness of DNNs against such attacks; however, none of them explicitly quantifies the actual deviations of attributions.
In this work, for the first time, a constrained optimization problem is formulated to derive an upper bound that measures the largest dissimilarity of attributions after the samples are perturbed by any noise within a certain region.
arXiv Detail & Related papers (2023-03-01T09:07:27Z)
- Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression [14.493176427999028]
We study the benign overfitting theory in the prediction of the conditional average treatment effect (CATE) with linear regression models.
We show that the T-learner fails to achieve consistency except under random assignment, while the IPW-learner converges its risk to zero if the propensity score is known.
arXiv Detail & Related papers (2022-02-10T18:51:52Z)
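To make the two learners of the entry above concrete, here is a hedged sketch of the T-learner and the IPW-learner for the CATE with linear regression and a known propensity score. It illustrates the estimators themselves, not the overparameterized regime in which the paper proves the T-learner inconsistent; the data-generating process is invented for illustration.

```python
# T-learner vs. IPW-learner for the CATE with linear regression,
# assuming a known propensity score. Illustrative sketch only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p = 500, 20
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))               # known propensity score
A = rng.binomial(1, e)
tau = X @ rng.normal(size=p) / np.sqrt(p)    # true CATE, linear in X
Y = A * tau + rng.normal(size=n)

# T-learner: fit separate regressions of Y on X in each arm, then subtract.
f1 = LinearRegression().fit(X[A == 1], Y[A == 1])
f0 = LinearRegression().fit(X[A == 0], Y[A == 0])
cate_t = f1.predict(X) - f0.predict(X)

# IPW-learner: regress the IPW-transformed outcome on X in one pass;
# E[Z | X] equals the CATE when e(X) is the true propensity score.
Z = A * Y / e - (1 - A) * Y / (1 - e)
cate_ipw = LinearRegression().fit(X, Z).predict(X)

print(np.mean((cate_t - tau) ** 2), np.mean((cate_ipw - tau) ** 2))
```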
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints, yielding a bias-constrained estimator (BCE).
A second motivation for the BCE is applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
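A minimal sketch of the bias-constraint idea from the entry above, assuming the bias is estimated empirically by averaging errors across repeated noisy measurements of each unknown; the function name and penalty weight are hypothetical, not the paper's API.

```python
# Bias-constrained estimation sketch: augment the MSE with a penalty on
# the squared empirical bias, estimated per ground-truth value by
# averaging errors across noisy realizations of the same unknown.
# Names and the penalty weight `lam` are illustrative assumptions.
import numpy as np

def bias_constrained_loss(pred, theta, lam=1.0):
    """pred, theta: arrays of shape (num_unknowns, num_noise_realizations)."""
    err = pred - theta
    mse = np.mean(err ** 2)
    bias_sq = np.mean(np.mean(err, axis=1) ** 2)  # bias per unknown, then averaged
    return mse + lam * bias_sq

rng = np.random.default_rng(3)
theta = np.repeat(rng.normal(size=(50, 1)), 20, axis=1)        # 50 unknowns, 20 draws each
pred = theta + 0.1 + rng.normal(scale=0.5, size=theta.shape)   # systematically biased predictor
print(bias_constrained_loss(pred, theta, lam=5.0))
```

The penalty matters precisely in the averaging use case the entry mentions: unbiased estimates of the same unknown can be averaged to cut variance without accumulating systematic error.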
- Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in invariant causal prediction (ICP) may be reformulated as a series of nonparametric tests that scale linearly in the number of predictors.
Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory.
We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z)
- Loss function based second-order Jensen inequality and its application to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
arXiv Detail & Related papers (2021-06-09T12:13:51Z)
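The repulsion mechanism in the PVI entry above can be illustrated with an SVGD-style particle update, a canonical instance of kernel-based repulsion. This sketch (1-D standard-normal target, RBF kernel) is an assumption-laden stand-in for, not a transcription of, the paper's algorithm.

```python
# Particle update with a kernel repulsion term (SVGD-style), the kind of
# mechanism behind PVI's "repulsion force". Illustrative sketch only.
import numpy as np

def svgd_step(x, grad_log_p, eps=0.1, h=0.5):
    diff = x[:, None] - x[None, :]          # pairwise differences
    k = np.exp(-diff ** 2 / (2 * h))        # RBF kernel matrix (symmetric)
    grad_k = -diff / h * k                  # d k(x_j, x_i) / d x_j
    # Attraction toward high density plus repulsion between particles:
    phi = (k @ grad_log_p(x) + grad_k.sum(axis=0)) / len(x)
    return x + eps * phi

x = np.random.default_rng(2).normal(loc=5.0, size=30)  # badly initialized particles
for _ in range(200):
    x = svgd_step(x, lambda x: -x)          # score function of N(0, 1)
print(x.mean(), x.std())                    # particles spread out near N(0, 1)
```

Without the `grad_k` term all particles would collapse to the mode; the repulsion term is what maintains the diversity the entry refers to.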
- Non-Asymptotic Performance Guarantees for Neural Estimation of $\mathsf{f}$-Divergences [22.496696555768846]
Statistical distances (SDs) quantify the dissimilarity between probability distributions.
A modern method for estimating such distances from data relies on parametrizing a variational form by a neural network (NN) and optimizing it.
This paper explores the resulting tradeoff between approximation and estimation errors by means of non-asymptotic error bounds, focusing on three popular choices of SDs.
arXiv Detail & Related papers (2021-03-11T19:47:30Z)
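"Parametrizing a variational form by a neural network" is easiest to see in the Donsker-Varadhan representation of the KL divergence, shown below as a generic example; the paper's three specific choices of SDs are not reproduced here.

```latex
% Donsker--Varadhan variational form of the KL divergence: restricting f
% to a neural-network class and maximizing over its weights yields the
% neural estimator whose error such papers bound non-asymptotically.
\mathsf{D}_{\mathrm{KL}}(P \,\|\, Q) = \sup_{f}\;
\mathbb{E}_{P}[f(X)] - \log \mathbb{E}_{Q}\big[e^{f(X)}\big].
```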
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.