Doubly Robust Estimation with Machine Learning Predictions
- URL: http://arxiv.org/abs/2108.01768v1
- Date: Tue, 3 Aug 2021 22:01:55 GMT
- Title: Doubly Robust Estimation with Machine Learning Predictions
- Authors: Mehdi Rostami, Olli Saarela, Michael Escobar
- Abstract summary: We propose a normalization of AIPW (referred to as nAIPW), which can be helpful in some scenarios.
Our simulations indicate that AIPW suffers extensively if no regularization is utilized.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The estimation of the Average Treatment Effect (ATE) as a causal parameter is carried out in two steps: in the first step, the treatment and outcome are modeled to incorporate the potential confounders, and in the second step, the predictions are inserted into ATE estimators such as the Augmented Inverse Probability Weighting (AIPW) estimator. Due to concerns about nonlinear or unknown relationships between the confounders and the treatment and outcome, there has been interest in applying non-parametric methods such as Machine Learning (ML) algorithms instead. Farrell et al. (2018) proposed using two separate Neural Networks (NNs) with no regularization of the network parameters beyond the implicit regularization of Stochastic Gradient Descent (SGD) in the NNs' optimization. Our simulations indicate that the AIPW estimator suffers extensively if no regularization is utilized. We propose a normalization of AIPW (referred to as nAIPW), which can be helpful in some scenarios. nAIPW provably has the same properties as AIPW, namely double robustness and orthogonality (Chernozhukov et al., 2018). Further, if the first-step algorithms converge fast enough, then under regularity conditions (Chernozhukov et al., 2018), nAIPW is asymptotically normal.
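To make the two-step construction concrete, below is a minimal numpy sketch of the AIPW estimator and of a normalized (Hajek-style) variant in the spirit of nAIPW. The function and variable names (aipw, naipw, e_hat, m1_hat, m0_hat) are ours, and the normalization shown, which rescales the inverse-probability weights to sum to one, is one natural reading of nAIPW; the paper's exact definition should be checked.

```python
import numpy as np

def aipw(y, a, e_hat, m1_hat, m0_hat):
    """Augmented IPW estimate of the ATE from first-step predictions.

    y: outcomes; a: binary treatment; e_hat: predicted propensity scores
    P(A=1|X); m1_hat / m0_hat: predicted outcomes under treatment / control.
    """
    aug = m1_hat - m0_hat
    corr1 = a * (y - m1_hat) / e_hat
    corr0 = (1 - a) * (y - m0_hat) / (1 - e_hat)
    return np.mean(aug + corr1 - corr0)

def naipw(y, a, e_hat, m1_hat, m0_hat):
    """Normalized AIPW: the inverse-probability weights are rescaled to sum
    to one, which tempers observations with extreme propensity predictions."""
    w1 = a / e_hat
    w0 = (1 - a) / (1 - e_hat)
    aug = np.mean(m1_hat - m0_hat)
    corr1 = np.sum(w1 * (y - m1_hat)) / np.sum(w1)
    corr0 = np.sum(w0 * (y - m0_hat)) / np.sum(w0)
    return aug + corr1 - corr0
```

Rescaling the weights bounds the influence of observations with propensity predictions near 0 or 1, which is precisely the regime where an unregularized first step can make plain AIPW unstable.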
Related papers
- Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an adaptive model predictive control algorithm that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm relative to an algorithm initially aware of the system dynamics.
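As a toy illustration of the certainty-equivalence principle only (not of STT-MPC itself, whose polytopic tubes and constraint handling are not shown), here is a hedged numpy sketch on a scalar linear system; all names and constants are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5        # unknown scalar dynamics: x' = a*x + b*u + w
x = 1.0
a_hat, b_hat = 0.0, 1.0          # initial parameter guesses
Z, X_next = [], []               # logged regressors (x, u) and next states

for t in range(200):
    # certainty equivalence: control as if the current estimate were the
    # true system, plus exploration noise to keep the regressors informative
    u = -(a_hat / b_hat) * x + 0.1 * rng.normal()
    x_next = a_true * x + b_true * u + 0.05 * rng.normal()
    Z.append([x, u]); X_next.append(x_next)
    if t >= 2:
        theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X_next), rcond=None)
        if abs(theta[1]) > 1e-2:          # avoid dividing by a tiny gain estimate
            a_hat, b_hat = theta
    x = x_next

print(f"estimated a = {a_hat:.3f}, b = {b_hat:.3f}")   # should approach 0.9 and 0.5
```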
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
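A hedged sketch of the bias-constrained idea on a toy problem of our own construction: a feature-linear estimator is trained on squared error plus a penalty on the per-parameter empirical bias. The model, features, and hyperparameters are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy nonlinear estimation problem: observe y = theta**3 + noise and
# estimate theta with a small feature-linear model under a bias penalty.
K, m = 200, 32                        # K parameter values, m observations each
theta = rng.uniform(-1.0, 1.0, K)
y = theta[:, None] ** 3 + 0.05 * rng.normal(size=(K, m))

def feats(y):                         # features: [y, cbrt(y), 1]
    return np.stack([y, np.cbrt(y), np.ones_like(y)], axis=-1)

Phi = feats(y)                        # shape (K, m, 3)
w = np.zeros(3)
lam, lr = 5.0, 0.05                   # bias-penalty weight and step size (illustrative)

for _ in range(2000):
    err = Phi @ w - theta[:, None]            # (K, m) estimation errors
    bias = err.mean(axis=1)                   # per-theta empirical bias
    # gradient of  mean(err**2) + lam * mean(bias**2)
    g_mse = 2 * np.einsum('km,kmd->d', err, Phi) / err.size
    g_bias = 2 * lam * np.einsum('k,kd->d', bias, Phi.mean(axis=1)) / K
    w -= lr * (g_mse + g_bias)

print("weights:", np.round(w, 3), " max |bias|:", np.abs(bias).max())
```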
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - The Bias-Variance Tradeoff of Doubly Robust Estimator with Targeted
$L_1$ regularized Neural Networks Predictions [0.0]
The Doubly Robust (DR) estimation of the ATE can be carried out in two steps: in the first step, the treatment and outcome are modeled, and in the second step the predictions are inserted into the DR estimator.
The model misspecification in the first step has led researchers to utilize Machine Learning algorithms instead of parametric algorithms.
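A minimal sketch of this two-step recipe using L1-regularized linear first-step models from scikit-learn in place of the paper's targeted L1-regularized neural networks; the data-generating process and hyperparameters are ours:

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

rng = np.random.default_rng(1)
n, p = 2000, 10
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))                              # true propensity (toy DGP)
a = rng.binomial(1, e)
y = 2 * a + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)    # true ATE = 2

# Step 1: L1-regularized first-step models (hyperparameters are illustrative)
ps = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, a)
e_hat = np.clip(ps.predict_proba(X)[:, 1], 0.01, 0.99)
m1 = Lasso(alpha=0.01).fit(X[a == 1], y[a == 1]).predict(X)
m0 = Lasso(alpha=0.01).fit(X[a == 0], y[a == 0]).predict(X)

# Step 2: plug the predictions into the DR (AIPW) estimator
ate = np.mean(m1 - m0 + a * (y - m1) / e_hat - (1 - a) * (y - m0) / (1 - e_hat))
print(f"DR estimate of ATE: {ate:.3f}")                     # should be near 2
```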
arXiv Detail & Related papers (2021-08-02T15:41:27Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
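For concreteness, a minimal numpy sketch of AIS with a geometric path and unadjusted Langevin transitions, i.e., no Metropolis-Hastings correction, echoing the differentiable variant; the toy initial and target densities are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p0(x):            # initial distribution: standard normal (unnormalized)
    return -0.5 * x**2

def log_p1(x):            # unnormalized toy target: Gaussian at 3 with variance 0.25
    return -0.5 * (x - 3.0) ** 2 / 0.25

# geometric path: log p_beta = (1 - beta) * log_p0 + beta * log_p1
betas = np.linspace(0.0, 1.0, 101)
n_particles, step = 1000, 0.05
x = rng.normal(size=n_particles)        # exact samples from p0
logw = np.zeros(n_particles)

for b_prev, b in zip(betas[:-1], betas[1:]):
    # accumulate the AIS incremental weights at the current samples
    logw += (b - b_prev) * (log_p1(x) - log_p0(x))
    # move samples with an unadjusted Langevin step targeting p_b
    grad = (1 - b) * (-x) + b * (-(x - 3.0) / 0.25)
    x = x + step * grad + np.sqrt(2 * step) * rng.normal(size=n_particles)

# log of the normalizing-constant ratio (marginal-likelihood) estimate
log_Z = np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
print(f"log Z estimate: {log_Z:.3f}")
```

For this Gaussian pair the true log normalizing-constant ratio is log 0.5, about -0.69, so the estimate should land nearby.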
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Non-asymptotic estimates for TUSLA algorithm for non-convex learning
with applications to neural networks with ReLU activation function [3.5044892799305956]
We provide a non-asymptotic analysis for the tamed un-adjusted Langevin algorithm (TUSLA) introduced in Lovas et al.
In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances.
We show that the TUSLA algorithm converges rapidly to the optimal solution.
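A sketch of the taming idea on a toy super-linearly growing objective. The taming factor below, 1 + sqrt(lam) * ||g||, is a generic choice for illustration; the exact TUSLA recursion tames differently, so treat this as the general mechanism rather than the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_u(theta):
    # gradient of a toy objective u(theta) = ||theta||^4 / 4 with
    # super-linear growth, where plain unadjusted Langevin can blow up
    return theta * np.dot(theta, theta)

lam, beta = 1e-3, 10.0               # step size and inverse temperature
theta = np.array([2.0, -2.0])

for _ in range(5000):
    g = grad_u(theta)
    # taming: shrink the drift where the gradient explodes, keeping the
    # discretization stable despite the super-linear growth
    tamed = g / (1.0 + np.sqrt(lam) * np.linalg.norm(g))
    theta = theta - lam * tamed + np.sqrt(2 * lam / beta) * rng.normal(size=2)

print("final iterate:", theta)       # concentrates near the minimizer at 0
```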
arXiv Detail & Related papers (2021-07-19T07:13:02Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
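A minimal sketch of gradient clipping on a nonsmooth convex toy objective with heavy-tailed noise; the stepsize rule here is illustrative, not the paper's proposal:

```python
import numpy as np

rng = np.random.default_rng(0)

def clip(g, c):
    """Rescale the (scalar) gradient g so its magnitude never exceeds c."""
    return g if abs(g) <= c else c * np.sign(g)

# Clipped SGD on the nonsmooth convex f(x) = |x - 1| with heavy-tailed
# Student-t noise (df = 2 has infinite variance)
x, c = 5.0, 1.0
for t in range(1, 5001):
    g = np.sign(x - 1.0) + rng.standard_t(df=2)   # subgradient plus noise
    x -= (0.5 / np.sqrt(t)) * clip(g, c)          # illustrative O(1/sqrt(t)) stepsize
print(f"final x: {x:.3f} (minimizer at 1.0)")
```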
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Sparse Representations of Positive Functions via First and Second-Order
Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk problems when the range of the estimator is required to be nonnegative.
We develop first- and second-order variants of stochastic approximation mirror descent employing pseudo-gradients.
Experiments demonstrate favorable performance on inhomogeneous Poisson process intensity estimation in practice.
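To illustrate only the mechanism by which mirror descent keeps estimates nonnegative (an entropic mirror map, i.e., exponentiated gradient), here is a toy numpy sketch; the paper's pseudo-gradient variants and sparsification are not shown:

```python
import numpy as np

# Toy problem: fit a nonnegative intensity vector to a target by minimizing
# a quadratic loss; the multiplicative (exponentiated-gradient) update keeps
# every iterate strictly positive by construction.
target = np.array([0.2, 1.5, 0.0, 3.0, 0.7])
x = np.ones(5)                       # positive initialization
eta = 0.1

for _ in range(500):
    grad = x - target                # gradient of 0.5 * ||x - target||^2
    x = x * np.exp(-eta * grad)      # multiplicative update => x stays > 0

print(np.round(x, 3))                # approaches target componentwise
```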
arXiv Detail & Related papers (2020-11-13T21:54:28Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
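A hedged sketch of the underlying CNML idea with exact refits on a toy logistic-regression task (ACNML's contribution is to amortize this per-query refitting, which is not shown); all data and names are ours:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy binary task of our construction
X = rng.normal(size=(100, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=100) > 0).astype(int)

def cnml_probs(X_train, y_train, x_query):
    """Conditional NML: for each candidate label, refit with the query point
    assigned that label, take the refit model's probability of that label,
    then normalize across labels."""
    scores = []
    for label in (0, 1):
        Xa = np.vstack([X_train, x_query[None]])
        ya = np.append(y_train, label)
        model = LogisticRegression(C=1.0).fit(Xa, ya)
        scores.append(model.predict_proba(x_query[None])[0, label])
    scores = np.array(scores)
    return scores / scores.sum()

# Far from the training data, either label can be fit well, so the
# normalized probabilities are pulled toward uniform, signaling uncertainty.
x_ood = np.array([8.0, -8.0])
print("CNML probabilities:", np.round(cnml_probs(X, y, x_ood), 3))
```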
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Self-Concordant Analysis of Generalized Linear Bandits with Forgetting [2.282313031205821]
We focus on self-concordant GLB (which include logistic regression) with forgetting achieved by the use of a sliding window or exponential weights.
We propose a novel approach to address the Generalized Linear Bandits (GLB) problem in non-stationary environments.
arXiv Detail & Related papers (2020-11-02T08:36:39Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
We show that the proposed algorithm converges to the correct distribution with a controllable bias under mild conditions.
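A schematic sketch in the spirit of preconditioned stochastic-gradient Langevin dynamics, using an RMSprop-style diagonal preconditioner on a toy Gaussian target; it omits the curvature-correction drift term and is not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy posterior: independent Gaussians with very different scales per
# coordinate, where diagonal preconditioning helps mixing.
mu = np.array([0.0, 5.0])
scales = np.array([0.1, 3.0])

def grad_log_post(theta):
    return -(theta - mu) / scales**2

theta, V = np.zeros(2), np.zeros(2)
lam, alpha, eps = 1e-2, 0.99, 1e-5
samples = []

for t in range(20000):
    g = grad_log_post(theta) + 0.1 * rng.normal(size=2)   # noisy (stochastic) gradient
    V = alpha * V + (1 - alpha) * g * g                   # RMSprop second-moment estimate
    G = 1.0 / (np.sqrt(V) + eps)                          # diagonal preconditioner
    theta = theta + 0.5 * lam * G * g + np.sqrt(lam * G) * rng.normal(size=2)
    if t > 5000:                                          # discard burn-in
        samples.append(theta.copy())

print("posterior mean estimate:", np.array(samples).mean(axis=0))   # approx mu
```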
arXiv Detail & Related papers (2020-06-29T20:57:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.