Causal Inference With Selectively Deconfounded Data
- URL: http://arxiv.org/abs/2002.11096v4
- Date: Sun, 7 Mar 2021 01:33:15 GMT
- Title: Causal Inference With Selectively Deconfounded Data
- Authors: Kyra Gan, Andrew A. Li, Zachary C. Lipton, Sridhar Tayur
- Abstract summary: We consider the benefit of incorporating a large confounded observational dataset (confounder unobserved) alongside a small deconfounded observational dataset (confounder revealed) when estimating the Average Treatment Effect (ATE).
Our theoretical results suggest that the inclusion of confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level.
- Score: 22.624714904663424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given only data generated by a standard confounding graph with unobserved
confounder, the Average Treatment Effect (ATE) is not identifiable. To estimate
the ATE, a practitioner must then either (a) collect deconfounded data; (b) run
a clinical trial; or (c) elucidate further properties of the causal graph that
might render the ATE identifiable. In this paper, we consider the benefit of
incorporating a large confounded observational dataset (confounder unobserved)
alongside a small deconfounded observational dataset (confounder revealed) when
estimating the ATE. Our theoretical results suggest that the inclusion of
confounded data can significantly reduce the quantity of deconfounded data
required to estimate the ATE to within a desired accuracy level. Moreover, in
some cases -- say, genetics -- we could imagine retrospectively selecting
samples to deconfound. We demonstrate that by actively selecting these samples
based upon the (already observed) treatment and outcome, we can reduce sample
complexity further. Our theoretical and empirical results establish that the
worst-case relative performance of our approach (vs. a natural benchmark) is
bounded while our best-case gains are unbounded. Finally, we demonstrate the
benefits of selective deconfounding using a large real-world dataset related to
genetic mutation in cancer.
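The idea of pairing a large confounded sample with a small deconfounded one can be pictured with a minimal sketch, assuming binary treatment, outcome, and confounder. The confounded data estimate P(y, t), the deconfounded data estimate P(z | y, t), and their product gives a joint distribution to which the standard backdoor adjustment is applied. The function names, the simulated data-generating process, and the add-one smoothing are illustrative assumptions, not the authors' estimator or code.

```python
import numpy as np

def joint_from_parts(p_yt, q_z_given_yt):
    """P(y, t, z) = P(y, t) * P(z | y, t); inputs indexed [y, t] and [y, t, z]."""
    return p_yt[:, :, None] * q_z_given_yt

def ate_from_joint(p_ytz):
    """Backdoor adjustment over a binary confounder z."""
    p_z = p_ytz.sum(axis=(0, 1))        # P(z)
    p_tz = p_ytz.sum(axis=0)            # P(t, z), indexed [t, z]
    p_y1_given_tz = p_ytz[1] / p_tz     # P(y=1 | t, z), indexed [t, z]
    return float(np.sum(p_z * (p_y1_given_tz[1] - p_y1_given_tz[0])))

def estimate_ate(yt_conf, ytz_deconf):
    """yt_conf: int array (n, 2) with columns y, t (confounder unobserved).
    ytz_deconf: int array (m, 3) with columns y, t, z (confounder revealed)."""
    n_yt = np.zeros((2, 2))
    np.add.at(n_yt, (yt_conf[:, 0], yt_conf[:, 1]), 1)
    p_yt = n_yt / n_yt.sum()
    # Add-one smoothing (an assumption of this sketch) avoids empty cells.
    n_ytz = np.ones((2, 2, 2))
    np.add.at(n_ytz, (ytz_deconf[:, 0], ytz_deconf[:, 1], ytz_deconf[:, 2]), 1)
    q = n_ytz / n_ytz.sum(axis=2, keepdims=True)
    return ate_from_joint(joint_from_parts(p_yt, q))

def simulate(n, rng):
    """Hypothetical data-generating process with a true ATE of 0.3."""
    z = rng.integers(0, 2, size=n)
    t = (rng.random(n) < 0.2 + 0.6 * z).astype(int)
    y = (rng.random(n) < 0.1 + 0.3 * t + 0.4 * z).astype(int)
    return np.column_stack([y, t, z])

rng = np.random.default_rng(0)
confounded = simulate(100_000, rng)[:, :2]   # z is dropped: confounder unobserved
deconfounded = simulate(500, rng)            # small sample with z revealed
naive = confounded[confounded[:, 1] == 1, 0].mean() - confounded[confounded[:, 1] == 0, 0].mean()
print("naive difference in means:", naive)
print("combined estimate:", estimate_ate(confounded, deconfounded))
```

Because the deconfounded sample is used only to estimate P(z | y, t), the sketch does not require those samples to be representative of P(y, t). That is what leaves room for choosing which samples to deconfound based on their observed treatment and outcome.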
Related papers
- Progressive Generalization Risk Reduction for Data-Efficient Causal Effect Estimation [30.49865329385806]
Causal effect estimation (CEE) provides a crucial tool for predicting the unobserved counterfactual outcome for an entity.
In this paper, we study a more realistic CEE setting where the labelled data samples are scarce at the beginning.
We propose the Model Agnostic Causal Active Learning (MACAL) algorithm for batch-wise label acquisition.
arXiv Detail & Related papers (2024-11-18T03:17:40Z)
- Combining Incomplete Observational and Randomized Data for Heterogeneous Treatment Effects [10.9134216137537]
Existing methods for integrating observational data with randomized data require complete observational data.
We propose a resilient approach to Combine Incomplete Observational data and randomized data for HTE estimation.
arXiv Detail & Related papers (2024-10-28T06:19:14Z)
- Efficient adjustment for complex covariates: Gaining efficiency with DOPE [56.537164957672715]
We propose a framework that accommodates adjustment for any subset of information expressed by the covariates.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the average treatment effect (ATE).
Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
arXiv Detail & Related papers (2024-02-20T13:02:51Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
- Falsification before Extrapolation in Causal Effect Estimation [6.715453431174765]
Causal effects in populations are often estimated using observational datasets.
We propose a meta-algorithm that attempts to reject observational estimates that are biased.
arXiv Detail & Related papers (2022-09-27T21:47:23Z)
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
- Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
arXiv Detail & Related papers (2021-03-30T21:20:51Z)
- Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding [38.09565581056218]
We study the problem of learning conditional average treatment effects (CATE) from high-dimensional, observational data with unobserved confounders.
We present a new parametric interval estimator suited for high-dimensional data.
arXiv Detail & Related papers (2021-03-08T15:58:06Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)