Multi-Source Causal Inference Using Control Variates
- URL: http://arxiv.org/abs/2103.16689v1
- Date: Tue, 30 Mar 2021 21:20:51 GMT
- Title: Multi-Source Causal Inference Using Control Variates
- Authors: Wenshuo Guo, Serena Wang, Peng Ding, Yixin Wang, Michael I. Jordan
- Abstract summary: We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
- Score: 81.57072928775509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While many areas of machine learning have benefited from the increasing
availability of large and varied datasets, the benefit to causal inference has
been limited given the strong assumptions needed to ensure identifiability of
causal effects; these are often not satisfied in real-world datasets. For
example, many large observational datasets (e.g., case-control studies in
epidemiology, click-through data in recommender systems) suffer from selection
bias on the outcome, which makes the average treatment effect (ATE)
unidentifiable. We propose a general algorithm to estimate causal effects from
\emph{multiple} data sources, where the ATE may be identifiable only in some
datasets but not others. The key idea is to construct control variates using
the datasets in which the ATE is not identifiable. We show theoretically that
this reduces the variance of the ATE estimate. We apply this framework to
inference from observational data under an outcome selection bias, assuming
access to an auxiliary small dataset from which we can obtain a consistent
estimate of the ATE. We construct a control variate by taking the difference of
the odds ratio estimates from the two datasets. Across simulations and two case
studies with real data, we show that this control variate can significantly
reduce the variance of the ATE estimate.
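The abstract's core idea can be illustrated with a minimal numerical sketch: given replicates of an unbiased ATE estimate and a control variate whose mean converges to zero (in the paper, the difference of odds-ratio estimates from the two datasets), subtracting the variance-minimizing multiple of the control variate reduces the variance of the combined estimate. All names here are illustrative, and the simulated correlated noise merely stands in for the paper's odds-ratio construction.

```python
import numpy as np

def cv_adjust(tau_hats, zs):
    """Control-variate adjustment: tau_hats are replicates of an unbiased
    ATE estimate, zs are replicates of a control variate with mean zero.
    Uses the variance-minimizing coefficient c = Cov(tau, z) / Var(z)."""
    tau_hats = np.asarray(tau_hats, dtype=float)
    zs = np.asarray(zs, dtype=float)
    c = np.cov(tau_hats, zs)[0, 1] / np.var(zs, ddof=1)
    return tau_hats.mean() - c * zs.mean(), c

# Toy simulation: shared noise makes tau_hat and z strongly correlated,
# mimicking two estimates computed from overlapping data.
rng = np.random.default_rng(0)
true_ate = 2.0
shared = rng.normal(size=1000)                 # noise common to both estimates
tau_hats = true_ate + shared + 0.1 * rng.normal(size=1000)
zs = shared + 0.1 * rng.normal(size=1000)      # E[z] = 0 by construction

tau_cv, c = cv_adjust(tau_hats, zs)
print(tau_cv)  # close to the true ATE of 2.0
```

Because the shared noise dominates both series, the fitted coefficient is near 1 and the adjustment cancels most of the common variation, which is exactly the variance-reduction mechanism the paper analyzes.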
Related papers
- Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
Quantification is the supervised learning task of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
arXiv Detail & Related papers (2023-10-06T20:11:27Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach to auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Generalizing experimental findings: identification beyond adjustments [2.5889737226898437]
We aim to generalize the results of a randomized controlled trial (RCT) to a target population with the help of some observational data.
We consider examples where the experimental findings cannot be generalized by an adjustment.
We show that the generalization may still be possible by other identification strategies that can be derived by applying do-calculus.
arXiv Detail & Related papers (2022-06-14T09:00:17Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning over structured spaces with classical results from causal inference yields an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
- Meta Learning for Causal Direction [29.00522306460408]
We introduce a novel generative model that allows distinguishing cause and effect in the small data setting.
We demonstrate our method on various synthetic as well as real-world data and show that it is able to maintain high accuracy in detecting directions across varying dataset sizes.
arXiv Detail & Related papers (2020-07-06T15:12:05Z)
- Causal Inference With Selectively Deconfounded Data [22.624714904663424]
We consider the benefit of incorporating a large confounded observational dataset (confounder unobserved) alongside a small deconfounded observational dataset (confounder revealed) when estimating the Average Treatment Effect (ATE).
Our theoretical results suggest that the inclusion of confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level.
arXiv Detail & Related papers (2020-02-25T18:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.