What can the millions of random treatments in nonexperimental data reveal about causes?
- URL: http://arxiv.org/abs/2105.01152v1
- Date: Mon, 3 May 2021 20:13:34 GMT
- Title: What can the millions of random treatments in nonexperimental data reveal about causes?
- Authors: Andre F. Ribeiro, Frank Neffke and Ricardo Hausmann
- Abstract summary: The article introduces one such model and a Bayesian approach to combine the $O(n^2)$ pairwise observations typically available in nonexperimental data.
We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and an order-of-magnitude larger supersample.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a new method to estimate causal effects from nonexperimental data.
Each pair of sample units is first associated with a stochastic 'treatment' -
differences in factors between units - and an effect - a resultant outcome
difference. It is then proposed that all such pairs can be combined to provide
more accurate estimates of causal effects in observational data, provided a
statistical model connecting combinatorial properties of treatments to the
accuracy and unbiasedness of their effects. The article introduces one such
model and a Bayesian approach to combine the $O(n^2)$ pairwise observations
typically available in nonexperimental data. This also leads to an
interpretation of nonexperimental datasets as incomplete, or noisy, versions of
ideal factorial experimental designs.
This approach to causal effect estimation has several advantages: (1) it
expands the number of observations, converting thousands of individuals into
millions of observational treatments; (2) starting with treatments closest to
the experimental ideal, it identifies noncausal variables that can be ignored
in the future, making estimation easier in each subsequent iteration while
departing minimally from experiment-like conditions; (3) it recovers individual
causal effects in heterogeneous populations. We evaluate the method in
simulations and the National Supported Work (NSW) program, an intensively
studied program whose effects are known from randomized field experiments. We
demonstrate that the proposed approach recovers causal effects in common NSW
samples, as well as in arbitrary subpopulations and an order-of-magnitude
larger supersample with the entire national program data, outperforming
statistical, econometric, and machine learning estimators in all cases...
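The pairwise construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, assuming numeric factors in an array `X` (one row per unit) and outcomes in `y`. The function names are hypothetical, and `single_factor_effects` is a deliberately naive stand-in for the idea of starting from the treatments closest to the experimental ideal; it is not the authors' Bayesian model.

```python
import numpy as np


def pairwise_treatments(X, y):
    """Turn n units into n(n-1)/2 (treatment, effect) observations.

    Treatment of pair (i, j) = X[j] - X[i]  (differences in factors)
    Effect of pair (i, j)    = y[j] - y[i]  (resultant outcome difference)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)  # all unordered pairs with i < j
    return X[j] - X[i], y[j] - y[i]


def single_factor_effects(X, y):
    """Naive per-factor effect from pairs that differ in exactly one factor.

    Such pairs are the most experiment-like: every other observed factor
    is held fixed. Hypothetical simplification, not the paper's model.
    """
    dX, dy = pairwise_treatments(X, y)
    changed = dX != 0                  # which factors differ within each pair
    single = changed.sum(axis=1) == 1  # pairs with exactly one differing factor
    estimates = {}
    for f in range(dX.shape[1]):
        mask = single & changed[:, f]
        if mask.any():
            # Outcome change per unit change in factor f, averaged over pairs.
            estimates[f] = float(np.mean(dy[mask] / dX[mask, f]))
    return estimates
```

With $n = 1{,}000$ units, `pairwise_treatments` returns $1{,}000 \times 999 / 2 = 499{,}500$ pairs, which is the sense in which thousands of individuals become millions of treatments as $n$ grows. For four units covering a full 2×2 design, `X = [[0, 0], [1, 0], [0, 1], [1, 1]]` with `y = [1, 3, 4, 6]` (that is, $y = 1 + 2x_0 + 3x_1$), `single_factor_effects(X, y)` returns `{0: 2.0, 1: 3.0}`. Real data would be an incomplete, noisy version of such a design, which is where a statistical model for combining pairs becomes necessary.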
Related papers
- Multi-CATE: Multi-Accurate Conditional Average Treatment Effect Estimation Robust to Unknown Covariate Shifts [12.289361708127876]
We use methodology for learning multi-accurate predictors to post-process CATE T-learners.
We show how this approach can combine (large) confounded observational and (smaller) randomized datasets.
arXiv Detail & Related papers (2024-05-28T14:12:25Z)
- Identification of Single-Treatment Effects in Factorial Experiments [0.0]
I show that when multiple interventions are randomized in experiments, the effect any single intervention would have outside the experimental setting is not identified absent heroic assumptions.
Observational studies and factorial experiments provide information about potential-outcome distributions with zero and multiple interventions.
I show that researchers who rely on this type of design have to justify either linearity of functional forms or specify with Directed Acyclic Graphs how variables are related in the real world.
arXiv Detail & Related papers (2024-05-16T04:01:53Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
- Combining Experimental and Observational Data for Identification of Long-Term Causal Effects [13.32091725929965]
We consider the task of estimating the causal effect of a treatment variable on a long-term outcome variable using data from an observational domain and an experimental domain.
The observational data is assumed to be confounded; hence, without further assumptions, this dataset alone cannot be used for causal inference.
arXiv Detail & Related papers (2022-01-26T04:21:14Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)