Combining Experimental and Observational Data for Identification and Estimation of Long-Term Causal Effects
- URL: http://arxiv.org/abs/2201.10743v4
- Date: Mon, 29 Sep 2025 00:04:19 GMT
- Title: Combining Experimental and Observational Data for Identification and Estimation of Long-Term Causal Effects
- Authors: AmirEmad Ghassami, Chang Liu, Alan Yang, David Richardson, Ilya Shpitser, Eric Tchetgen Tchetgen
- Abstract summary: We study identifying and estimating the causal effect of a treatment variable on a long-term outcome using data from an observational and an experimental domain. We propose three approaches for data fusion for the purpose of identifying and estimating the causal effect.
- Score: 12.200097942625376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study identifying and estimating the causal effect of a treatment variable on a long-term outcome using data from an observational and an experimental domain. The observational data are subject to unobserved confounding. Furthermore, subjects in the experiment are only followed for a short period; thus, long-term effects are unobserved, though short-term effects are available. Consequently, neither data source alone suffices for causal inference on the long-term outcome, necessitating a principled fusion of the two. We propose three approaches for data fusion for the purpose of identifying and estimating the causal effect. The first assumes equal confounding bias for short-term and long-term outcomes. The second weakens this assumption by leveraging an observed confounder for which the short-term and long-term potential outcomes share the same partial additive association with this confounder. The third approach employs proxy variables of the latent confounder of the treatment-outcome relationship, extending the proximal causal inference framework to the data fusion setting. For each approach, we develop influence function-based estimators and analyze their robustness properties. We illustrate our methods by estimating the effect of class size on 8th-grade SAT scores using data from the Project STAR experiment combined with observational data from the Early Childhood Longitudinal Study.
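The first approach in the abstract (equal confounding bias for short-term and long-term outcomes) can be illustrated with a small simulation. The sketch below is not the paper's influence function-based estimator. It assumes a toy linear model in which a latent confounder `u` shifts the short-term outcome `s` and the long-term outcome `y` by the same amount, and every name, effect size, and sample size in it is hypothetical. Under that assumption, the confounding bias measured on the short-term outcome (observational contrast minus experimental contrast) can be subtracted from the observational long-term contrast.

```python
import numpy as np


def simulate(n=200_000, tau_s=1.0, tau_y=2.0, seed=0):
    """Toy data: one confounded observational sample, one randomized sample."""
    rng = np.random.default_rng(seed)
    # Observational domain: latent u drives treatment uptake and both outcomes.
    u = rng.normal(size=n)
    a_obs = (u + rng.normal(size=n) > 0).astype(int)
    s_obs = tau_s * a_obs + u + rng.normal(size=n)
    y_obs = tau_y * a_obs + u + rng.normal(size=n)
    # Experimental domain: treatment is randomized and only s is recorded.
    u_rct = rng.normal(size=n)
    a_rct = rng.integers(0, 2, size=n)
    s_rct = tau_s * a_rct + u_rct + rng.normal(size=n)
    return (a_obs, s_obs, y_obs), (a_rct, s_rct)


def diff_in_means(a, v):
    """Treated-minus-control difference in sample means."""
    return v[a == 1].mean() - v[a == 0].mean()


def fused_estimate(obs, rct):
    """Observational long-term contrast minus the short-term confounding bias."""
    a_obs, s_obs, y_obs = obs
    a_rct, s_rct = rct
    bias_s = diff_in_means(a_obs, s_obs) - diff_in_means(a_rct, s_rct)
    return diff_in_means(a_obs, y_obs) - bias_s


true_effect = 2.0
obs, rct = simulate(tau_y=true_effect)
naive = diff_in_means(obs[0], obs[2])  # confounded long-term contrast
fused = fused_estimate(obs, rct)
print(f"naive: {naive:.3f}  fused: {fused:.3f}  truth: {true_effect:.3f}")
```

In this toy model the naive observational contrast overstates the long-term effect because treated units have systematically higher `u`, while the fused estimate removes that bias using the short-term outcome. The correction is only as good as the equal-bias assumption, which is why the abstract goes on to describe two weaker alternatives.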
Related papers
- Do-PFN: In-Context Learning for Causal Effect Estimation [75.62771416172109]
We show that Prior-data fitted networks (PFNs) can be pre-trained on synthetic data to predict outcomes. Our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.
arXiv Detail & Related papers (2025-06-06T12:43:57Z) - Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions: Is the causal effect positive or negative? And how severe must assumption violations be to overturn this conclusion? We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z) - Long-Term Individual Causal Effect Estimation via Identifiable Latent Representation Learning [12.38859245341133]
Estimating long-term causal effects by combining long-term observational and short-term experimental data is a crucial but challenging problem in many real-world scenarios. In existing methods, several ideal assumptions are proposed to address the latent confounder problem raised by the observational data. In this paper, we tackle the problem of estimating the long-term individual causal effects without the aforementioned assumptions.
arXiv Detail & Related papers (2025-05-08T12:42:49Z) - Long-term Causal Inference via Modeling Sequential Latent Confounding [79.18609016557]
Long-term causal inference is an important but challenging problem across various scientific domains. We propose an approach based on the Conditional Additive Equi-Confounding Bias (CAECB) assumption. Our proposed assumption states a functional relationship between sequential confounding biases across temporal short-term outcomes.
arXiv Detail & Related papers (2025-02-26T09:56:56Z) - Nonparametric Heterogeneous Long-term Causal Effect Estimation via Data Combination [37.491679058742925]
Long-term causal inference has drawn increasing attention in many scientific domains. It is still understudied how to robustly and effectively estimate heterogeneous long-term causal effects. We propose several two-stage style non-parametric estimators for heterogeneous long-term causal effect estimation.
arXiv Detail & Related papers (2025-02-26T09:17:04Z) - Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights [54.65531750162626]
Long-term treatment effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions, such as no unobserved confounders or binary treatment, to estimate long-term average treatment effects. We introduce an optimal transport weighting framework to align the long-term observational data to an auxiliary short-term experimental data.
arXiv Detail & Related papers (2024-06-27T14:13:46Z) - Targeted Machine Learning for Average Causal Effect Estimation Using the
Front-Door Functional [3.0232957374216953]
Evaluating the average causal effect (ACE) of a treatment on an outcome often involves overcoming the challenges posed by confounding factors in observational studies.
Here, we introduce novel estimation strategies for the front-door criterion based on the targeted minimum loss-based estimation theory.
We demonstrate the applicability of these estimators to analyze the effect of early stage academic performance on future yearly income.
arXiv Detail & Related papers (2023-12-15T22:04:53Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased
and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z) - Nonparametric Identifiability of Causal Representations from Unknown
Interventions [63.1354734978244]
We study causal representation learning, the task of inferring latent causal variables and their causal relations from mixtures of the variables.
Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data.
arXiv Detail & Related papers (2023-06-01T10:51:58Z) - Estimating long-term causal effects from short-term experiments and
long-term observational data with unobserved confounding [5.854757988966379]
We study the identification and estimation of long-term treatment effects when both experimental and observational data are available.
Our long-term causal effect estimator is obtained by combining regression residuals with short-term experimental outcomes.
arXiv Detail & Related papers (2023-02-21T12:22:47Z) - Neighborhood Adaptive Estimators for Causal Inference under Network
Interference [152.4519491244279]
We consider the violation of the classical no-interference assumption, meaning that the treatment of one individual might affect the outcomes of another.
To make interference tractable, we consider a known network that describes how interference may travel.
We study estimators for the average direct treatment effect on the treated in such a setting.
arXiv Detail & Related papers (2022-12-07T14:53:47Z) - Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation [10.75801980090826]
We present three novel Bayesian methods to estimate the expectation of the ultimate treatment effect.
These methods differ on the source of uncertainty considered and allow for combining two sources of data.
We generalize these ideas to the off-policy evaluation framework.
arXiv Detail & Related papers (2022-11-02T23:39:36Z) - Generalization bounds and algorithms for estimating conditional average
treatment effect of dosage [13.867315751451494]
We investigate the task of estimating the conditional average causal effect of treatment-dosage pairs from a combination of observational data and assumptions on the causal relationships in the underlying system.
This has been a longstanding challenge for fields of study such as epidemiology or economics that require a treatment-dosage pair to make decisions.
We show empirically new state-of-the-art performance results across several benchmark datasets for this problem.
arXiv Detail & Related papers (2022-05-29T15:26:59Z) - Long-term Causal Inference Under Persistent Confounding via Data Combination [38.026740610259225]
We study the identification and estimation of long-term treatment effects when both experimental and observational data are available.
Since the long-term outcome is observed only after a long delay, it is not measured in the experimental data, but only recorded in the observational data.
arXiv Detail & Related papers (2022-02-15T07:44:20Z) - SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event
Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z) - What can the millions of random treatments in nonexperimental data
reveal about causes? [0.0]
The article introduces one such model and a Bayesian approach to combine the $O(n^2)$ pairwise observations typically available in nonexperimental data.
We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and an order-of-magnitude larger supersample.
arXiv Detail & Related papers (2021-05-03T20:13:34Z) - Efficient Causal Inference from Combined Observational and
Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Split-Treatment Analysis to Rank Heterogeneous Causal Effects for
Prospective Interventions [15.443178111068418]
We propose a split-treatment analysis that ranks the individuals most likely to be positively affected by a prospective intervention.
We show that the ranking of heterogeneous causal effect based on the proxy treatment is the same as the ranking based on the target treatment's effect.
arXiv Detail & Related papers (2020-11-11T16:17:29Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.