Data Fusion for Partial Identification of Causal Effects
- URL: http://arxiv.org/abs/2505.24296v1
- Date: Fri, 30 May 2025 07:13:01 GMT
- Title: Data Fusion for Partial Identification of Causal Effects
- Authors: Quinn Lanners, Cynthia Rudin, Alexander Volfovsky, Harsh Parikh
- Abstract summary: We propose a novel partial identification framework that enables researchers to answer key questions: Is the causal effect positive or negative? How severe must assumption violations be to overturn this conclusion? We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
- Score: 62.56890808004615
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data fusion techniques integrate information from heterogeneous data sources to improve learning, generalization, and decision making across data sciences. In causal inference, these methods leverage rich observational data to improve causal effect estimation, while maintaining the trustworthiness of randomized controlled trials. Existing approaches often relax the strong no unobserved confounding assumption by instead assuming exchangeability of counterfactual outcomes across data sources. However, when both assumptions simultaneously fail - a common scenario in practice - current methods cannot identify or estimate causal effects. We address this limitation by proposing a novel partial identification framework that enables researchers to answer key questions such as: Is the causal effect positive or negative? and How severe must assumption violations be to overturn this conclusion? Our approach introduces interpretable sensitivity parameters that quantify assumption violations and derives corresponding causal effect bounds. We develop doubly robust estimators for these bounds and operationalize breakdown frontier analysis to understand how causal conclusions change as assumption violations increase. We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance. Our analysis reveals that the Project STAR results are robust to simultaneous violations of key assumptions, both on average and across various subgroups of interest. This strengthens confidence in the study's conclusions despite potential unmeasured biases in the data.
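The core idea in the abstract can be illustrated with a small numerical sketch. The example below is a minimal, hypothetical one-parameter version of partial identification: it assumes the available effect estimate may be biased by at most a sensitivity parameter `delta`, reports the resulting bounds, and computes the breakdown value of `delta` at which the sign of the effect is no longer identified. The additive bound form, the function names, and the numbers are illustrative assumptions, not the paper's doubly robust estimator or its exact sensitivity parameterization.

```python
# Minimal, hypothetical sketch of partial identification under a single
# sensitivity parameter; NOT the paper's doubly robust estimator or its
# exact sensitivity parameterization.

def ate_bounds(ate_estimate: float, delta: float) -> tuple[float, float]:
    """Worst-case bounds on the true average treatment effect (ATE) if the
    available estimate may be biased by at most `delta` in either direction."""
    return ate_estimate - delta, ate_estimate + delta

def breakdown_delta(ate_estimate: float) -> float:
    """Smallest violation `delta` at which the sign of the effect is no
    longer identified, i.e. where the bounds first cross zero."""
    return abs(ate_estimate)

if __name__ == "__main__":
    est = 0.15  # hypothetical point estimate of the ATE
    for delta in (0.05, 0.10, 0.20):
        lo, hi = ate_bounds(est, delta)
        sign_identified = lo > 0 or hi < 0
        print(f"delta={delta:.2f}: bounds=({lo:+.2f}, {hi:+.2f}), "
              f"sign identified: {sign_identified}")
    print(f"breakdown value of delta: {breakdown_delta(est):.2f}")
```

Because the paper introduces multiple sensitivity parameters quantifying simultaneous assumption violations, the single breakdown value above generalizes there to a breakdown frontier over joint violations; the one-dimensional version here is purely for illustration.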
Related papers
- Do-PFN: In-Context Learning for Causal Effect Estimation [75.62771416172109]
We show that Prior-data fitted networks (PFNs) can be pre-trained on synthetic data to predict outcomes. Our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.
arXiv Detail & Related papers (2025-06-06T12:43:57Z)
- Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes.
Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators.
The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z)
- Unsupervised Pairwise Causal Discovery on Heterogeneous Data using Mutual Information Measures [49.1574468325115]
Causal discovery tackles the challenge of identifying cause-effect relationships by analyzing the statistical properties of the constituent variables.
We question the current (possibly misleading) baseline results on the basis that they were obtained through supervised learning.
In consequence, we approach this problem in an unsupervised way, using robust Mutual Information measures.
arXiv Detail & Related papers (2024-08-01T09:11:08Z)
- Automating the Selection of Proxy Variables of Unmeasured Confounders [16.773841751009748]
We extend the existing proxy variable estimator to accommodate scenarios where multiple unmeasured confounders exist between the treatments and the outcome.
We propose two data-driven methods for the selection of proxy variables and for the unbiased estimation of causal effects.
arXiv Detail & Related papers (2024-05-25T08:53:49Z)
- Assumption violations in causal discovery and the robustness of score matching [38.60630271550033]
This paper extensively benchmarks the empirical performance of recent causal discovery methods on observational i.i.d. data.
We show that score matching-based methods demonstrate surprisingly robust performance in terms of the false positive and false negative rates of the inferred graph.
We hope this paper will set a new standard for the evaluation of causal discovery methods.
arXiv Detail & Related papers (2023-10-20T09:56:07Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Evaluation of Induced Expert Knowledge in Causal Structure Learning by NOTEARS [1.5469452301122175]
We study the impact of expert knowledge on causal relations in the form of additional constraints used in the formulation of the nonparametric NOTEARS model.
We found that (i) knowledge that corrects the mistakes of the NOTEARS model can lead to statistically significant improvements, (ii) constraints on active edges have a larger positive impact on causal discovery than constraints on inactive edges, and, surprisingly, (iii) on average the induced knowledge does not correct more incorrect active and/or inactive edges than expected.
arXiv Detail & Related papers (2023-01-04T20:39:39Z)
- Valid Inference After Causal Discovery [73.87055989355737]
We develop tools for valid post-causal-discovery inference.
We show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates.
arXiv Detail & Related papers (2022-08-11T17:40:45Z)
- BayesIMP: Uncertainty Quantification for Causal Data Fusion [52.184885680729224]
We study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable.
We introduce a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-06-07T10:14:18Z)
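The BayesIMP entry above relies on kernel mean embeddings, which represent a distribution by the average of its feature maps in a reproducing kernel Hilbert space. As a generic illustration (not BayesIMP's implementation; the RBF kernel, lengthscale, and sample shapes are assumptions chosen for the example), the sketch below computes the squared maximum mean discrepancy between two samples, i.e. the RKHS distance between their empirical mean embeddings.

```python
# Generic sketch of empirical kernel mean embeddings with an RBF kernel;
# not the BayesIMP implementation, just an illustration of the concept.
import numpy as np

def rbf_kernel(x: np.ndarray, y: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    """RBF kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 * lengthscale^2))."""
    diff = x[:, None, :] - y[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def squared_mmd(x: np.ndarray, y: np.ndarray, lengthscale: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples,
    i.e. the squared RKHS distance between their empirical mean embeddings."""
    kxx = rbf_kernel(x, x, lengthscale).mean()
    kyy = rbf_kernel(y, y, lengthscale).mean()
    kxy = rbf_kernel(x, y, lengthscale).mean()
    return kxx + kyy - 2.0 * kxy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical samples, e.g. outcomes observed under two interventions.
    sample_a = rng.normal(0.0, 1.0, size=(200, 1))
    sample_b = rng.normal(0.5, 1.0, size=(200, 1))
    print(f"squared MMD: {squared_mmd(sample_a, sample_b):.4f}")
```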