The Hardness of Validating Observational Studies with Experimental Data
- URL: http://arxiv.org/abs/2503.14795v1
- Date: Wed, 19 Mar 2025 00:06:23 GMT
- Title: The Hardness of Validating Observational Studies with Experimental Data
- Authors: Jake Fawkes, Michael O'Riordan, Athanasios Vlontzos, Oriol Corcoll, Ciarán Mark Gilligan-Lee
- Abstract summary: We show that it is possible to use experimental data to \emph{falsify} causal effect estimates from observational data. Our theorem proves that while experimental data can be used to detect bias in observational studies, without additional assumptions on the smoothness of the correction function, it cannot be used to remove it.
- Score: 2.9593087583214173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Observational data is often readily available in large quantities, but can lead to biased causal effect estimates due to the presence of unobserved confounding. Recent works attempt to remove this bias by supplementing observational data with experimental data, which, when available, is typically on a smaller scale due to the time and cost involved in running a randomised controlled trial. In this work, we prove a theorem that places fundamental limits on this ``best of both worlds'' approach. Using the framework of impossible inference, we show that although it is possible to use experimental data to \emph{falsify} causal effect estimates from observational data, in general it is not possible to \emph{validate} such estimates. Our theorem proves that while experimental data can be used to detect bias in observational studies, without additional assumptions on the smoothness of the correction function, it cannot be used to remove it. We provide a practical example of such an assumption, developing a novel Gaussian Process based approach to construct intervals which contain the true treatment effect with high probability, both inside and outside of the support of the experimental data. We demonstrate our methodology on both simulated and semi-synthetic datasets and make the \href{https://github.com/Jakefawkes/Obs_and_exp_data}{code available}.
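The abstract's Gaussian Process idea can be sketched in code. The following is a minimal illustrative example, not the authors' actual implementation (see their linked repository for that): it assumes a hypothetical biased observational CATE estimate, models the correction function (true effect minus observational estimate) with a smooth GP prior fitted on experimental data, and forms high-probability intervals for the true effect both inside and outside the experimental support. The smoothness of the GP kernel plays the role of the extra assumption the theorem requires; all function names and data here are synthetic.

```python
# Illustrative sketch: GP-based intervals for the true treatment effect,
# assuming a smooth correction function between observational and
# experimental estimates. Synthetic data; hypothetical estimators.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def true_effect(x):
    # Unknown ground-truth treatment effect (for simulation only).
    return np.sin(3 * x).ravel()

def obs_estimate(x):
    # Hypothetical observational estimate, biased by unobserved confounding.
    return np.sin(3 * x).ravel() + 0.5 * x.ravel()

# Small-scale experimental (RCT) data on limited support [-1, 1].
x_exp = rng.uniform(-1, 1, size=(50, 1))
y_exp = true_effect(x_exp) + 0.1 * rng.standard_normal(50)

# Fit a GP to the correction function; the RBF kernel encodes the
# smoothness assumption, WhiteKernel absorbs experimental noise.
correction = y_exp - obs_estimate(x_exp)
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=0.5) + WhiteKernel(noise_level=0.01)
)
gp.fit(x_exp, correction)

# Query points both inside and outside the experimental support [-1, 1];
# the GP's predictive std widens where experimental data is absent.
x_query = np.linspace(-2, 2, 9).reshape(-1, 1)
mean, std = gp.predict(x_query, return_std=True)
lower = obs_estimate(x_query) + mean - 2 * std
upper = obs_estimate(x_query) + mean + 2 * std
```

Outside the support of the experimental data the GP posterior reverts toward its prior, so the intervals widen rather than collapse — which is how a smoothness assumption lets the correction extrapolate while still reflecting growing uncertainty.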
Related papers
- Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a labeled similar experiment. First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population. We propose Deconfounded Empirical Risk Minimization (DERM), a new simple learning procedure minimizing the risk over a fictitious target population.
arXiv Detail & Related papers (2025-02-10T10:52:17Z) - Combining Incomplete Observational and Randomized Data for Heterogeneous Treatment Effects [10.9134216137537]
Existing methods for integrating observational data with randomized data require complete observational data.
We propose a resilient approach to Combine Incomplete Observational data and randomized data for HTE estimation.
arXiv Detail & Related papers (2024-10-28T06:19:14Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z) - Intervention Generalization: A View from Factor Graph Models [7.117681268784223]
We take a close look at how to warrant a leap from past experiments to novel conditions based on minimal assumptions about the factorization of the distribution of the manipulated system.
A postulated interventional factor model (IFM) may not always be informative, but it conveniently abstracts away a need for explicitly modeling unmeasured confounding and feedback mechanisms.
arXiv Detail & Related papers (2023-06-06T21:44:23Z) - Falsification before Extrapolation in Causal Effect Estimation [6.715453431174765]
Causal effects in populations are often estimated using observational datasets.
We propose a meta-algorithm that attempts to reject observational estimates that are biased.
arXiv Detail & Related papers (2022-09-27T21:47:23Z) - Detecting hidden confounding in observational data using multiple environments [0.81585306387285]
We present a theory for testable conditional independencies that are only absent when there is hidden confounding.
In most cases, the proposed procedure correctly predicts the presence of hidden confounding.
arXiv Detail & Related papers (2022-05-27T12:20:09Z) - Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z) - Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Learning Adjustment Sets from Observational and Limited Experimental Data [9.028773906859541]
We introduce a method that combines large observational and limited experimental data to identify adjustment sets.
The method successfully identifies adjustment sets and improves causal effect estimation in simulated data.
arXiv Detail & Related papers (2020-05-18T14:23:32Z) - Showing Your Work Doesn't Always Work [73.63200097493576]
"Show Your Work: Improved Reporting of Experimental Results" advocates for reporting the expected validation effectiveness of the best-tuned model.
We analytically show that their estimator is biased and uses error-prone assumptions.
We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation.
arXiv Detail & Related papers (2020-04-28T17:59:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.