Testing Effect Homogeneity and Confounding in High-Dimensional Experimental and Observational Studies
- URL: http://arxiv.org/abs/2602.19703v1
- Date: Mon, 23 Feb 2026 10:52:39 GMT
- Title: Testing Effect Homogeneity and Confounding in High-Dimensional Experimental and Observational Studies
- Authors: Ana Armendariz, Martin Huber
- Abstract summary: We propose a framework for testing the homogeneity of conditional average treatment effects (CATEs) across multiple experimental and observational studies. Our approach leverages multiple randomized trials to assess whether treatment effects vary with unobserved heterogeneity that differs across trials.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a framework for testing the homogeneity of conditional average treatment effects (CATEs) across multiple experimental and observational studies. Our approach leverages multiple randomized trials to assess whether treatment effects vary with unobserved heterogeneity that differs across trials: if CATEs are homogeneous, this indicates the absence of interactions between treatment and unobservables in the mean effect. Comparing CATEs between experimental and observational data further allows evaluation of potential confounding: if the estimands coincide, there is no unobserved confounding; if they differ, deviations may arise from unobserved confounding, effect heterogeneity, or both. We extend the framework to settings with alternative identification strategies, namely instrumental variable settings and panel data with parallel trends assumptions based on differences in differences, where effects are identified only locally for subpopulations such as compliers or treated units. In these contexts, testing homogeneity is useful for assessing whether local effects can be extrapolated to the total population. We suggest a test based on double machine learning that accommodates high-dimensional covariates in a data-driven way and investigate its finite-sample performance through a simulation study. Finally, we apply the test to the International Stroke Trial (IST), a large multi-country randomized controlled trial in patients with acute ischaemic stroke that evaluated whether early treatment with aspirin altered subsequent clinical outcomes. Our methodology provides a flexible tool for both validating identification assumptions and understanding the generalizability of estimated treatment effects.
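The idea of comparing CATEs across two randomized trials can be illustrated with a minimal sketch. This is not the authors' procedure. It assumes a known randomization probability of 0.5, uses per-arm OLS as a stand-in for a machine learner, and projects cross-fitted doubly robust scores onto a linear specification in which a trial indicator and its interactions with the covariates should carry zero coefficients under homogeneity. All function names (`cross_fit_scores`, `homogeneity_wald`, `simulate_trial`, `run_demo`, `chi2_sf`) and the simulated data are hypothetical.

```python
import math
import numpy as np


def cross_fit_scores(y, d, x, p=0.5, n_folds=2, seed=0):
    """Cross-fitted doubly robust (AIPW) scores for one randomized trial.

    Per-arm OLS stands in for an arbitrary machine learner; p is the
    known randomization probability (an assumption of this sketch).
    """
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    X = np.column_stack([np.ones(n), x])
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        mu = []
        for arm in (0, 1):
            rows = train & (d == arm)
            beta, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
            mu.append(X[test] @ beta)  # out-of-fold outcome prediction
        yt, dt = y[test], d[test]
        psi[test] = (mu[1] - mu[0]
                     + dt * (yt - mu[1]) / p
                     - (1 - dt) * (yt - mu[0]) / (1 - p))
    return psi


def chi2_sf(w, k):
    """Upper-tail probability of a chi-square with integer k >= 1 dof."""
    z = w / 2.0
    a = 0.5 if k % 2 else 1.0
    q = math.erfc(math.sqrt(z)) if k % 2 else math.exp(-z)
    while a < k / 2.0:  # recurrence Q(a+1, z) = Q(a, z) + z^a e^-z / Gamma(a+1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def homogeneity_wald(psi, x, s):
    """Wald test that trial indicator s (and s times x) does not shift the CATE."""
    n = len(psi)
    base = np.column_stack([np.ones(n), x])
    Z = np.column_stack([base, base * s[:, None]])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    theta = ZtZ_inv @ Z.T @ psi
    resid = psi - Z @ theta
    meat = (Z * resid[:, None] ** 2).T @ Z
    V = ZtZ_inv @ meat @ ZtZ_inv  # heteroskedasticity-robust (HC0) covariance
    q = base.shape[1]
    t = theta[-q:]  # coefficients on s and its interactions with x
    W = float(t @ np.linalg.solve(V[-q:, -q:], t))
    return W, q, chi2_sf(W, q)


def simulate_trial(n, effect_slope, rng):
    """Hypothetical trial: the effect is 1 + effect_slope * x1."""
    x = rng.normal(size=(n, 2))
    d = rng.binomial(1, 0.5, size=n)
    tau = 1.0 + effect_slope * x[:, 0]
    y = x @ np.array([0.5, -0.5]) + d * tau + rng.normal(size=n)
    return y, d, x


def run_demo(slope_a, slope_b, n=2000, seed=1):
    rng = np.random.default_rng(seed)
    ya, da, xa = simulate_trial(n, slope_a, rng)
    yb, db, xb = simulate_trial(n, slope_b, rng)
    psi = np.concatenate([cross_fit_scores(ya, da, xa),
                          cross_fit_scores(yb, db, xb)])
    s = np.concatenate([np.zeros(n), np.ones(n)])
    return homogeneity_wald(psi, np.vstack([xa, xb]), s)


w_same, q_same, p_same = run_demo(0.0, 0.0)  # identical CATEs in both trials
w_diff, q_diff, p_diff = run_demo(0.0, 1.0)  # CATE slope differs across trials
print(f"same CATEs:      W={w_same:.2f}, dof={q_same}, p={p_same:.3f}")
print(f"different CATEs: W={w_diff:.2f}, dof={q_diff}, p={p_diff:.3g}")
```

In the second scenario the effect slope differs between the simulated trials, so the Wald statistic should be large and the p-value small. A real application would replace OLS with flexible learners and would not rely on the linear projection used here.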
Related papers
- A Causal Machine Learning Framework for Treatment Personalization in Clinical Trials: Application to Ulcerative Colitis [0.7799711162530713]
We present a modular causal machine learning framework that evaluates each question separately. We apply this framework to patient-level data from the UNIFI maintenance trial of ustekinumab in ulcerative colitis.
arXiv Detail & Related papers (2026-02-09T00:26:30Z)
- A Bayesian Classification Trees Approach to Treatment Effect Variation with Noncompliance [0.5356944479760104]
Estimating varying treatment effects in randomized trials with noncompliance is inherently challenging.
Existing flexible machine learning methods are highly sensitive to the weak instruments problem.
We present a Bayesian Causal Forest model for binary response variables in scenarios with noncompliance.
arXiv Detail & Related papers (2024-08-14T18:33:55Z)
- Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments [67.80453452949303]
Estimating the conditional average treatment effect (CATE) from observational data is relevant for many applications such as personalized medicine.
Here, we focus on the widespread setting where the observational data come from multiple environments.
We propose different model-agnostic learners (so-called meta-learners) to estimate the bounds that can be used in combination with arbitrary machine learning models.
arXiv Detail & Related papers (2024-06-04T16:31:43Z)
- Identification of Single-Treatment Effects in Factorial Experiments [0.0]
I show that when multiple interventions are randomized in experiments, the effect any single intervention would have outside the experimental setting is not identified absent heroic assumptions.
Observational studies and factorial experiments provide information about potential-outcome distributions with zero and multiple interventions.
I show that researchers who rely on this type of design have to justify either linearity of functional forms or specify with Directed Acyclic Graphs how variables are related in the real world.
arXiv Detail & Related papers (2024-05-16T04:01:53Z)
- Estimating treatment effects from single-arm trials via latent-variable modeling [14.083487062917085]
Single-arm trials, where all patients belong to the treatment group, can be a viable alternative but require access to an external control group.
We propose an identifiable deep latent-variable model for this scenario.
Our results show improved performance both for direct treatment effect estimation as well as for effect estimation via patient matching.
arXiv Detail & Related papers (2023-11-06T10:12:54Z)
- A Double Machine Learning Approach to Combining Experimental and Observational Data [58.05402364136958]
We propose a double machine learning approach to combine experimental and observational studies. Our framework proposes a falsification test for external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z)
- Comparison of Methods that Combine Multiple Randomized Trials to Estimate Heterogeneous Treatment Effects [0.1398098625978622]
Leveraging multiple randomized controlled trials allows for the combination of datasets with unconfounded treatment assignment.
This paper discusses several non-parametric approaches for estimating heterogeneous treatment effects using data from multiple trials.
arXiv Detail & Related papers (2023-03-28T20:43:00Z)
- SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z)
- On Inductive Biases for Heterogeneous Treatment Effect Estimation [91.3755431537592]
We investigate how to exploit structural similarities of an individual's potential outcomes (POs) under different treatments.
We compare three end-to-end learning strategies to overcome this problem.
arXiv Detail & Related papers (2021-06-07T16:30:46Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
- Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)