Incorporating External Controls for Estimating the Average Treatment Effect on the Treated with High-Dimensional Data: Retaining Double Robustness and Ensuring Double Safety
- URL: http://arxiv.org/abs/2509.20586v1
- Date: Wed, 24 Sep 2025 21:55:54 GMT
- Title: Incorporating External Controls for Estimating the Average Treatment Effect on the Treated with High-Dimensional Data: Retaining Double Robustness and Ensuring Double Safety
- Authors: Chi-Shian Dai, Chao Ying, Yang Ning, Jiwei Zhao
- Abstract summary: We address scenarios where external control data, often with a much larger sample size, are available. We find that incorporating external controls into the standard doubly robust estimator for ATT may paradoxically result in reduced efficiency. We propose a novel doubly robust estimator that guarantees higher efficiency than the standard approach.
- Score: 5.102311052155508
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomized controlled trials (RCTs) are widely regarded as the gold standard for causal inference in biomedical research. For instance, when estimating the average treatment effect on the treated (ATT), a doubly robust estimation procedure can be applied, requiring either the propensity score model or the control outcome model to be correctly specified. In this paper, we address scenarios where external control data, often with a much larger sample size, are available. Such data are typically easier to obtain from historical records or third-party sources. However, we find that incorporating external controls into the standard doubly robust estimator for ATT may paradoxically result in reduced efficiency compared to using the estimator without external controls. This counterintuitive outcome suggests that the naive incorporation of external controls could be detrimental to estimation efficiency. To resolve this issue, we propose a novel doubly robust estimator that guarantees higher efficiency than the standard approach without external controls, even under model misspecification. When all models are correctly specified, this estimator aligns with the standard doubly robust estimator that incorporates external controls and achieves semiparametric efficiency. The asymptotic theory developed in this work applies to high-dimensional confounder settings, which are increasingly common with the growing prevalence of electronic health record data. We demonstrate the effectiveness of our methodology through extensive simulation studies and a real-world data application.
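The double robustness property mentioned in the abstract can be illustrated with a minimal sketch. The code below is the textbook doubly robust ATT estimator without external controls, not the estimator proposed in the paper. The function name `dr_att`, the simulated data, and the deliberately misspecified nuisance inputs are all assumptions made for illustration.

```python
import numpy as np


def dr_att(y, a, e_hat, m0_hat):
    """Textbook doubly robust ATT estimate (illustrative helper, not from the paper).

    y: outcomes; a: treatment indicator (1 = treated, 0 = control);
    e_hat: estimated propensity scores P(A = 1 | X);
    m0_hat: estimated control outcome means E[Y | A = 0, X].
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    e_hat = np.asarray(e_hat, dtype=float)
    m0_hat = np.asarray(m0_hat, dtype=float)
    resid = y - m0_hat                # residual from the control outcome model
    odds = e_hat / (1.0 - e_hat)      # reweights controls toward the treated population
    return np.sum(a * resid - (1.0 - a) * odds * resid) / a.sum()


# Simulated data (assumed setup): one confounder and a constant effect of 2,
# so the true ATT is 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-0.5 * x))
a = rng.binomial(1, e_true)
y = 1.0 + x + rng.normal(size=n) + 2.0 * a
m0_true = 1.0 + x

# Both nuisance models correct.
est_both = dr_att(y, a, e_true, m0_true)
# Outcome model wrong (all zeros), propensity score correct.
est_bad_outcome = dr_att(y, a, e_true, np.zeros(n))
# Propensity score wrong (constant 0.5), outcome model correct.
est_bad_ps = dr_att(y, a, np.full(n, 0.5), m0_true)

print(est_both, est_bad_outcome, est_bad_ps)
```

In this simulation all three estimates should land near the true value of 2, which is the sense in which only one of the two nuisance models needs to be correct. The sketch plugs in known nuisance functions rather than fitting them, and it says nothing about efficiency or about how external controls would enter.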
Related papers
- Observationally Informed Adaptive Causal Experimental Design [55.998153710215654]
We propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines.
arXiv Detail & Related papers (2026-03-04T06:52:37Z) - Measuring Model Performance in the Presence of an Intervention [11.381587523287495]
In many AI for social impact applications, the presence of an intervention that affects the outcome can bias the evaluation. RCTs randomly assign interventions, allowing data from the control group to be used for unbiased model evaluation. We propose nuisance parameter weighting (NPW), an unbiased model evaluation approach that reweights data from the treatment group to mimic the distributions of samples that would or would not experience the outcome.
arXiv Detail & Related papers (2025-11-08T02:24:16Z) - Distributionally Robust Optimization with Adversarial Data Contamination [49.89480853499918]
We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
arXiv Detail & Related papers (2025-07-14T18:34:10Z) - Value-Based Deep RL Scales Predictably [100.21834069400023]
We show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior. We validate our approach using three algorithms: SAC, BRO, and PQL on DeepMind Control, OpenAI gym, and IsaacGym.
arXiv Detail & Related papers (2025-02-06T18:59:47Z) - Assumption-Lean Post-Integrated Inference with Surrogate Control Outcomes [6.448728765953916]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using control outcomes. We develop semiparametric inference on projected direct effect estimands, accounting for hidden mediators, confounders, and moderators. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples. Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
arXiv Detail & Related papers (2024-06-25T08:29:32Z) - Adaptive-TMLE for the Average Treatment Effect based on Randomized Controlled Trial Augmented with Real-World Data [0.0]
We consider the problem of estimating the average treatment effect (ATE) when both randomized controlled trial (RCT) data and external real-world data (RWD) are available. We introduce an adaptive targeted maximum likelihood estimation framework to estimate them.
arXiv Detail & Related papers (2024-05-12T07:10:26Z) - Efficient adjustment for complex covariates: Gaining efficiency with DOPE [56.537164957672715]
We propose a framework that accommodates adjustment for any subset of information expressed by the covariates.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the average treatment effect (ATE).
Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
arXiv Detail & Related papers (2024-02-20T13:02:51Z) - An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, where adapter tuning is consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than BERT, and that the methods are less effective when it comes to racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z) - A Causal Inference Framework for Leveraging External Controls in Hybrid Trials [1.7942265700058988]
We consider the challenges associated with causal inference in settings where data from a randomized trial is augmented with control data from an external source.
We propose estimators, review efficiency bounds, and present an approach for efficient doubly robust estimation.
We apply the framework to a trial investigating the effect of risdiplam on motor function in patients with spinal muscular atrophy.
arXiv Detail & Related papers (2023-05-15T19:15:32Z) - A General Framework for Treatment Effect Estimation in Semi-Supervised and High Dimensional Settings [0.0]
We develop a family of SS estimators which are (1) more robust and (2) more efficient than their supervised counterparts.
We further establish root-n consistency and normality of our SS estimators whenever the propensity score in the model is correctly specified.
Our estimators are shown to be semi-parametrically efficient as long as all the nuisance functions are correctly specified.
arXiv Detail & Related papers (2022-01-03T04:12:44Z) - Decomposed Adversarial Learned Inference [118.27187231452852]
We propose a novel approach, Decomposed Adversarial Learned Inference (DALI).
DALI explicitly matches prior and conditional distributions in both data and code spaces.
We validate the effectiveness of DALI on the MNIST, CIFAR-10, and CelebA datasets.
arXiv Detail & Related papers (2020-04-21T20:00:35Z)