Observationally Informed Adaptive Causal Experimental Design
- URL: http://arxiv.org/abs/2603.03785v1
- Date: Wed, 04 Mar 2026 06:52:37 GMT
- Title: Observationally Informed Adaptive Causal Experimental Design
- Authors: Erdun Gao, Liang Zhang, Jake Fawkes, Aoqi Zuo, Wenqin Liu, Haoxuan Li, Mingming Gong, Dino Sejdinovic,
- Abstract summary: We propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior.<n>This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias.<n> Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines.
- Score: 55.998153710215654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomized Controlled Trials (RCTs) represent the gold standard for causal inference yet remain a scarce resource. While large-scale observational data is often available, it is utilized only for retrospective fusion, and remains discarded in prospective trial design due to bias concerns. We argue this "tabula rasa" data acquisition strategy is fundamentally inefficient. In this work, we propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. To operationalize this, we introduce the R-Design framework. Theoretically, we establish two key advantages: (1) a structural efficiency gap, proving that estimating smooth residual contrasts admits strictly faster convergence rates than reconstructing full outcomes; and (2) information efficiency, where we quantify the redundancy in standard parameter-based acquisition (e.g., BALD), demonstrating that such baselines waste budget on task-irrelevant nuisance uncertainty. We propose R-EPIG (Residual Expected Predictive Information Gain), a unified criterion that directly targets the causal estimand, minimizing residual uncertainty for estimation or clarifying decision boundaries for policy. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines, confirming that repairing a biased model is far more efficient than learning one from scratch.
Related papers
- Towards Anytime-Valid Statistical Watermarking [63.02116925616554]
We develop the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference.<n>Our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13-15% relative to state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-19T18:32:26Z) - Causal Inference as Distribution Adaptation: Optimizing ATE Risk under Propensity Uncertainty [0.0]
We reframing ATE estimation as a textitdomain adaptation problem under distribution shift.<n>We propose the textbfJoint Robust Estimator (JRE) to train outcome models jointly.
arXiv Detail & Related papers (2025-12-19T21:40:46Z) - Bayesian Semiparametric Causal Inference: Targeted Doubly Robust Estimation of Treatment Effects [1.2833734915643464]
We propose a semiparametric Bayesian methodology for estimating the average treatment effect (ATE) within the potential outcomes framework.<n>Our method introduces a Bayesian debiasing procedure that corrects for bias arising from nuisance estimation.<n>Extensive simulations confirm the theoretical results, demonstrating accurate point estimation and credible intervals with nominal coverage.
arXiv Detail & Related papers (2025-11-19T22:15:04Z) - A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning [31.861874030715953]
We provide the first theoretical framework for analyzing sampling-based test-time scaling methods.<n>We introduce RPC, a hybrid method that leverages our theoretical insights through two key components: Perplexity Consistency and Reasoning Pruning.<n>RPC achieves reasoning performance comparable to self-consistency while not only enhancing confidence reliability but also reducing sampling costs by 50%.
arXiv Detail & Related papers (2025-10-17T08:59:30Z) - Causal-EPIG: A Prediction-Oriented Active Learning Framework for CATE Estimation [18.560880912660885]
We introduce the principle of causal objective alignment, which posits that acquisition functions should target unobservable causal quantities.<n>We operationalize this principle through the Causal-EPIG framework.<n>We derive two distinct strategies that embody a fundamental trade-off.
arXiv Detail & Related papers (2025-09-26T04:44:27Z) - Lost at the Beginning of Reasoning [85.17612793300238]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction.<n>We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps.
arXiv Detail & Related papers (2025-06-27T09:53:57Z) - Principled Input-Output-Conditioned Post-Hoc Uncertainty Estimation for Regression Networks [1.4671424999873808]
Uncertainty is critical in safety-sensitive applications but is often omitted from off-the-shelf neural networks due to adverse effects on predictive performance.<n>We propose a theoretically grounded framework for post-hoc uncertainty estimation in regression tasks by fitting an auxiliary model to both original inputs and frozen model outputs.
arXiv Detail & Related papers (2025-06-01T09:13:27Z) - A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation [55.53426007439564]
Estimating individualized treatment effects from observational data is a central challenge in causal inference.<n>In inverse probability weighting (IPW) is a well-established solution to this problem, but its integration into modern deep learning frameworks remains limited.<n>We propose Importance-Weighted Diffusion Distillation (IWDD), a novel generative framework that combines the pretraining of diffusion models with importance-weighted score distillation.
arXiv Detail & Related papers (2025-05-16T17:00:52Z) - Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [53.25336975467293]
We present the first theoretical error decomposition analysis of methods such as perplexity and self-consistency.<n>Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function.<n>We propose Reasoning-Pruning Perplexity Consistency (RPC), which integrates perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths.
arXiv Detail & Related papers (2025-02-01T18:09:49Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.