Related papers: Causal Inference as Distribution Adaptation: Optimizing ATE Risk under Propensity Uncertainty

URL: http://arxiv.org/abs/2512.18083v1
Date: Fri, 19 Dec 2025 21:40:46 GMT
Title: Causal Inference as Distribution Adaptation: Optimizing ATE Risk under Propensity Uncertainty
Authors: Ashley Zhang
Abstract summary: We reframe ATE estimation as a domain adaptation problem under distribution shift. We propose the Joint Robust Estimator (JRE) to train outcome models jointly.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Standard approaches to causal inference, such as Outcome Regression and Inverse Probability Weighted Regression Adjustment (IPWRA), are typically derived through the lens of missing data imputation and identification theory. In this work, we unify these methods from a Machine Learning perspective, reframing ATE estimation as a domain adaptation problem under distribution shift. We demonstrate that the canonical Hajek estimator is a special case of IPWRA restricted to a constant hypothesis class, and that IPWRA itself is fundamentally Importance-Weighted Empirical Risk Minimization designed to correct for the covariate shift between the treated sub-population and the target population. Leveraging this unified framework, we critically examine the optimization objectives of Doubly Robust estimators. We argue that standard methods enforce sufficient but not necessary conditions for consistency by requiring outcome models to be individually unbiased. We define the true "ATE Risk Function" and show that minimizing it requires only that the biases of the treated and control models structurally cancel out. Exploiting this insight, we propose the Joint Robust Estimator (JRE). Instead of treating propensity estimation and outcome modeling as independent stages, JRE utilizes bootstrap-based uncertainty quantification of the propensity score to train outcome models jointly. By optimizing for the expected ATE risk over the distribution of propensity scores, JRE leverages model degrees of freedom to achieve robustness against propensity misspecification. Simulation studies demonstrate that JRE achieves up to a 15% reduction in MSE compared to standard IPWRA in finite-sample regimes with misspecified outcome models.
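The reduction described in the abstract (Hajek as IPWRA with a constant hypothesis class, IPWRA as importance-weighted ERM) can be checked numerically. The sketch below is a minimal illustration under assumed conditions, not the author's code: the data-generating process, the known logistic propensity e(x), and the helper names `weighted_erm` and `arm_mean` are hypothetical. Each arm is fit by weighted least squares with weights 1/e(x) (treated) or 1/(1 - e(x)) (control), and predictions are averaged over the full sample; with an intercept-only design the weighted fit collapses to a weighted mean, which is exactly the Hajek estimator. The final step illustrates the bias-cancellation point: adding the same constant to both outcome models leaves the ATE unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Illustrative data-generating process (an assumption, not from the paper):
# known logistic propensity, linear outcomes, true ATE = 2.0.
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.8 * x))   # propensity P(T = 1 | x), treated as known here
t = rng.binomial(1, e)
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)

def weighted_erm(design, targets, weights):
    """Importance-weighted least squares: argmin_b sum_i w_i * (y_i - design_i @ b)^2."""
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], targets * sw, rcond=None)
    return beta

def arm_mean(targets, mask, weights, design):
    """Fit one arm by weighted ERM, then predict for ALL units (the target population)."""
    beta = weighted_erm(design[mask], targets[mask], weights[mask])
    return design @ beta

treated, control = t == 1, t == 0
w1, w0 = 1.0 / e, 1.0 / (1.0 - e)          # importance weights for each arm
const = np.ones((n, 1))                     # constant hypothesis class
linear = np.column_stack([np.ones(n), x])   # linear hypothesis class

# Hajek estimator: self-normalized inverse-propensity weighting.
hajek = (np.sum(t * y / e) / np.sum(t / e)
         - np.sum((1 - t) * y / (1 - e)) / np.sum((1 - t) / (1 - e)))

# IPWRA with a constant class: each weighted fit is a weighted mean, i.e. Hajek.
ipwra_const = np.mean(arm_mean(y, treated, w1, const) - arm_mean(y, control, w0, const))

# IPWRA with a linear class.
mu1 = arm_mean(y, treated, w1, linear)
mu0 = arm_mean(y, control, w0, linear)
ipwra_linear = np.mean(mu1 - mu0)

# A shared bias b in both outcome models cancels in the ATE.
b = 0.7
ipwra_shared_bias = np.mean((mu1 + b) - (mu0 + b))

print(f"Hajek            {hajek:.4f}")
print(f"IPWRA (constant) {ipwra_const:.4f}")
print(f"IPWRA (linear)   {ipwra_linear:.4f}")
print(f"shared bias      {ipwra_shared_bias:.4f}")
```

With the intercept-only design, `ipwra_const` and `hajek` should agree to floating-point precision, and the correctly specified linear class should land near the assumed true effect of 2.0.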

Related papers

Observationally Informed Adaptive Causal Experimental Design [55.998153710215654]
We propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines.
arXiv Detail & Related papers (2026-03-04T06:52:37Z)
Nonparametric Distribution Regression Re-calibration [3.0204520109309847]
Minimizing overall prediction error encourages models to prioritize informativeness over calibration. In safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow intervals. We propose a novel non-parametric re-calibration algorithm based on conditional kernel mean embeddings.
arXiv Detail & Related papers (2026-02-13T11:48:43Z)
The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias induced by the point-wise loss function. We present a concrete solution that simultaneously achieves both principles via the discrete Fourier transform (DFT) or the discrete wavelet transform (DWT).
arXiv Detail & Related papers (2025-12-21T06:08:22Z)
Uncertainty Quantification for Regression using Proper Scoring Rules [76.24649098854219]
We introduce a unified UQ framework for regression based on proper scoring rules, such as CRPS, logarithmic, squared error, and quadratic scores. We derive closed-form expressions for the uncertainty measures under practical parametric assumptions and show how to estimate them using ensembles of models. Our broad evaluation on synthetic and real-world regression datasets provides guidance for selecting reliable UQ measures.
arXiv Detail & Related papers (2025-09-30T17:52:12Z)
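As background for the CRPS named in this entry: the continuous ranked probability score has a well-known closed form for a Gaussian predictive distribution, and a simple sample-based estimator, E|X - y| - 0.5 E|X - X'|, that works directly on ensemble members. The sketch below compares the two using only these textbook formulas; it is not the uncertainty-measure construction from the paper, and the function names are illustrative. The pairwise term here is the V-statistic form, which is slightly biased for small ensembles.

```python
import math
import numpy as np

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    z = (y - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def crps_ensemble(samples, y):
    """Sample-based CRPS: mean |X - y| minus half the mean pairwise |X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

# Usage: a large "ensemble" drawn from N(0, 1) should roughly match the closed form.
rng = np.random.default_rng(3)
members = rng.normal(loc=0.0, scale=1.0, size=2000)
print(crps_gaussian(0.0, 1.0, 0.5), crps_ensemble(members, 0.5))
```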
Regularizing Extrapolation in Causal Inference [12.057981453189505]
We propose a unified framework that directly penalizes the level of extrapolation. We derive a worst-case extrapolation error bound and introduce a novel "bias-bias-variance" tradeoff.
arXiv Detail & Related papers (2025-09-21T18:05:15Z)
RDIT: Residual-based Diffusion Implicit Models for Probabilistic Time Series Forecasting [4.140149411004857]
RDIT is a plug-and-play framework that combines point estimation and residual-based conditional diffusion with a bidirectional Mamba network. We show that RDIT achieves lower CRPS, rapid inference, and improved coverage compared to strong baselines.
arXiv Detail & Related papers (2025-09-02T14:06:29Z)
Principled Input-Output-Conditioned Post-Hoc Uncertainty Estimation for Regression Networks [1.4671424999873808]
Uncertainty is critical in safety-sensitive applications but is often omitted from off-the-shelf neural networks due to adverse effects on predictive performance. We propose a theoretically grounded framework for post-hoc uncertainty estimation in regression tasks by fitting an auxiliary model to both original inputs and frozen model outputs.
arXiv Detail & Related papers (2025-06-01T09:13:27Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
Distributionally Robust Instrumental Variables Estimation [10.765695227417865]
We show that Wasserstein DRIVE is a distributionally robust IV estimation method. We derive the distribution of Wasserstein DRIVE and propose data-driven procedures to select the regularization parameter.
arXiv Detail & Related papers (2024-10-21T04:33:38Z)
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs). Namely, we propose novel metrics with high-probability guarantees concerning the output distribution of a model. Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
High Precision Causal Model Evaluation with Conditional Randomization [10.23470075454725]
We introduce a novel low-variance estimator for causal error, dubbed the pairs estimator. By applying the same IPW estimator to both the model and true experimental effects, our estimator effectively cancels out the variance due to IPW and achieves a smaller variance. Our method offers a simple yet powerful solution to evaluate causal inference models in conditional randomization settings without complicated modification of the IPW estimator itself.
arXiv Detail & Related papers (2023-11-03T13:22:27Z)
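One reading of the pairs estimator: when propensities e(x) are known by design, apply the same IPW formula both to the observed outcomes and to the model's predictions for the arm each unit actually received, then take the difference. Because both terms share the same random treatment draws, much of the IPW noise cancels. The simulation below is a hypothetical illustration of that idea (the data-generating process, the model under evaluation, and the names `ipw_ate` and `one_trial` are assumptions, not the paper's code), compared against the naive approach of subtracting an IPW estimate of the true ATE from the model's own ATE.

```python
import numpy as np

rng = np.random.default_rng(1)

def ipw_ate(t, outcome, e):
    """Horvitz-Thompson IPW estimate of an ATE under known propensity e."""
    return np.mean(t * outcome / e - (1 - t) * outcome / (1 - e))

def one_trial(n=2000):
    x = rng.uniform(-1, 1, size=n)
    e = 0.2 + 0.6 * (x > 0)              # known, covariate-dependent randomization
    y0 = 3.0 * x + rng.normal(scale=0.5, size=n)
    y1 = y0 + 1.0 + x                    # true individual effects 1 + x (true ATE = 1.0)
    t = rng.binomial(1, e)
    y = np.where(t == 1, y1, y0)
    # Hypothetical causal model under evaluation: good outcome fit, effect off by 0.2.
    m0 = 3.0 * x
    m1 = m0 + 1.2 + x
    # Naive: model ATE minus an IPW estimate of the true ATE.
    naive = np.mean(m1 - m0) - ipw_ate(t, y, e)
    # Pairs: the same IPW formula applied to model predictions for the observed arm.
    m_obs = np.where(t == 1, m1, m0)
    pairs = ipw_ate(t, m_obs, e) - ipw_ate(t, y, e)
    return naive, pairs

results = np.array([one_trial() for _ in range(500)])   # columns: naive, pairs
print("mean (naive, pairs):", results.mean(axis=0))
print("var  (naive, pairs):", results.var(axis=0))
```

Both estimators target the same error (0.2 here, since the hypothetical model's ATE is 1.2 and the true ATE is 1.0), but the paired version should show a much smaller variance across trials.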
On the Variance, Admissibility, and Stability of Empirical Risk Minimization [57.63331017830154]
Empirical Risk Minimization (ERM) may attain minimax suboptimal rates in terms of the mean squared error. We prove that under relatively mild assumptions, the suboptimality of ERM must be due to its large bias.
arXiv Detail & Related papers (2023-05-29T15:25:48Z)
The Decaying Missing-at-Random Framework: Model Doubly Robust Causal Inference with Partially Labeled Data [8.916614661563893]
We introduce a decaying missing-at-random (decaying MAR) framework and associated approaches for doubly robust causal inference. This simultaneously addresses selection bias in the labeling mechanism and the extreme imbalance between labeled and unlabeled groups. To ensure robust causal conclusions, we propose a bias-reduced semi-supervised (SS) estimator for the average treatment effect.
arXiv Detail & Related papers (2023-05-22T07:37:12Z)
Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models. In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints. A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
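A bias-constrained objective can be sketched as the MSE plus a penalty on the empirical bias, computed by drawing several estimates for each true parameter value and averaging their errors. This is a hypothetical NumPy sketch based only on the abstract's description; the loss form, the weight `lam`, the name `bce_loss`, and the array layout are assumptions rather than the paper's exact formulation. The penalty term equals the squared error of the averaged estimate, which ties into the second motivation above: averaging removes variance but not bias.

```python
import numpy as np

def bce_loss(estimates, targets, lam):
    """Hypothetical bias-constrained loss (sketch).

    estimates: shape (J, M, d), M estimates for each of J parameter values
    targets:   shape (J, d), the true parameter values
    Returns MSE + lam * mean squared per-parameter bias.
    """
    err = estimates - targets[:, None, :]
    mse = np.mean(np.sum(err ** 2, axis=-1))
    bias = err.mean(axis=1)                      # (J, d): empirical bias per parameter value
    bias_penalty = np.mean(np.sum(bias ** 2, axis=-1))  # = MSE of the averaged estimate
    return mse + lam * bias_penalty

rng = np.random.default_rng(2)
J, M, d = 50, 20, 3
theta = rng.normal(size=(J, d))
noise = rng.normal(scale=0.5, size=(J, M, d))
unbiased = theta[:, None, :] + noise             # zero bias, higher variance
biased = theta[:, None, :] + 0.3 + 0.4 * noise   # constant bias 0.3, lower variance

for lam in (0.0, 10.0):
    print(lam, bce_loss(unbiased, theta, lam), bce_loss(biased, theta, lam))
```

With `lam = 0` the lower-variance biased estimator wins on plain MSE; with a large `lam` the unbiased one is preferred.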

This list is automatically generated from the titles and abstracts of the papers on this site.