Distributional Treatment Effect Estimation across Heterogeneous Sites via Optimal Transport
- URL: http://arxiv.org/abs/2511.09759v1
- Date: Fri, 14 Nov 2025 01:08:12 GMT
- Title: Distributional Treatment Effect Estimation across Heterogeneous Sites via Optimal Transport
- Authors: Borna Bateni, Yubai Yuan, Qi Xu, Annie Qu
- Abstract summary: We propose a novel framework for synthesizing counterfactual treatment group data in a target site. Our approach adopts a distributional causal inference perspective by modeling treatment and control as distinct probability measures on the source and target sites.
- Score: 23.093484580587074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel framework for synthesizing counterfactual treatment group data in a target site by integrating full treatment and control group data from a source site with control group data from the target. Departing from conventional average treatment effect estimation, our approach adopts a distributional causal inference perspective by modeling treatment and control as distinct probability measures on the source and target sites. We formalize the cross-site heterogeneity (effect modification) as a push-forward transformation that maps the joint feature-outcome distribution from the source to the target site. This transformation is learned by aligning the control group distributions between sites using an Optimal Transport-based procedure, and subsequently applied to the source treatment group to generate the synthetic target treatment distribution. Under general regularity conditions, we establish theoretical guarantees for the consistency and asymptotic convergence of the synthetic treatment group data to the true target distribution. Simulation studies across multiple data-generating scenarios and a real-world application to patient-derived xenograft data demonstrate that our framework robustly recovers the full distributional properties of treatment effects.
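The abstract's two-step recipe (learn a transport map that aligns the two control groups, then push the source treatment group through that map) can be illustrated in its simplest setting. The sketch below is not the authors' implementation. It assumes a scalar outcome with no covariates, where the optimal transport map under squared-distance cost is the monotone quantile map T = F_tgt^{-1} ∘ F_src, estimated here from empirical CDFs. The function names `fit_monotone_ot_map` and `synthesize_target_treatment` are hypothetical, and the toy data are made up for illustration. The paper itself transports the joint feature-outcome distribution, which needs a multivariate OT solver.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def fit_monotone_ot_map(
    source_control: Sequence[float], target_control: Sequence[float]
) -> Callable[[float], float]:
    """Estimate the 1-D transport map T = F_tgt^{-1} o F_src from control samples.

    Assumption (not from the paper): outcomes are scalar. In one dimension the
    monotone (quantile) rearrangement is the optimal map under squared cost.
    """
    src = sorted(source_control)
    tgt = sorted(target_control)
    m, n = len(src), len(tgt)
    if m == 0 or n == 0:
        raise ValueError("both control samples must be non-empty")

    def transport(x: float) -> float:
        rank = bisect_right(src, x)  # m * F_src(x), an integer in [0, m]
        k = -(-rank * n // m)        # ceil(n * F_src(x)) using exact integer arithmetic
        k = min(max(k, 1), n)        # clip to a valid order-statistic index
        return tgt[k - 1]            # empirical quantile F_tgt^{-1}(F_src(x))

    return transport


def synthesize_target_treatment(
    source_control: Sequence[float],
    target_control: Sequence[float],
    source_treatment: Sequence[float],
) -> List[float]:
    """Push source-treatment outcomes through the map learned on the controls."""
    transport = fit_monotone_ot_map(source_control, target_control)
    return [transport(y) for y in source_treatment]


if __name__ == "__main__":
    # Toy data: the target site's controls are the source controls shifted by +2.
    source_control = [0.0, 1.0, 2.0, 3.0, 4.0]
    target_control = [2.0, 3.0, 4.0, 5.0, 6.0]
    source_treatment = [1.0, 3.0, 4.0]
    print(synthesize_target_treatment(source_control, target_control, source_treatment))
    # prints [3.0, 5.0, 6.0]
```

The map is fitted only on the two control groups and then reused unchanged on the source treatment group, so in this sketch the synthetic output is meaningful only to the extent that the same transformation relates the two sites' treatment distributions. Because it is an empirical quantile map, every synthetic value is one of the observed target-control outcomes.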
Related papers
- Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions. Real-world medical datasets are often difficult to access due to regulatory barriers. We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
(arXiv, 2025-10-21T16:16:00Z)
- Beyond the Average: Distributional Causal Inference under Imperfect Compliance [8.76134221825298]
We study the estimation of distributional treatment effects in randomized experiments with imperfect compliance. We propose a regression-adjusted estimator based on a distribution regression framework with Neyman-orthogonal moment conditions. We demonstrate the method's practical relevance in an application to the Oregon Health Insurance Experiment.
(arXiv, 2025-09-19T04:53:42Z)
- On Efficient Estimation of Distributional Treatment Effects under Covariate-Adaptive Randomization [6.324765782436764]
We propose a flexible distribution regression framework that leverages off-the-shelf machine learning methods. We establish the distribution of the proposed estimator and introduce a valid inference procedure.
(arXiv, 2025-06-06T10:14:38Z)
- Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective [61.284843894545475]
Complex algorithms for treatment effect estimation are ineffective when handling insufficiently labeled training sets. We propose FCCM, which transforms the optimization objective into Factual and Counterfactual Coverage Maximization to ensure effective radius reduction during data acquisition. Benchmarking FCCM against other baselines demonstrates its superiority on both fully synthetic and semi-synthetic datasets.
(arXiv, 2025-05-08T13:42:00Z)
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers. We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
(arXiv, 2024-10-17T16:42:12Z)
- Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift [9.387706860375461]
A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance.
Prediction intervals are a crucial tool for characterizing the uncertainty induced by the underlying data distribution.
We propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain.
(arXiv, 2024-05-16T17:55:42Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
(arXiv, 2024-04-24T09:04:36Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
(arXiv, 2024-01-29T02:08:40Z)
- Optimal Transport-Guided Conditional Score-Based Diffusion Models [63.14903268958398]
Conditional score-based diffusion models (SBDMs) generate target data conditioned on paired data and have achieved great success in image translation.
To handle applications with partially paired or even unpaired datasets, we propose a novel Optimal Transport-guided Conditional Score-based diffusion model (OTCS).
(arXiv, 2023-11-02T13:28:44Z)
- Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption [32.849140378576095]
We construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations.
We establish the normality of the synthetic treatment group based on the sieve semiparametric theory.
(arXiv, 2023-09-28T13:00:56Z)
- Federated Causal Inference in Heterogeneous Observational Data [13.460660554484512]
We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site.
Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms.
Motivated by these considerations, we develop federated methods to draw inference on the average treatment effects of combined data across sites.
(arXiv, 2021-07-25T05:55:00Z)
- Which Invariance Should We Transfer? A Causal Minimax Learning Approach [18.71316951734806]
We present a comprehensive minimax analysis from a causal perspective.
We propose an efficient algorithm to search for the subset with minimal worst-case risk.
The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease.
(arXiv, 2021-07-05T09:07:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.