Controllable Generative Sandbox for Causal Inference
- URL: http://arxiv.org/abs/2603.03587v1
- Date: Tue, 03 Mar 2026 23:37:05 GMT
- Title: Controllable Generative Sandbox for Causal Inference
- Authors: Qi Zhang, Harsh Parikh, Ashley Naimi, Razieh Nabi, Christopher Kim, Timothy Lash,
- Abstract summary: CausalMix is a variational generative framework for causal inference. It achieves state-of-the-art distributional metrics on mixed-type tables while providing stable, fine-grained causal control. We demonstrate practical utility in a comparative safety study of metastatic castration-resistant prostate cancer treatments.
- Score: 9.416664327739516
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Method validation and study design in causal inference rely on synthetic data with known counterfactuals. Existing simulators trade off distributional realism, the ability to capture mixed-type and multimodal tabular data, against causal controllability, including explicit control over overlap, unmeasured confounding, and treatment effect heterogeneity. We introduce CausalMix, a variational generative framework that closes this gap by coupling a mixture of Gaussian latent priors with data-type-specific decoders for continuous, binary, and categorical variables. The model incorporates explicit causal controls: an overlap regularizer shaping propensity-score distributions, alongside direct parameterizations of confounding strength and effect heterogeneity. This unified objective preserves fidelity to the observed data while enabling factorial manipulation of causal mechanisms, allowing overlap, confounding strength, and treatment effect heterogeneity to be varied independently at design time. Across benchmarks, CausalMix achieves state-of-the-art distributional metrics on mixed-type tables while providing stable, fine-grained causal control. We demonstrate practical utility in a comparative safety study of metastatic castration-resistant prostate cancer treatments, using CausalMix to compare estimators under calibrated data-generating processes, tune hyperparameters, and conduct simulation-based power analyses under targeted treatment effect heterogeneity scenarios.
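The three causal controls named in the abstract (overlap, unmeasured confounding strength, and treatment effect heterogeneity) can be illustrated with a hand-written toy simulator. The sketch below is not CausalMix: it has no learned latent mixture prior, decoders, or overlap regularizer, and the function `simulate` and its parameters `overlap`, `confounding`, and `heterogeneity` are hypothetical names chosen for illustration. It only shows what it means to vary the three knobs independently while keeping both potential outcomes known.

```python
import numpy as np


def simulate(n, overlap=1.0, confounding=0.0, heterogeneity=0.0, seed=0):
    """Toy data-generating process with known counterfactuals.

    Assumed (hypothetical) knobs, not the paper's parameterization:
      overlap       -- smaller values push propensity scores toward 0 and 1
      confounding   -- weight of the unmeasured confounder u on treatment and outcome
      heterogeneity -- slope of the individual treatment effect in the covariate x
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # observed covariate
    u = rng.normal(size=n)  # unmeasured confounder

    # Propensity score: dividing the logit by `overlap` sharpens or flattens it.
    logit = (x + confounding * u) / overlap
    propensity = 1.0 / (1.0 + np.exp(-logit))
    t = rng.binomial(1, propensity)

    # Individual treatment effect: constant when heterogeneity == 0.
    tau = 1.0 + heterogeneity * x

    # Both potential outcomes are generated, so the true effect is known.
    y0 = x + confounding * u + rng.normal(size=n)
    y1 = y0 + tau
    y = np.where(t == 1, y1, y0)  # only one outcome is "observed"

    return {"x": x, "t": t, "y": y, "y0": y0, "y1": y1,
            "propensity": propensity, "tau": tau}


if __name__ == "__main__":
    sim = simulate(5000, overlap=0.5, confounding=1.0, heterogeneity=0.5)
    true_ate = sim["tau"].mean()
    naive = sim["y"][sim["t"] == 1].mean() - sim["y"][sim["t"] == 0].mean()
    print(f"true ATE: {true_ate:.3f}, naive difference in means: {naive:.3f}")
```

Because the counterfactuals `y0` and `y1` are both stored, any estimator run on the observed columns (`x`, `t`, `y`) can be scored against the true effect, which is the basic use of such a sandbox for estimator comparison.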
Related papers
- CausalWrap: Model-Agnostic Causal Constraint Wrappers for Tabular Synthetic Data [4.08271266107383]
CausalWrap is a model-agnostic wrapper that injects partial causal knowledge into any pretrained base generator. CW learns a lightweight, differentiable post-hoc correction map applied to samples from the base generator. CW improves causal fidelity across diverse base generators.
arXiv Detail & Related papers (2026-03-02T15:59:46Z)
- Causal Graph Learning via Distributional Invariance of Cause-Effect Relationship [54.575090553659074]
We develop an algorithm that efficiently uncovers causal relationships with quadratic complexity in the number of observational variables. Our experiments on a varied benchmark of large-scale datasets show superior or equivalent performance compared to existing works.
arXiv Detail & Related papers (2026-02-03T10:26:16Z)
- Detecting Batch Heterogeneity via Likelihood Clustering [0.9668407688201359]
Batch effects represent a major confounder in genomic diagnostics. We introduce a method that addresses both limitations by clustering samples according to their Bayesian model evidence. Our method achieves superior clustering accuracy compared to standard correlation-based and dimensionality-reduction approaches.
arXiv Detail & Related papers (2026-01-14T01:49:21Z)
- Hybrid Causal Identification and Causal Mechanism Clustering [14.706998903419407]
This paper proposes a Mixture Conditional Variational Causal Inference model (MCVCI) to infer heterogeneous causality. According to the identifiability of the Hybrid Additive Noise Model (HANM), MCVCI combines the superior fitting capabilities of the Gaussian mixture model and the neural network.
arXiv Detail & Related papers (2025-07-29T13:27:15Z)
- Penalized Empirical Likelihood for Doubly Robust Causal Inference under Contamination in High Dimensions [0.720409153108429]
We propose a doubly robust estimator for the average treatment effect in low sample size equations. We show that the proposed confidence interval remains efficient compared to those of competing estimates.
arXiv Detail & Related papers (2025-07-23T11:58:54Z)
- Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges [68.98973318553983]
We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way. We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
arXiv Detail & Related papers (2025-06-26T09:05:38Z)
- Assumption-Lean Post-Integrated Inference with Surrogate Control Outcomes [6.448728765953916]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using control outcomes. We develop semiparametric inference on projected direct effect estimands, accounting for hidden mediators, confounders, and moderators. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset.
This can result in label-conditional distributions that are imbalanced with respect to another latent attribute.
We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.