Related papers: Interpretable Clustering with Adaptive Heterogeneous Causal Structure Learning in Mixed Observational Data

URL: http://arxiv.org/abs/2509.04415v2
Date: Tue, 28 Oct 2025 07:32:34 GMT
Title: Interpretable Clustering with Adaptive Heterogeneous Causal Structure Learning in Mixed Observational Data
Authors: Wenrui Li, Qinghao Zhang, Xiaowo Wang
Abstract summary: An unsupervised framework, HCL, jointly infers latent clusters and their associated causal structures from mixed-type observational data. It achieves superior performance in both clustering and structure learning tasks, and recovers biologically meaningful mechanisms in real-world single-cell perturbation data.
Score: 6.699689669675078
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding causal heterogeneity is essential for scientific discovery in domains such as biology and medicine. However, existing methods lack causal awareness, with insufficient modeling of heterogeneity, confounding, and observational constraints, leading to poor interpretability and difficulty distinguishing true causal heterogeneity from spurious associations. We propose an unsupervised framework, HCL (Interpretable Causal Mechanism-Aware Clustering with Adaptive Heterogeneous Causal Structure Learning), that jointly infers latent clusters and their associated causal structures from mixed-type observational data without requiring temporal ordering, environment labels, interventions or other prior knowledge. HCL relaxes the homogeneity and sufficiency assumptions by introducing an equivalent representation that encodes both structural heterogeneity and confounding. It further develops a bi-directional iterative strategy to alternately refine causal clustering and structure learning, along with a self-supervised regularization that balances cross-cluster universality and specificity. Together, these components enable convergence toward interpretable, heterogeneous causal patterns. Theoretically, we show identifiability of heterogeneous causal structures under mild conditions. Empirically, HCL achieves superior performance in both clustering and structure learning tasks, and recovers biologically meaningful mechanisms in real-world single-cell perturbation data, demonstrating its utility for discovering interpretable, mechanism-level causal heterogeneity.

Related papers

Sample Complexity of Causal Identification with Temporal Heterogeneity [6.5822033630228916]
We show that temporal structure can effectively substitute for missing environmental diversity. This work shifts the focus from whether causal structure is identifiable to whether it is statistically recoverable in practice.
arXiv Detail & Related papers (2026-02-06T17:44:00Z)
Latent Causal Diffusions for Single-Cell Perturbation Modeling [83.47931153555321]
We present a generative model that frames single-cell gene expression as a stationary diffusion process observed under measurement noise. LCD outperforms established approaches in predicting the distributional shifts of unseen perturbation combinations in single-cell RNA-sequencing screens. We develop an approach we call causal linearization via perturbation responses (CLIPR), which yields an approximation of the direct causal effects between all genes.
arXiv Detail & Related papers (2026-01-20T16:15:38Z)
Linear Causal Representation Learning by Topological Ordering, Pruning, and Disentanglement [12.380741069149956]
Causal representation learning (CRL) has garnered increasing interest from the causal inference and artificial intelligence communities. We propose a novel linear CRL algorithm that operates under weaker assumptions about environment heterogeneity and data-generating distributions. We validate our new algorithm via synthetic experiments and an interpretability analysis of large language models.
arXiv Detail & Related papers (2025-09-26T16:35:42Z)
Hybrid Causal Identification and Causal Mechanism Clustering [14.706998903419407]
This paper proposes a Mixture Variational Conditional Causal Inference model (MCVCI) to infer heterogeneous causality. According to the identifiability of the Hybrid Additive Noise Model (HANM), MCVCI combines the superior fitting capabilities of the Gaussian mixture model and the neural network.
arXiv Detail & Related papers (2025-07-29T13:27:15Z)
Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges [68.98973318553983]
We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way. We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
arXiv Detail & Related papers (2025-06-26T09:05:38Z)
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z)
Algorithmic causal structure emerging through compression [53.52699766206808]
We explore the relationship between causality, symmetry, and compression. We build on and generalize the known connection between learning and compression to a setting where causal models are not identifiable. We define algorithmic causality as an alternative definition of causality when traditional assumptions for causal identifiability do not hold.
arXiv Detail & Related papers (2025-02-06T16:50:57Z)
Persistent Homology for Structural Characterization in Disordered Systems [3.3033726268021315]
We propose a unified framework based on persistent homology (PH) to characterize both local and global structures in disordered systems. It can simultaneously generate local and global descriptors using the same algorithm and data structure. It has been shown to be highly effective and interpretable in predicting particle rearrangements and classifying global phases.
arXiv Detail & Related papers (2024-11-21T18:24:06Z)
Hierarchical and Density-based Causal Clustering [6.082022112101251]
We propose plug-in estimators that are simple and readily implementable using off-the-shelf algorithms. We go on to study their rate of convergence, and show that the additional cost of causal clustering is essentially the estimation error of the outcome regression functions.
arXiv Detail & Related papers (2024-11-02T14:01:04Z)
Effect Identification in Cluster Causal Diagrams [51.42809552422494]
We introduce a new type of graphical model called cluster causal diagrams (C-DAGs for short). C-DAGs allow for the partial specification of relationships among variables based on limited prior knowledge. We develop the foundations and machinery for valid causal inferences over C-DAGs.
arXiv Detail & Related papers (2022-02-22T21:27:31Z)
Supervised Convex Clustering [1.4610038284393165]
We propose and develop a new statistical pattern discovery method named Supervised Convex Clustering (SCC). SCC borrows strength from both information sources and guides towards finding more interpretable patterns via a joint convex fusion penalty. We demonstrate the practical advantages of SCC through simulations and a case study on Alzheimer's Disease genomics.
arXiv Detail & Related papers (2020-05-25T16:12:38Z)
A Critical View of the Structural Causal Model [89.43277111586258]
We show that one can identify the cause and the effect without considering their interaction at all. We propose a new adversarial training method that mimics the disentangled structure of the causal model. Our multidimensional method outperforms the literature methods on both synthetic and real world datasets.
arXiv Detail & Related papers (2020-02-23T22:52:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.