Related papers: Rethinking Causal Discovery Through the Lens of Exchangeability

URL: http://arxiv.org/abs/2512.10152v1
Date: Wed, 10 Dec 2025 23:19:39 GMT
Title: Rethinking Causal Discovery Through the Lens of Exchangeability
Authors: Tiago Brogueira, Mário Figueiredo
Abstract summary: Causal discovery methods have traditionally been developed under two distinct regimes. We argue that the i.i.d. setting can and should be reframed in terms of exchangeability. We show that our exchangeable synthetic dataset mirrors the statistical structure of the real-world "i.i.d." dataset more closely than all other i.i.d. synthetic datasets.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Causal discovery methods have traditionally been developed under two distinct regimes: independent and identically distributed (i.i.d.) and timeseries data, each governed by separate modelling assumptions. In this paper, we argue that the i.i.d. setting can and should be reframed in terms of exchangeability, a strictly more general symmetry principle. We present the implications of this reframing, alongside two core arguments: (1) a conceptual argument, based on extending the dependency of experimental causal inference on exchangeability to causal discovery; and (2) an empirical argument, showing that many existing i.i.d. causal-discovery methods are predicated on exchangeability assumptions, and that the sole extensive widely-used real-world "i.i.d." benchmark (the Tübingen dataset) consists mainly of exchangeable (and not i.i.d.) examples. Building on this insight, we introduce a novel synthetic dataset that enforces only the exchangeability assumption, without imposing the stronger i.i.d. assumption. We show that our exchangeable synthetic dataset mirrors the statistical structure of the real-world "i.i.d." dataset more closely than all other i.i.d. synthetic datasets. Furthermore, we demonstrate the predictive capability of this dataset by proposing a neural-network-based causal-discovery algorithm trained exclusively on our synthetic dataset, and which performs similarly to other state-of-the-art i.i.d. methods on the real-world benchmark.

Related papers

Differentiable Cyclic Causal Discovery Under Unmeasured Confounders [11.594415886406553]
DCCD-CONF is a novel framework for differentiable learning of nonlinear cyclic causal graphs in the presence of unmeasured confounders. We show that DCCD-CONF outperforms state-of-the-art methods in both causal graph recovery and confounder identification.
arXiv Detail & Related papers (2025-08-11T20:13:34Z)
Synthetic Tabular Data Validation: A Divergence-Based Approach [8.062368743143388]
Divergences quantify discrepancies between data distributions. Traditional approaches calculate divergences independently for each feature. We propose a novel approach that uses divergence estimation to overcome the limitations of marginal comparisons.
arXiv Detail & Related papers (2024-05-13T15:07:52Z)
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation. Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z)
Causal Discovery by Kernel Deviance Measures with Heterogeneous Transforms [17.368146833023893]
We propose a novel score measure based on heterogeneous transformation of RKHS embeddings to extract relevant higher-order moments of the conditional densities for causal discovery. Inference is made by comparing the scores of the hypothetical cause-effect directions.
arXiv Detail & Related papers (2024-01-31T17:28:05Z)
Identifiable Latent Polynomial Causal Models Through the Lens of Change [82.14087963690561]
Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability.
arXiv Detail & Related papers (2023-10-24T07:46:10Z)
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks. The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data. Empirical results on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features. Based on these observations, we propose a conceptual framework for feature learning. Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z)
Towards Causal Representation Learning and Deconfounding from Indefinite Data [17.793702165499298]
Non-statistical data (e.g., images and text) conflicts significantly with traditional causal data in both properties and methods. We redefine causal data from two novel perspectives and then propose three data paradigms. We implement the above designs as a dynamic variational inference model, tailored to learn causal representation from indefinite data.
arXiv Detail & Related papers (2023-05-04T08:20:37Z)
Causal Discovery in Heterogeneous Environments Under the Sparse Mechanism Shift Hypothesis [7.895866278697778]
Machine learning approaches commonly rely on the assumption of independent and identically distributed (i.i.d.) data. In reality, this assumption is almost always violated due to distribution shifts between environments. We propose the Mechanism Shift Score (MSS), a score-based approach amenable to various empirical estimators.
arXiv Detail & Related papers (2022-06-04T15:39:30Z)
A Critical View of the Structural Causal Model [89.43277111586258]
We show that one can identify the cause and the effect without considering their interaction at all. We propose a new adversarial training method that mimics the disentangled structure of the causal model. Our multidimensional method outperforms existing methods from the literature on both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-02-23T22:52:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.