Unitless Unrestricted Markov-Consistent SCM Generation: Better Benchmark Datasets for Causal Discovery
- URL: http://arxiv.org/abs/2503.17037v1
- Date: Fri, 21 Mar 2025 10:46:50 GMT
- Title: Unitless Unrestricted Markov-Consistent SCM Generation: Better Benchmark Datasets for Causal Discovery
- Authors: Rebecca J. Herman, Jonas Wahl, Urmi Ninad, Jakob Runge
- Abstract summary: Causal discovery aims to extract qualitative causal knowledge in the form of causal graphs from data. Recent work highlighted certain artifacts of commonly used data generation techniques for a standard class of structural causal models. We analyze which sortability patterns we expect to see in real data, and propose a method for drawing coefficients that we argue more effectively samples the space of SCMs.
- Score: 14.586959818386765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal discovery aims to extract qualitative causal knowledge in the form of causal graphs from data. Because causal ground truth is rarely known in the real world, simulated data plays a vital role in evaluating the performance of the various causal discovery algorithms proposed in the literature. But recent work highlighted certain artifacts of commonly used data generation techniques for a standard class of structural causal models (SCM) that may be nonphysical, including var- and $R^2$-sortability, where the variables' variance and coefficients of determination ($R^2$) after regressing on all other variables, respectively, increase along the causal order. Some causal methods exploit such artifacts, leading to unrealistic expectations for their performance on real-world data. Some modifications have been proposed to remove these artifacts; notably, the internally-standardized structural causal model (iSCM) avoids varsortability and largely alleviates $R^2$-sortability on sparse causal graphs, but exhibits a reversed $R^2$-sortability pattern for denser graphs not featured in their work. We analyze which sortability patterns we expect to see in real data, and propose a method for drawing coefficients that we argue more effectively samples the space of SCMs. Finally, we propose a novel extension of our SCM generation method to the time series setting.
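To make the sortability artifacts above concrete, here is a minimal sketch (illustrative, not the authors' code) that draws a linear-Gaussian SCM with the commonly criticized coefficient sampling (edge-weight magnitudes uniform on [0.5, 2]) and reports a simplified, edge-level proxy for var- and $R^2$-sortability: the fraction of causal edges along which the child's marginal variance (respectively, its $R^2$ when regressed on all other variables) exceeds the parent's. The graph size, density, and the edge-level simplification are assumptions made for this example.
```python
# Illustrative sketch, not the paper's code: sample data from a random
# linear-Gaussian SCM with |edge weights| ~ Uniform(0.5, 2) and measure a
# simplified, edge-level proxy for var- and R^2-sortability.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
d, n, edge_prob = 10, 5000, 0.3

# Random DAG: upper-triangular adjacency, so column index order is a causal order.
adj = (rng.random((d, d)) < edge_prob) & np.triu(np.ones((d, d), bool), k=1)
weights = adj * rng.uniform(0.5, 2.0, (d, d)) * rng.choice([-1, 1], (d, d))

# Ancestral sampling with unit-variance Gaussian noise.
X = np.zeros((n, d))
for j in range(d):
    X[:, j] = X @ weights[:, j] + rng.normal(size=n)

def r2_vs_rest(X, j):
    """R^2 of variable j regressed on all other variables."""
    others = np.delete(X, j, axis=1)
    return LinearRegression().fit(others, X[:, j]).score(others, X[:, j])

var = X.var(axis=0)
r2 = np.array([r2_vs_rest(X, j) for j in range(d)])

edges = np.argwhere(adj)  # rows (i, j) with a causal edge i -> j
var_sort = np.mean(var[edges[:, 1]] > var[edges[:, 0]])
r2_sort = np.mean(r2[edges[:, 1]] > r2[edges[:, 0]])
print(f"edge-level var-sortability: {var_sort:.2f}")  # typically close to 1 here
print(f"edge-level R^2-sortability: {r2_sort:.2f}")
```
With this standard sampling scheme the variance-based score is typically close to 1, which is the kind of artifact the abstract argues order-based methods can exploit.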
Related papers
- diffIRM: A Diffusion-Augmented Invariant Risk Minimization Framework for Spatiotemporal Prediction over Graphs [6.677219861416146]
Spatiotemporal prediction over graphs (GSTP) is challenging because real-world data suffers from the Out-of-Distribution (OOD) problem. In this study, we propose a diffusion-augmented invariant risk minimization (diffIRM) framework that combines diffusion-based data augmentation with invariant risk minimization.
arXiv Detail & Related papers (2024-12-31T06:45:47Z)
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Standardizing Structural Causal Models [80.21199731817698]
We propose internally-standardized structural causal models (iSCMs) for benchmarking algorithms. By construction, iSCMs are not $\operatorname{Var}$-sortable. We also find empirical evidence that they are mostly not $R^2$-sortable for commonly-used graph families (a minimal sketch of this internal-standardization idea appears after this list).
arXiv Detail & Related papers (2024-06-17T14:52:21Z)
- A Fixed-Point Approach for Causal Generative Modeling [20.88890689294816]
We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally ordered variables. We establish the weakest known conditions for their unique recovery given the topological ordering (TO).
arXiv Detail & Related papers (2024-04-10T12:29:05Z)
- Sample, estimate, aggregate: A recipe for causal discovery foundation models [28.116832159265964]
We train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables.
Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets.
Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift.
arXiv Detail & Related papers (2024-02-02T21:57:58Z)
- iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models [48.33685559041322]
This paper focuses on identifying the causal mechanism shifts in two or more related datasets over the same set of variables.
Code implementing the proposed method is open-source and publicly available at https://github.com/kevinsbello/iSCAN.
arXiv Detail & Related papers (2023-06-30T01:48:11Z)
- A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models [49.038420266408586]
We show that sorting variables by increasing variance often yields an ordering close to a causal order.
We propose an efficient baseline algorithm termed $R^2$-SortnRegress that exploits high $R^2$-sortability.
Our findings reveal high $R^2$-sortability as an assumption about the data generating process relevant to causal discovery (a sort-and-regress sketch appears after this list).
arXiv Detail & Related papers (2023-03-31T17:05:46Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Learning Latent Structural Causal Models [31.686049664958457]
In machine learning tasks, one often operates on low-level data like image pixels or high-dimensional vectors.
We present a tractable approximate inference method which performs joint inference over the causal variables, structure and parameters of the latent Structural Causal Model.
arXiv Detail & Related papers (2022-10-24T20:09:44Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Identification of Latent Variables From Graphical Model Residuals [0.0]
We present a novel method to control for the latent space when estimating a DAG by iteratively deriving proxies for the latent space from the residuals of the inferred model.
We show that any improvement in prediction of an outcome is intrinsically capped and cannot rise beyond a certain limit compared to the confounded model.
arXiv Detail & Related papers (2021-01-07T02:28:49Z)
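The "Standardizing Structural Causal Models" entry above proposes internally-standardized SCMs (iSCMs) that are not Var-sortable by construction. The sketch below is one plausible reading of that idea, assumed here to mean rescaling each variable to unit variance immediately after it is generated and before its children are sampled; it is an illustration, not the iSCM paper's implementation.
```python
# Illustrative reading of "internal" standardization (not the iSCM code):
# rescale each variable to unit sample variance as soon as it is generated,
# so variance cannot accumulate along the causal order.
import numpy as np

def sample_internally_standardized(weights, n, rng):
    """weights: (d, d) upper-triangular matrix; weights[i, j] is the edge i -> j."""
    d = weights.shape[0]
    X = np.zeros((n, d))
    for j in range(d):                    # column order is a causal order
        X[:, j] = X @ weights[:, j] + rng.normal(size=n)
        X[:, j] /= X[:, j].std()          # standardize before children are sampled
    return X
```
The "Scale-Invariant Sorting Criterion" entry describes sort-and-regress baselines that exploit high var- or $R^2$-sortability. The sketch below shows the general recipe under illustrative assumptions (ordering by a user-supplied statistic and BIC-tuned Lasso for the sparse regressions); the authors' exact implementation may differ.
```python
# Illustrative sort-and-regress baseline in the spirit of Var-/R^2-SortnRegress:
# order variables by a sortability statistic, then sparsely regress each
# variable on its predecessors in that order.
import numpy as np
from sklearn.linear_model import LassoLarsIC

def sort_n_regress(X, stat):
    """X: (n, d) data; stat: (d,) per-variable statistic (e.g. variance or R^2).
    Returns a weighted adjacency matrix W with W[i, j] != 0 meaning i -> j."""
    d = X.shape[1]
    order = np.argsort(stat)              # candidate causal order: increasing statistic
    W = np.zeros((d, d))
    for k in range(1, d):
        target, parents = order[k], order[:k]
        lasso = LassoLarsIC(criterion="bic").fit(X[:, parents], X[:, target])
        W[parents, target] = lasso.coef_
    return W

# Usage: W_hat = sort_n_regress(X, X.var(axis=0)) for the variance-based variant.
```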