A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models
- URL: http://arxiv.org/abs/2303.18211v2
- Date: Tue, 31 Oct 2023 22:04:39 GMT
- Title: A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models
- Authors: Alexander G. Reisach, Myriam Tami, Christof Seiler, Antoine Chambaz, Sebastian Weichwald
- Abstract summary: We show that sorting variables by increasing variance often yields an ordering close to a causal order.
We propose an efficient baseline algorithm termed $R^2$-SortnRegress that exploits high $R^2$-sortability.
Our findings reveal high $R^2$-sortability as an assumption about the data generating process relevant to causal discovery.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Additive Noise Models (ANMs) are a common model class for causal discovery
from observational data and are often used to generate synthetic data for
causal discovery benchmarking. Specifying an ANM requires choosing all
parameters, including those not fixed by explicit assumptions. Reisach et al.
(2021) show that sorting variables by increasing variance often yields an
ordering close to a causal order and introduce var-sortability to quantify this
alignment. Since increasing variances may be unrealistic and are
scale-dependent, ANM data are often standardized in benchmarks.
We show that synthetic ANM data are characterized by another pattern that is
scale-invariant: the explainable fraction of a variable's variance, as captured
by the coefficient of determination $R^2$, tends to increase along the causal
order. The result is high $R^2$-sortability, meaning that sorting the variables
by increasing $R^2$ yields an ordering close to a causal order. We propose an
efficient baseline algorithm termed $R^2$-SortnRegress that exploits high
$R^2$-sortability and that can match and exceed the performance of established
causal discovery algorithms. We show analytically that sufficiently high edge
weights lead to a relative decrease of the noise contributions along causal
chains, resulting in increasingly deterministic relationships and high $R^2$.
We characterize $R^2$-sortability for different simulation parameters and find
high values in common settings. Our findings reveal high $R^2$-sortability as
an assumption about the data generating process relevant to causal discovery
and implicit in many ANM sampling schemes. It should be made explicit, as its
prevalence in real-world data is unknown. For causal discovery benchmarking, we
implement $R^2$-sortability, the $R^2$-SortnRegress algorithm, and ANM
simulation procedures in our library CausalDisco at
https://causaldisco.github.io/CausalDisco/.
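To make the chain argument concrete, here is a worked example under illustrative assumptions (a linear chain with a single shared edge weight $w$, i.i.d. unit-variance noise, and $R^2_k$ taken as the $R^2$ of regressing $X_k$ on its parent only); it follows the abstract's reasoning but is not the paper's exact derivation:

```latex
% Illustrative chain: X_1 = N_1 and X_k = w X_{k-1} + N_k for k >= 2,
% with i.i.d. noise terms N_k of unit variance (assumptions, not the
% paper's general setup).
\begin{align*}
\operatorname{Var}(X_k) &= w^2 \operatorname{Var}(X_{k-1}) + 1
  = \sum_{j=0}^{k-1} w^{2j}
  = \frac{w^{2k}-1}{w^2-1} \qquad (|w| \neq 1), \\
R^2_k &= 1 - \frac{\operatorname{Var}(N_k)}{\operatorname{Var}(X_k)}
  = 1 - \frac{w^2-1}{w^{2k}-1} \;\longrightarrow\; 1
  \qquad \text{as } k \to \infty \text{ when } |w| > 1.
\end{align*}
```

For $|w| > 1$ the noise share $1/\operatorname{Var}(X_k)$ decays geometrically along the chain, so later variables become increasingly deterministic functions of their ancestors, which is the mechanism behind high $R^2$ cited in the abstract.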
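The abstract points to reference implementations in CausalDisco. As a self-contained illustration of the sorting criterion (a minimal sketch, not the library's code; `r2_scores`, `r2_order`, and the toy three-node ANM are our own names and assumptions), one can estimate each variable's $R^2$ by ordinary least squares on all remaining variables and sort by it:

```python
# Minimal sketch of sorting variables by R^2 (illustrative; not the
# CausalDisco implementation).
import numpy as np
from sklearn.linear_model import LinearRegression

def r2_scores(X):
    """R^2 of each column of X when regressed on all remaining columns."""
    _, d = X.shape
    scores = np.empty(d)
    for j in range(d):
        rest = np.delete(X, j, axis=1)  # all columns except j
        scores[j] = LinearRegression().fit(rest, X[:, j]).score(rest, X[:, j])
    return scores

def r2_order(X):
    """Candidate causal order: column indices sorted by increasing R^2."""
    return np.argsort(r2_scores(X))

# Toy linear ANM on the DAG x1 -> x2, x1 -> x3, x2 -> x3 (unit edge weights,
# unit-variance Gaussian noise); the columns of X are deliberately shuffled.
rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = x1 + x2 + rng.normal(size=n)
X = np.column_stack([x3, x1, x2])  # column order: x3, x1, x2
print(r2_order(X))  # [1 2 0], i.e. x1 before x2 before x3
```

A full $R^2$-SortnRegress-style baseline would additionally regress each variable on its predecessors in the recovered order to produce a graph estimate; the paper's CausalDisco library is the authoritative implementation.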
Related papers
- Standardizing Structural Causal Models [80.21199731817698]
We propose internally-standardized structural causal models (iSCMs) for benchmarking algorithms.
By construction, iSCMs are not $\operatorname{Var}$-sortable, and as we show experimentally, not $R^2$-sortable either for commonly-used graph families.
arXiv Detail & Related papers (2024-06-17T14:52:21Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization [96.26317627118912]
This paper studies group distributionally robust optimization (GDRO).
Online learning techniques reduce the number of samples required in each round from $m$ to $1$ while keeping the same sample complexity.
A novel formulation of weighted GDRO allows distribution-dependent convergence rates to be derived.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- Uncertainty Quantification of MLE for Entity Ranking with Covariates [3.2839905453386162]
This paper concerns statistical estimation and inference for ranking problems based on pairwise comparisons.
We propose a novel model, the Co-Assisted Ranking Estimation (CARE) model, which extends the well-known Bradley-Terry-Luce (BTL) model.
We derive the maximum likelihood estimator of $\{\alpha_i^*\}_{i=1}^n$ and $\beta^*$ under a sparse comparison graph.
We validate our theoretical results through large-scale numerical studies and an application to the mutual fund stock holding dataset.
arXiv Detail & Related papers (2022-12-20T02:28:27Z)
- On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z)
- Diffusion Models for Causal Discovery via Topological Ordering [20.875222263955045]
Topological ordering approaches reduce the optimisation space of causal discovery by searching over a permutation rather than graph space.
For ANMs, the Hessian of the data log-likelihood can be used for finding leaf nodes in a causal graph, allowing its topological ordering.
We introduce theory for updating the learned Hessian without re-training the neural network, and we show that computing with a subset of samples gives an accurate approximation of the ordering.
arXiv Detail & Related papers (2022-10-12T13:36:29Z)
- Beware of the Simulated DAG! Causal Discovery Benchmarks May Be Easy to Game [70.54639814319096]
We show how varsortability dominates the performance of continuous structure learning algorithms on synthetic data.
We aim to raise awareness that varsortability easily occurs in simulated additive noise models.
arXiv Detail & Related papers (2021-02-26T18:52:27Z)
- Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example [19.945160684285003]
$l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling.
We show that all $l^q$ estimators for $0 < q < \infty$ attain similar generalization error bounds.
This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact in terms of the generalization capability.
arXiv Detail & Related papers (2013-07-25T00:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.