Large-Scale Differentiable Causal Discovery of Factor Graphs
- URL: http://arxiv.org/abs/2206.07824v1
- Date: Wed, 15 Jun 2022 21:28:36 GMT
- Title: Large-Scale Differentiable Causal Discovery of Factor Graphs
- Authors: Romain Lopez, Jan-Christian Hütter, Jonathan K. Pritchard, Aviv Regev
- Abstract summary: We introduce the notion of factor directed acyclic graphs (f-DAGs) as a way to restrict the search space to non-linear low-rank causal interaction models.
We propose a scalable implementation of f-DAG constrained causal discovery for high-dimensional interventional data.
- Score: 3.8015092217142223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common theme in causal inference is learning causal relationships between
observed variables, also known as causal discovery. This is usually a daunting
task, given the large number of candidate causal graphs and the combinatorial
nature of the search space. Perhaps for this reason, most research has so far
focused on relatively small causal graphs, with up to hundreds of nodes.
However, recent advances in fields like biology enable generating experimental
data sets with thousands of interventions followed by rich profiling of
thousands of variables, raising the opportunity and urgent need for large
causal graph models. Here, we introduce the notion of factor directed acyclic
graphs (f-DAGs) as a way to restrict the search space to non-linear low-rank
causal interaction models. Combining this novel structural assumption with
recent advances that bridge the gap between causal discovery and continuous
optimization, we achieve causal discovery on thousands of variables.
Additionally, as a model for the impact of statistical noise on this estimation
procedure, we study a model of edge perturbations of the f-DAG skeleton based
on random graphs and quantify the effect of such perturbations on the f-DAG
rank. This theoretical analysis suggests that the set of candidate f-DAGs is
much smaller than the whole DAG space and thus more statistically robust in the
high-dimensional regime where the underlying skeleton is hard to assess. We
propose Differentiable Causal Discovery of Factor Graphs (DCD-FG), a scalable
implementation of f-DAG constrained causal discovery for high-dimensional
interventional data. DCD-FG uses a Gaussian non-linear low-rank structural
equation model and shows significant improvements compared to state-of-the-art
methods both in simulations and on a recent large-scale single-cell RNA
sequencing data set with hundreds of genetic interventions.
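The low-rank idea in the abstract can be illustrated with a small sketch. This is not the DCD-FG implementation: the matrix names `U` and `V`, the toy graph, and the polynomial acyclicity penalty tr((I + W/d)^d) - d (one of several penalties used in the continuous-optimization literature on DAG learning) are assumptions chosen for illustration. The sketch treats a factor graph as two bipartite edge sets, variables to factors and factors to variables, and shows that the induced variable-level adjacency has rank at most the number of factors.

```python
import numpy as np

def induced_adjacency(U, V):
    """Variable-level adjacency implied by a factor graph (illustrative).

    U[i, f] != 0 means variable i -> factor f.
    V[f, j] != 0 means factor f -> variable j.
    The product has rank at most the number of factors.
    """
    return U @ V

def acyclicity_penalty(M):
    """Polynomial acyclicity penalty: tr((I + (M*M)/d)^d) - d.

    Equals zero exactly when the weighted graph M has no directed cycle.
    This choice of penalty is an assumption, not taken from the paper.
    """
    d = M.shape[0]
    W = M * M  # elementwise square keeps entries non-negative
    return np.trace(np.linalg.matrix_power(np.eye(d) + W / d, d)) - d

d, m = 6, 2  # 6 variables, 2 factors (hypothetical toy sizes)
U = np.zeros((d, m))
V = np.zeros((m, d))
U[0, 0] = U[1, 0] = 1.0  # variables 0, 1 -> factor 0
V[0, 2] = V[0, 3] = 1.0  # factor 0 -> variables 2, 3
U[2, 1] = U[3, 1] = 1.0  # variables 2, 3 -> factor 1
V[1, 4] = V[1, 5] = 1.0  # factor 1 -> variables 4, 5

A = induced_adjacency(U, V)
rank = np.linalg.matrix_rank(A)
h_dag = acyclicity_penalty(A)

# Adding factor 1 -> variable 0 closes the cycle 0 -> f0 -> 2 -> f1 -> 0.
V_cyc = V.copy()
V_cyc[1, 0] = 1.0
h_cyc = acyclicity_penalty(induced_adjacency(U, V_cyc))

print("rank of induced adjacency:", rank)
print("penalty (acyclic):", h_dag)
print("penalty (cyclic):", h_cyc)
```

In this toy example, eight variable-level edges are described by two factors, which is the sense in which a factor graph restricts the search space. A differentiable penalty of this kind is what lets structure learning be posed as continuous optimization rather than combinatorial search.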
Related papers
- Predicting perturbation targets with causal differential networks [23.568795598997376]
We use an amortized causal discovery model to infer causal graphs from the observational and interventional datasets.
We learn to map these paired graphs to the sets of variables that were intervened upon, in a supervised learning framework.
This approach consistently outperforms baselines for perturbation modeling on seven single-cell transcriptomics datasets.
arXiv Detail & Related papers (2024-10-04T12:48:21Z)
- Adaptive Online Experimental Design for Causal Discovery [9.447864414136905]
Causal discovery aims to uncover cause-and-effect relationships encoded in causal graphs.
We focus on data interventional efficiency and formalize causal discovery from the perspective of online learning.
We propose a track-and-stop causal discovery algorithm that adaptively selects interventions from the graph separating system.
arXiv Detail & Related papers (2024-05-19T13:26:33Z)
- Sample, estimate, aggregate: A recipe for causal discovery foundation models [28.116832159265964]
We train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables.
Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets.
Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift.
arXiv Detail & Related papers (2024-02-02T21:57:58Z)
- Learning Latent Structural Causal Models [31.686049664958457]
In machine learning tasks, one often operates on low-level data like image pixels or high-dimensional vectors.
We present a tractable approximate inference method which performs joint inference over the causal variables, structure and parameters of the latent Structural Causal Model.
arXiv Detail & Related papers (2022-10-24T20:09:44Z)
- Effect Identification in Cluster Causal Diagrams [51.42809552422494]
We introduce a new type of graphical model called cluster causal diagrams (for short, C-DAGs).
C-DAGs allow for the partial specification of relationships among variables based on limited prior knowledge.
We develop the foundations and machinery for valid causal inferences over C-DAGs.
arXiv Detail & Related papers (2022-02-22T21:27:31Z)
- BCDAG: An R package for Bayesian structure and Causal learning of Gaussian DAGs [77.34726150561087]
We introduce the R package for causal discovery and causal effect estimation from observational data.
Our implementation scales efficiently with the number of observations and, whenever the DAGs are sufficiently sparse, the number of variables in the dataset.
We then illustrate the main functions and algorithms on both real and simulated datasets.
arXiv Detail & Related papers (2022-01-28T09:30:32Z)
- BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- Variational Causal Networks: Approximate Bayesian Inference over Causal Structures [132.74509389517203]
We introduce a parametric variational family modelled by an autoregressive distribution over the space of discrete DAGs.
In experiments, we demonstrate that the proposed variational posterior is able to provide a good approximation of the true posterior.
arXiv Detail & Related papers (2021-06-14T17:52:49Z)
- Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)