BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
- URL: http://arxiv.org/abs/2112.02761v1
- Date: Mon, 6 Dec 2021 03:35:21 GMT
- Title: BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
- Authors: Chris Cundy and Aditya Grover and Stefano Ermon
- Abstract summary: A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
- Score: 97.79015388276483
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A structural equation model (SEM) is an effective framework to reason over
causal relationships represented via a directed acyclic graph (DAG). Recent
advances have enabled effective maximum-likelihood point estimation of DAGs
from observational data. However, a point estimate may not accurately capture
the uncertainty in inferring the underlying graph in practical scenarios,
wherein the true DAG is non-identifiable and/or the observed dataset is
limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational
inference framework for estimating a distribution over DAGs characterizing a
linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is
challenging due to the discrete and combinatorial nature of graphs. We
analyse key design choices for scalable VI over DAGs, such as 1) the
parametrization of DAGs via an expressive variational family, 2) a continuous
relaxation that enables low-variance stochastic optimization, and 3) suitable
priors over the latent variables. We provide a series of experiments on real
and synthetic data showing that BCD Nets outperform maximum-likelihood methods
on standard causal discovery metrics such as structural Hamming distance in low
data regimes.
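The abstract references two concrete ingredients: the linear-Gaussian SEM model class and the structural Hamming distance (SHD) used to score recovered graphs. The sketch below is a minimal, hypothetical illustration of both (not the authors' implementation); all function names are ours, and a standard SHD convention is assumed in which a reversed edge counts as one error.

```python
import numpy as np

def sample_linear_gaussian_sem(W, n_samples, noise_scale=1.0, seed=0):
    """Ancestrally sample rows from the linear-Gaussian SEM X = X @ W + eps.

    W[i, j] != 0 encodes an edge i -> j; W is assumed strictly upper
    triangular, i.e. columns are already in a topological order.
    """
    rng = np.random.default_rng(seed)
    d = W.shape[0]
    X = np.zeros((n_samples, d))
    eps = rng.normal(scale=noise_scale, size=(n_samples, d))
    for j in range(d):  # each column only depends on earlier columns
        X[:, j] = X @ W[:, j] + eps[:, j]
    return X

def structural_hamming_distance(A_true, A_est):
    """SHD between binary adjacency matrices; a reversal counts once."""
    diff = np.abs(A_true - A_est)
    sym = diff + diff.T          # reversal -> 2 at both positions
    sym[sym > 1] = 1             # collapse to a single mismatch
    return int(sym.sum() // 2)   # each mismatch was counted twice
```

Under this convention, two identical graphs have SHD 0, a single missing or extra edge costs 1, and reversing an edge also costs 1, which matches how the metric is typically reported in low-data causal discovery benchmarks.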
Related papers
- Scalable Variational Causal Discovery Unconstrained by Acyclicity [6.954510776782872]
We propose a scalable Bayesian approach to learn the posterior distribution over causal graphs given observational data.
We introduce a novel differentiable DAG sampling method that can generate a valid acyclic causal graph.
We are able to model the posterior distribution over causal graphs using a simple variational distribution over a continuous domain.
arXiv Detail & Related papers (2024-07-06T07:56:23Z) - ProDAG: Projection-Induced Variational Inference for Directed Acyclic Graphs [8.556906995059324]
Directed acyclic graph (DAG) learning is a rapidly expanding field of research.
It remains statistically and computationally challenging to learn a single (point estimate) DAG from data, let alone provide uncertainty quantification.
Our article addresses the difficult task of quantifying graph uncertainty by developing a Bayesian variational inference framework based on novel distributions that have support directly on the space of DAGs.
arXiv Detail & Related papers (2024-05-24T03:04:28Z) - BayesDAG: Gradient-Based Posterior Inference for Causal Discovery [30.027520859604955]
We introduce a scalable causal discovery framework based on a combination of Markov Chain Monte Carlo and Variational Inference.
Our approach directly samples DAGs from the posterior without requiring any DAG regularization.
We derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed estimator defined over permutations.
arXiv Detail & Related papers (2023-07-26T02:34:13Z) - Discovering Dynamic Causal Space for DAG Structure Learning [64.763763417533]
We propose a dynamic causal space for DAG structure learning, coined CASPER.
It integrates the graph structure into the score function as a new measure in the causal space to faithfully reflect the causal distance between estimated and ground truth DAG.
arXiv Detail & Related papers (2023-06-05T12:20:40Z) - Causal Graph Discovery from Self and Mutually Exciting Time Series [10.410454851418548]
We develop a non-asymptotic recovery guarantee and quantifiable uncertainty by solving a linear program.
We demonstrate the effectiveness of our approach in recovering highly interpretable causal DAGs over Sepsis Associated Derangements (SADs)
arXiv Detail & Related papers (2023-01-26T16:15:27Z) - Handling Distribution Shifts on Graphs: An Invariance Perspective [78.31180235269035]
We formulate the OOD problem on graphs and develop a new invariant learning approach, Explore-to-Extrapolate Risk Minimization (EERM)
EERM resorts to multiple context explorers that are adversarially trained to maximize the variance of risks from multiple virtual environments.
We prove the validity of our method by theoretically showing its guarantee of a valid OOD solution.
arXiv Detail & Related papers (2022-02-05T02:31:01Z) - BCDAG: An R package for Bayesian structure and Causal learning of
Gaussian DAGs [77.34726150561087]
We introduce BCDAG, an R package for Bayesian causal discovery and causal effect estimation from observational data.
Our implementation scales efficiently with the number of observations and, whenever the DAGs are sufficiently sparse, the number of variables in the dataset.
We then illustrate the main functions and algorithms on both real and simulated datasets.
arXiv Detail & Related papers (2022-01-28T09:30:32Z) - Variational Causal Networks: Approximate Bayesian Inference over Causal
Structures [132.74509389517203]
We introduce a parametric variational family modelled by an autoregressive distribution over the space of discrete DAGs.
In experiments, we demonstrate that the proposed variational posterior is able to provide a good approximation of the true posterior.
arXiv Detail & Related papers (2021-06-14T17:52:49Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) requires solving nonconcave min-max optimization problems.
Recent theory has shown the importance of gradient descent (GD) for reaching globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, gradient descent-ascent (GDA) converges to a global saddle point of the underlying nonconcave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z)
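Several of the papers above (BCD Nets, BayesDAG, the acyclicity-unconstrained sampler) share one core idea: parametrize a DAG by a permutation (a topological order) plus a strictly triangular weight matrix, so every sample is acyclic by construction and no acyclicity penalty is needed. A minimal sketch of that decomposition, with illustrative names and no claim to match any one paper's code:

```python
import numpy as np

def permutation_matrix(order):
    """Build P with P[order[i], i] = 1 from a topological order."""
    d = len(order)
    P = np.zeros((d, d))
    P[order, np.arange(d)] = 1.0
    return P

def dag_from_permutation(order, L):
    """Map (order, strictly lower-triangular L) to a weighted DAG W = P L P^T.

    Any DAG admits such a decomposition, so sampling (order, L) always
    yields a valid acyclic graph.
    """
    P = permutation_matrix(order)
    return P @ np.tril(L, k=-1) @ P.T

def is_dag(W, tol=1e-8):
    """Acyclicity check: the binary adjacency of a DAG is nilpotent,
    so trace(A^k) = 0 (no length-k cycles) for every k = 1..d."""
    A = (np.abs(W) > tol).astype(float)
    M = A.copy()
    for _ in range(A.shape[0]):
        if np.trace(M) > tol:
            return False
        M = M @ A
    return True
```

The variational methods surveyed above differ mainly in how they relax the discrete permutation for gradient-based inference (e.g. continuous relaxations over the space of orderings), but the hard decomposition sketched here is the common backbone.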
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.