Causal structure learning with momentum: Sampling distributions over Markov Equivalence Classes of DAGs
- URL: http://arxiv.org/abs/2310.05655v2
- Date: Tue, 27 Aug 2024 09:24:41 GMT
- Title: Causal structure learning with momentum: Sampling distributions over Markov Equivalence Classes of DAGs
- Authors: Moritz Schauer, Marcel Wienöbst,
- Abstract summary: We devise a non-reversible continuous time Markov chain that targets a probability distribution over classes of DAGs.
We develop new algorithms for listing, counting, uniformly sampling, and applying possible moves of the GES operators.
- Score: 4.811183825795439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of inferring a Bayesian network structure (directed acyclic graph, DAG for short), we devise a non-reversible continuous time Markov chain, the ``Causal Zig-Zag sampler'', that targets a probability distribution over classes of observationally equivalent (Markov equivalent) DAGs. The classes are represented as completed partially directed acyclic graphs (CPDAGs). The non-reversible Markov chain relies on the operators used in Chickering's Greedy Equivalence Search (GES) and is endowed with a momentum variable, which improves mixing significantly as we show empirically. The possible target distributions include posterior distributions based on a prior over DAGs and a Markov equivalent likelihood. We offer an efficient implementation wherein we develop new algorithms for listing, counting, uniformly sampling, and applying possible moves of the GES operators, all of which significantly improve upon the state-of-the-art run-time.
Related papers
- Attention (as Discrete-Time Markov) Chains [70.46604474584181]
We introduce a new interpretation of the attention matrix as a discrete-time Markov chain.<n>Our main observation is that tokens corresponding to semantically similar regions form a set of metastable states.<n>Using these lightweight tools, we demonstrate state-of-the-art zero-shot segmentation.
arXiv Detail & Related papers (2025-07-23T16:20:47Z) - Scalable Variational Causal Discovery Unconstrained by Acyclicity [6.954510776782872]
We propose a scalable Bayesian approach to learn the posterior distribution over causal graphs given observational data.
We introduce a novel differentiable DAG sampling method that can generate a valid acyclic causal graph.
We are able to model the posterior distribution over causal graphs using a simple variational distribution over a continuous domain.
arXiv Detail & Related papers (2024-07-06T07:56:23Z) - ProDAG: Projected Variational Inference for Directed Acyclic Graphs [8.556906995059324]
Directed acyclic graph (DAG) learning is a central task in structure discovery and causal inference.<n>We develop a Bayesian variational inference framework based on novel, provably valid distributions that have support directly on the space of sparse DAGs.<n>We empirically demonstrate that our method, ProDAG, can outperform state-of-the-art alternatives in both accuracy and uncertainty quantification.
arXiv Detail & Related papers (2024-05-24T03:04:28Z) - Variational DAG Estimation via State Augmentation With Stochastic Permutations [16.57658783816741]
Estimating the structure of a Bayesian network from observational data is a statistically and computationally hard problem.
From a probabilistic inference perspective, the main challenges are (i) representing distributions over graphs that satisfy the DAG constraint and (ii) estimating a posterior over the underlying space.
We propose an approach that addresses these challenges by formulating a joint distribution on an augmented space of DAGs and permutations.
arXiv Detail & Related papers (2024-02-04T23:51:04Z) - A Probabilistic Semi-Supervised Approach with Triplet Markov Chains [1.000779758350696]
Triplet Markov chains are general generative models for sequential data.
We propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models.
arXiv Detail & Related papers (2023-09-07T13:34:20Z) - Covariate shift in nonparametric regression with Markovian design [0.0]
We show that convergence rates for a smoothness risk of a Nadaraya-Watson kernel estimator are determined by the similarity between the invariant distributions associated to source and target Markov chains.
We extend the notion of a distribution exponent from Kpotufe and Martinet to kernel transfer exponents of uniformly ergodic Markov chains.
arXiv Detail & Related papers (2023-07-17T14:24:27Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Score-Based Diffusion meets Annealed Importance Sampling [89.92133671626327]
Annealed Importance Sampling remains one of the most effective methods for marginal likelihood estimation.
We leverage recent progress in score-based generative modeling to approximate the optimal extended target distribution for AIS proposals.
arXiv Detail & Related papers (2022-08-16T12:13:29Z) - Bayesian Structure Learning with Generative Flow Networks [85.84396514570373]
In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) from data.
Recently, a class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling.
We show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs.
arXiv Detail & Related papers (2022-02-28T15:53:10Z) - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG)
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z) - Convergence of Recursive Stochastic Algorithms using Wasserstein
Divergence [4.688616907736838]
We show that convergence of a large family of constant stepsize RSAs can be understood using this framework.
We show that convergence of a large family of constant stepsize RSAs can be understood using this framework.
arXiv Detail & Related papers (2020-03-25T13:45:16Z) - Embedding Graph Auto-Encoder for Graph Clustering [90.8576971748142]
Graph auto-encoder (GAE) models are based on semi-supervised graph convolution networks (GCN)
We design a specific GAE-based model for graph clustering to be consistent with the theory, namely Embedding Graph Auto-Encoder (EGAE)
EGAE consists of one encoder and dual decoders.
arXiv Detail & Related papers (2020-02-20T09:53:28Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.