Likelihood-based Differentiable Structure Learning
- URL: http://arxiv.org/abs/2410.06163v2
- Date: Wed, 16 Oct 2024 16:40:00 GMT
- Title: Likelihood-based Differentiable Structure Learning
- Authors: Chang Deng, Kevin Bello, Pradeep Ravikumar, Bryon Aragam,
- Abstract summary: Existing approaches to differentiable structure learning of directed acyclic graphs (DAGs) rely on strong identifiability assumptions.
We explain and remedy these issues by studying the behavior of differentiable acyclicity-constrained programs under general likelihoods.
- Score: 38.25218464260965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing approaches to differentiable structure learning of directed acyclic graphs (DAGs) rely on strong identifiability assumptions in order to guarantee that global minimizers of the acyclicity-constrained optimization problem identifies the true DAG. Moreover, it has been observed empirically that the optimizer may exploit undesirable artifacts in the loss function. We explain and remedy these issues by studying the behavior of differentiable acyclicity-constrained programs under general likelihoods with multiple global minimizers. By carefully regularizing the likelihood, it is possible to identify the sparsest model in the Markov equivalence class, even in the absence of an identifiable parametrization. We first study the Gaussian case in detail, showing how proper regularization of the likelihood defines a score that identifies the sparsest model. Assuming faithfulness, it also recovers the Markov equivalence class. These results are then generalized to general models and likelihoods, where the same claims hold. These theoretical results are validated empirically, showing how this can be done using standard gradient-based optimizers, thus paving the way for differentiable structure learning under general models and losses.
Related papers
- Generalized Criterion for Identifiability of Additive Noise Models Using Majorization [7.448620208767376]
We introduce a novel identifiability criterion for directed acyclic graph (DAG) models.
We demonstrate that this criterion extends and generalizes existing identifiability criteria.
We present a new algorithm for learning a topological ordering of variables.
arXiv Detail & Related papers (2024-04-08T02:18:57Z) - Leveraging Task Structures for Improved Identifiability in Neural Network Representations [31.863998589693065]
We extend the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks.
We show that linear identifiability is achievable in the general multi-task regression setting.
arXiv Detail & Related papers (2023-06-26T17:16:50Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Deviance Matrix Factorization [6.509665408765348]
We investigate a general matrix factorization for deviance-based data losses, extending the ubiquitous singular value decomposition beyond squared error loss.
Our method leverages classical statistical methodology from generalized linear models (GLMs) and provides an efficient algorithm that is flexible enough to allow for structural zeros via entry weights.
arXiv Detail & Related papers (2021-10-12T01:27:55Z) - Evaluating State-of-the-Art Classification Models Against Bayes
Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z) - VAE Approximation Error: ELBO and Conditional Independence [78.72292013299868]
This paper analyzes VAE approximation errors caused by the combination of the ELBO objective with the choice of the encoder probability family.
We show that the ELBO subset can not be enlarged, and the respective error cannot be decreased, by only considering deeper encoder networks.
arXiv Detail & Related papers (2021-02-18T12:54:42Z) - Understanding Double Descent Requires a Fine-Grained Bias-Variance
Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
arXiv Detail & Related papers (2020-11-04T21:04:02Z) - Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to represent structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.