DAGs with No Fears: A Closer Look at Continuous Optimization for
Learning Bayesian Networks
- URL: http://arxiv.org/abs/2010.09133v1
- Date: Sun, 18 Oct 2020 22:59:37 GMT
- Title: DAGs with No Fears: A Closer Look at Continuous Optimization for
Learning Bayesian Networks
- Authors: Dennis Wei, Tian Gao, Yue Yu
- Abstract summary: We re-examine a continuous optimization framework dubbed NOTEARS for learning Bayesian networks.
We show that the Karush-Kuhn-Tucker optimality conditions for the NOTEARS formulation cannot be satisfied except in a trivial case.
Some combinations with local search are both more accurate and more efficient than the original NOTEARS.
- Score: 45.3591788771536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper re-examines a continuous optimization framework dubbed NOTEARS for
learning Bayesian networks. We first generalize existing algebraic
characterizations of acyclicity to a class of matrix polynomials. Next,
focusing on a one-parameter-per-edge setting, we show that the
Karush-Kuhn-Tucker (KKT) optimality conditions for the NOTEARS formulation
cannot be satisfied except in a trivial case, which explains a behavior of the
associated algorithm. We then derive the KKT conditions for an equivalent
reformulation, show that they are indeed necessary, and relate them to explicit
constraints that certain edges be absent from the graph. If the score function
is convex, these KKT conditions are also sufficient for local minimality
despite the non-convexity of the constraint. Informed by the KKT conditions, a
local search post-processing algorithm is proposed and shown to substantially
and universally improve the structural Hamming distance of all tested
algorithms, typically by a factor of 2 or more. Some combinations with local
search are both more accurate and more efficient than the original NOTEARS.
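As a companion to the abstract, here is a minimal sketch (not the authors' code) of the two ingredients it refers to: NOTEARS-style algebraic acyclicity measures, including one matrix-polynomial variant of the kind the paper generalizes, and a toy edge-deletion local search standing in for the KKT-informed post-processing. It assumes NumPy/SciPy and a user-supplied score function; names such as `h_exp` and `greedy_edge_deletion` are illustrative, not from the paper.

```python
# Sketch only: acyclicity measures used by NOTEARS-style methods and a toy
# local-search post-processing step.  W is the weighted adjacency matrix of a
# d-node graph; each h(W) is zero exactly when W encodes a DAG.
import numpy as np
from scipy.linalg import expm

def h_exp(W: np.ndarray) -> float:
    """Original NOTEARS measure: h(W) = tr(exp(W ∘ W)) - d."""
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

def h_poly(W: np.ndarray) -> float:
    """One matrix-polynomial variant: h(W) = tr((I + W ∘ W / d)^d) - d."""
    d = W.shape[0]
    return float(np.trace(np.linalg.matrix_power(np.eye(d) + W * W / d, d)) - d)

def greedy_edge_deletion(W: np.ndarray, score, tol: float = 1e-8) -> np.ndarray:
    """Toy stand-in for local-search post-processing: repeatedly zero out any
    edge whose removal does not worsen the score.  Purely illustrative; the
    paper's algorithm is guided by the KKT conditions it derives."""
    W = W.copy()
    improved = True
    while improved:
        improved = False
        best = score(W)
        for i, j in zip(*np.nonzero(W)):
            trial = W.copy()
            trial[i, j] = 0.0
            if score(trial) <= best + tol:
                W, best, improved = trial, score(trial), True
    return W
```

Both measures are sums of nonnegative weighted cycle counts, so zeroing out edges can only decrease them; a deletion-only search therefore keeps an already acyclic estimate acyclic.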
Related papers
- Karush-Kuhn-Tucker Condition-Trained Neural Networks (KKT Nets) [0.0]
We present a novel approach to solving convex optimization problems by leveraging the Karush-Kuhn-Tucker (KKT) conditions.
Similar to Theory-Trained Neural Networks (TTNNs), the parameters of the convex optimization problem are input to the neural network.
The chosen loss function, referred to as the KKT Loss, measures how well the network's outputs satisfy the KKT conditions.
arXiv Detail & Related papers (2024-10-21T12:59:58Z)
- ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization [53.14532968909759]
We introduce an efficient single-loop primal-dual block-coordinate algorithm, dubbed ALEXR.
We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions.
We present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate algorithms for the considered class of cFCCO problems.
arXiv Detail & Related papers (2023-12-04T19:00:07Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Performance of $\ell_1$ Regularization for Sparse Convex Optimization [10.37112414795167]
We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features.
Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions under general input instances.
We give the first result for general loss functions for this problem that requires only restricted strong convexity and smoothness.
arXiv Detail & Related papers (2023-07-14T15:31:45Z)
- Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization [59.038366742773164]
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards satisfying the Karush-Kuhn-Tucker (KKT) conditions.
In this work we establish a number of settings where the satisfaction of these conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks.
arXiv Detail & Related papers (2023-03-02T18:24:26Z)
- Partial Optimality in Cubic Correlation Clustering [6.372499127940625]
Certifying optimality is NP-hard and practically hampered already by the complexity of the problem statement.
Here, we focus on establishing partial optimality conditions for the special case of complete graphs and cubic objective functions.
In addition, we define and implement algorithms for testing these conditions and examine their effect numerically on two datasets.
arXiv Detail & Related papers (2023-02-09T15:25:52Z)
- Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope (recalled in the sketch after this list).
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
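For the last entry above, the near-stationarity measure is the gradient of the Moreau envelope. As a quick reference (textbook definitions assumed as background, not quoted from that paper), the Moreau envelope of $f$ with parameter $\lambda > 0$ and its gradient are:

```latex
% Standard definitions: Moreau envelope of f with parameter \lambda > 0 and its
% gradient via the proximal mapping, valid when f + \tfrac{1}{2\lambda}\|\cdot\|^2 is convex.
f_\lambda(x) \;=\; \min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\,\lVert y - x\rVert^2 \Big\},
\qquad
\nabla f_\lambda(x) \;=\; \tfrac{1}{\lambda}\big(x - \operatorname{prox}_{\lambda f}(x)\big).
```

A small $\lVert \nabla f_\lambda(x) \rVert$ certifies that $x$ is close to a nearly stationary point of $f$, which is why the $\tilde{\mathcal{O}}(t^{-1/4})$ rate above is stated in terms of this quantity.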
This list is automatically generated from the titles and abstracts of the papers on this site.