A Bregman Method for Structure Learning on Sparse Directed Acyclic
Graphs
- URL: http://arxiv.org/abs/2011.02764v1
- Date: Thu, 5 Nov 2020 11:37:44 GMT
- Title: A Bregman Method for Structure Learning on Sparse Directed Acyclic
Graphs
- Authors: Manon Romain and Alexandre d'Aspremont
- Abstract summary: We develop a Bregman proximal gradient method for structure learning.
We measure the impact of curvature against a highly nonlinear iteration.
We test our method on various synthetic and real sets.
- Score: 84.7328507118758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a Bregman proximal gradient method for structure learning on
linear structural causal models. While the problem is non-convex, has high
curvature and is in fact NP-hard, Bregman gradient methods allow us to
neutralize at least part of the impact of curvature by measuring smoothness
against a highly nonlinear kernel. This allows the method to make longer steps
and significantly improves convergence. Each iteration requires solving a
Bregman proximal step which is convex and efficiently solvable for our
particular choice of kernel. We test our method on various synthetic and real
data sets.
Related papers
- An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levssquardt method to minimize the average of smooth but possibly non-negative functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Integer Programming for Learning Directed Acyclic Graphs from Non-identifiable Gaussian Models [6.54203362045253]
We study the problem of learning directed acyclic graphs from continuous observational data.
We develop a mixed-integer programming framework for learning medium-sized problems.
We show via numerical experiments that our method outperforms three state-of-the-art algorithms.
arXiv Detail & Related papers (2024-04-19T02:42:13Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when emphdone right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a emphstochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Low-rank extended Kalman filtering for online learning of neural
networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems convex compact sets.
We leverage the structure of the objective by handling differentiable and non-differentiable separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Cyclic Block Coordinate Descent With Variance Reduction for Composite
Nonconvex Optimization [26.218670461973705]
We propose methods for solving problems coordinate non-asymptotic gradient norm guarantees.
Our results demonstrate the efficacy of the proposed cyclic-reduced scheme in training deep neural nets.
arXiv Detail & Related papers (2022-12-09T19:17:39Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs)
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.