A Bregman Method for Structure Learning on Sparse Directed Acyclic
Graphs
- URL: http://arxiv.org/abs/2011.02764v1
- Date: Thu, 5 Nov 2020 11:37:44 GMT
- Title: A Bregman Method for Structure Learning on Sparse Directed Acyclic
Graphs
- Authors: Manon Romain and Alexandre d'Aspremont
- Abstract summary: We develop a Bregman proximal gradient method for structure learning.
We measure the impact of curvature against a highly nonlinear iteration.
We test our method on various synthetic and real sets.
- Score: 84.7328507118758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a Bregman proximal gradient method for structure learning on
linear structural causal models. While the problem is non-convex, has high
curvature and is in fact NP-hard, Bregman gradient methods allow us to
neutralize at least part of the impact of curvature by measuring smoothness
against a highly nonlinear kernel. This allows the method to make longer steps
and significantly improves convergence. Each iteration requires solving a
Bregman proximal step which is convex and efficiently solvable for our
particular choice of kernel. We test our method on various synthetic and real
data sets.
Related papers
- Gradient Methods with Online Scaling [19.218484733179356]
We introduce a framework to accelerate the convergence of gradient-based methods with online learning.
We show for the first time that the widely-used hypergradient descent improves on the convergence of gradient descent.
arXiv Detail & Related papers (2024-11-04T05:04:18Z) - Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods [0.3222802562733786]
We consider minimizing finite-sum expectation objective functions via Hessian-averaging based subsampled Newton methods.
These methods allow for inexactness and have fixed per-it Hessian approximation costs.
We present novel analysis techniques and propose challenges for their practical implementation.
arXiv Detail & Related papers (2024-08-14T03:27:48Z) - An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levssquardt method to minimize the average of smooth but possibly non-negative functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Integer Programming for Learning Directed Acyclic Graphs from Non-identifiable Gaussian Models [6.54203362045253]
We study the problem of learning directed acyclic graphs from continuous observational data.
We develop a mixed-integer programming framework for learning medium-sized problems.
Our method outperforms state-of-the-art algorithms and is robust to noise heteroscedasticity.
arXiv Detail & Related papers (2024-04-19T02:42:13Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when emphdone right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a emphstochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Low-rank extended Kalman filtering for online learning of neural
networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems convex compact sets.
We leverage the structure of the objective by handling differentiable and non-differentiable separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Cyclic Block Coordinate Descent With Variance Reduction for Composite
Nonconvex Optimization [26.218670461973705]
We propose methods for solving problems coordinate non-asymptotic gradient norm guarantees.
Our results demonstrate the efficacy of the proposed cyclic-reduced scheme in training deep neural nets.
arXiv Detail & Related papers (2022-12-09T19:17:39Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs)
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.