Structured second-order methods via natural gradient descent
- URL: http://arxiv.org/abs/2107.10884v1
- Date: Thu, 22 Jul 2021 19:03:53 GMT
- Title: Structured second-order methods via natural gradient descent
- Authors: Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
- Abstract summary: We propose new structured second-order methods and structured adaptive-gradient methods.
Natural-gradient descent is an attractive approach to design algorithms in many settings.
- Score: 43.51581051770027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose new structured second-order methods and structured
adaptive-gradient methods obtained by performing natural-gradient descent on
structured parameter spaces. Natural-gradient descent is an attractive approach
to design new algorithms in many settings such as gradient-free,
adaptive-gradient, and second-order methods. Our structured methods not only
enjoy a structural invariance but also admit a simple expression. Finally, we
test the efficiency of our proposed methods on both deterministic non-convex
problems and deep learning problems.
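The proposed methods are obtained by performing natural-gradient descent, $\theta \leftarrow \theta - \beta F(\theta)^{-1} \nabla \mathcal{L}(\theta)$, where $F$ is the Fisher information matrix of the chosen parameterization. The snippet below is a minimal, generic sketch of this update applied to a maximum-likelihood fit of a Gaussian $N(\mu, \sigma^2)$; it is not the structured methods proposed in the paper, and the data, step size, and iteration count are illustrative assumptions.
```python
# Minimal, generic natural-gradient descent sketch (not the paper's structured methods):
# theta <- theta - lr * F(theta)^{-1} * grad L(theta).
# Example: maximum-likelihood fit of a Gaussian N(mu, sigma^2), whose Fisher matrix
# in (mu, sigma) coordinates is diag(1/sigma^2, 2/sigma^2). Data and step size are made up.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

def grad_nll(mu, sigma, x):
    # Gradient of the average negative log-likelihood w.r.t. (mu, sigma).
    d_mu = -(x - mu).mean() / sigma**2
    d_sigma = 1.0 / sigma - ((x - mu) ** 2).mean() / sigma**3
    return np.array([d_mu, d_sigma])

mu, sigma, lr = 0.0, 1.0, 0.1
for _ in range(300):
    g = grad_nll(mu, sigma, data)
    fisher = np.diag([1.0 / sigma**2, 2.0 / sigma**2])  # exact Fisher for this model
    mu, sigma = np.array([mu, sigma]) - lr * np.linalg.solve(fisher, g)

print(mu, sigma)  # approaches the sample mean and standard deviation
```
For structured, higher-dimensional parameter spaces, forming and inverting $F$ is the expensive step, which is what the structured parameterizations studied in the paper (and in the related work below) aim to make tractable.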
Related papers
- Towards Differentiable Multilevel Optimization: A Gradient-Based Approach [1.6114012813668932]
This paper introduces a novel gradient-based approach for multilevel optimization.
Our method significantly reduces computational complexity while improving both solution accuracy and convergence speed.
To the best of our knowledge, this is one of the first algorithms to provide a general version of implicit differentiation.
arXiv Detail & Related papers (2024-10-15T06:17:59Z) - On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms [4.307128674848627]
AdaPG$^{q,r}$ is a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds.
Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations.
arXiv Detail & Related papers (2023-11-30T10:29:43Z) - On Constraints in First-Order Optimization: A View from Non-Smooth
Dynamical Systems [99.59934203759754]
We introduce a class of first-order methods for smooth constrained optimization.
Two distinctive features of our approach are that projections or optimizations over the entire feasible set are avoided, and that the resulting algorithmic procedure is simple to implement even when constraints are nonlinear.
arXiv Detail & Related papers (2021-07-17T11:45:13Z) - Tractable structured natural gradient descent using local
parameterizations [43.51581051770027]
Natural-gradient descent on structured parameter spaces is computationally challenging due to complicated inverse Fisher-matrix computations.
We address this issue by using local-parameter coordinates.
We show results on a range of applications in deep learning, variational inference, and evolution strategies.
arXiv Detail & Related papers (2021-02-15T09:09:20Z) - Structured Sparsity Inducing Adaptive Optimizers for Deep Learning [94.23102887731417]
In this paper, we derive the weighted proximal operator, which is a necessary component of proximal gradient methods.
We show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structure in their sparsity patterns.
arXiv Detail & Related papers (2021-02-07T18:06:23Z) - A Bregman Method for Structure Learning on Sparse Directed Acyclic
Graphs [84.7328507118758]
We develop a Bregman proximal gradient method for structure learning.
We measure the impact of curvature against a highly nonlinear iteration.
We test our method on various synthetic and real data sets.
arXiv Detail & Related papers (2020-11-05T11:37:44Z) - Conditional Gradient Methods for Convex Optimization with General Affine
and Nonlinear Constraints [8.643249539674612]
This paper presents new conditional gradient methods for solving convex optimization problems with general affine and nonlinear constraints.
We first present a new constraint-extrapolated conditional gradient (CoexCG) method that can achieve an $\mathcal{O}(1/\epsilon^2)$ iteration complexity for both smooth and structured nonsmooth function-constrained convex optimization.
We further develop novel variants of CoexCG, namely constraint extrapolated and dual regularized conditional gradient (CoexDurCG) methods, that can achieve similar iteration complexity to CoexCG but allow adaptive selection for algorithmic parameters.
arXiv Detail & Related papers (2020-06-30T23:49:38Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Handling the Positive-Definite Constraint in the Bayesian Learning Rule [33.87717973872535]
The Bayesian learning rule is a natural-gradient variational inference method.
When variational parameters lie in an open constraint set, the rule may not satisfy the constraint and requires line-searches which could slow down the algorithm.
Our work makes it easier to apply the rule in the presence of positive-definite constraints in parameter spaces; a toy sketch of the constraint issue appears after this list.
arXiv Detail & Related papers (2020-02-24T03:29:39Z)
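As a footnote to the last entry above, the positive-definite constraint issue can be seen in one dimension: an additive gradient step on a precision parameter can push it below zero, while a multiplicative, exponential-map-style step (which agrees with it to first order) cannot. The sketch below is only a toy illustration of that failure mode and a generic fix; it is not the update proposed in that paper, and all numbers are made up.
```python
# Toy illustration (not the paper's algorithm) of the positive-definite constraint issue.
# A precision parameter lam must stay > 0. An additive step can violate this; a
# multiplicative (exponential-map) step, which agrees to first order, cannot.
import numpy as np

lam, grad, lr = 0.5, 2.0, 0.5               # made-up current precision, gradient, step size

additive = lam - lr * grad                  # -0.5: leaves the feasible region
retracted = lam * np.exp(-lr * grad / lam)  # ~0.068: always stays positive

print(additive, retracted)
```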
This list is automatically generated from the titles and abstracts of the papers on this site.