Structured second-order methods via natural gradient descent
- URL: http://arxiv.org/abs/2107.10884v1
- Date: Thu, 22 Jul 2021 19:03:53 GMT
- Title: Structured second-order methods via natural gradient descent
- Authors: Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
- Abstract summary: We propose new structured second-order methods and structured adaptive-gradient methods.
Natural-gradient descent is an attractive approach to design algorithms in many settings.
- Score: 43.51581051770027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose new structured second-order methods and structured
adaptive-gradient methods obtained by performing natural-gradient descent on
structured parameter spaces. Natural-gradient descent is an attractive approach
to design new algorithms in many settings such as gradient-free,
adaptive-gradient, and second-order methods. Our structured methods not only
enjoy a structural invariance but also admit a simple expression. Finally, we
test the efficiency of our proposed methods on both deterministic non-convex
problems and deep learning problems.
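The proposed methods are obtained by performing natural-gradient descent, $\theta \leftarrow \theta - \beta F(\theta)^{-1} \nabla \mathcal{L}(\theta)$, where $F$ is the Fisher information matrix of the chosen parameterization. The snippet below is a minimal, generic sketch of this update applied to a maximum-likelihood fit of a Gaussian $N(\mu, \sigma^2)$; it is not the structured methods proposed in the paper, and the data, step size, and iteration count are illustrative assumptions.
```python
# Minimal, generic natural-gradient descent sketch (not the paper's structured methods):
# theta <- theta - lr * F(theta)^{-1} * grad L(theta).
# Example: maximum-likelihood fit of a Gaussian N(mu, sigma^2), whose Fisher matrix
# in (mu, sigma) coordinates is diag(1/sigma^2, 2/sigma^2). Data and step size are made up.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

def grad_nll(mu, sigma, x):
    # Gradient of the average negative log-likelihood w.r.t. (mu, sigma).
    d_mu = -(x - mu).mean() / sigma**2
    d_sigma = 1.0 / sigma - ((x - mu) ** 2).mean() / sigma**3
    return np.array([d_mu, d_sigma])

mu, sigma, lr = 0.0, 1.0, 0.1
for _ in range(300):
    g = grad_nll(mu, sigma, data)
    fisher = np.diag([1.0 / sigma**2, 2.0 / sigma**2])  # exact Fisher for this model
    mu, sigma = np.array([mu, sigma]) - lr * np.linalg.solve(fisher, g)

print(mu, sigma)  # approaches the sample mean and standard deviation
```
For structured, higher-dimensional parameter spaces, forming and inverting $F$ is the expensive step, which is what the structured parameterizations studied in the paper (and in the related work below) aim to make tractable.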
Related papers
- Towards Differentiable Multilevel Optimization: A Gradient-Based Approach [1.6114012813668932]
This paper introduces a novel gradient-based approach for multilevel optimization.
Our method significantly reduces computational complexity while improving both solution accuracy and convergence speed.
To the best of our knowledge, this is one of the first algorithms to provide a general version of implicit differentiation.
arXiv Detail & Related papers (2024-10-15T06:17:59Z) - On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms [4.307128674848627]
AdaPG$^{q,r}$ is a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds.
Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations.
arXiv Detail & Related papers (2023-11-30T10:29:43Z) - On Constraints in First-Order Optimization: A View from Non-Smooth
Dynamical Systems [99.59934203759754]
We introduce a class of first-order methods for smooth constrained optimization.
Two distinctive features of our approach are that projections or optimizations over the entire feasible set are avoided, and that the resulting algorithmic procedure is simple to implement even when constraints are nonlinear.
arXiv Detail & Related papers (2021-07-17T11:45:13Z) - Tractable structured natural gradient descent using local
parameterizations [43.51581051770027]
Natural-gradient descent on structured parameter spaces is computationally challenging due to complicated inverse Fisher-matrix computations.
We address this issue by using local-parameter coordinates.
We show results on a range of applications in deep learning, variational inference, and evolution strategies.
arXiv Detail & Related papers (2021-02-15T09:09:20Z) - Structured Sparsity Inducing Adaptive Optimizers for Deep Learning [94.23102887731417]
In this paper, we derive the weighted proximal operator, which is a necessary component of proximal gradient methods.
We show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structure in their sparsity patterns.
arXiv Detail & Related papers (2021-02-07T18:06:23Z) - A Bregman Method for Structure Learning on Sparse Directed Acyclic
Graphs [84.7328507118758]
We develop a Bregman proximal gradient method for structure learning.
We measure the impact of curvature against a highly nonlinear iteration.
We test our method on various synthetic and real data sets.
arXiv Detail & Related papers (2020-11-05T11:37:44Z) - Conditional Gradient Methods for Convex Optimization with General Affine
and Nonlinear Constraints [8.643249539674612]
This paper presents new conditional gradient methods for solving convex optimization problems with general affine and nonlinear constraints.
We first present a new constraint-extrapolated conditional gradient (CoexCG) method that can achieve an $\mathcal{O}(1/\epsilon^2)$ iteration complexity for both smooth and structured nonsmooth function-constrained convex optimization.
We further develop novel variants of CoexCG, namely constraint extrapolated and dual regularized conditional gradient (CoexDurCG) methods, that can achieve similar iteration complexity to CoexCG but allow adaptive selection for algorithmic parameters.
arXiv Detail & Related papers (2020-06-30T23:49:38Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Handling the Positive-Definite Constraint in the Bayesian Learning Rule [33.87717973872535]
The Bayesian learning rule is a natural-gradient variational inference method.
When variational parameters lie in an open constraint set, the rule may not satisfy the constraint and requires line-searches which could slow down the algorithm.
Our work makes it easier to apply the rule in the presence of positive-definite constraints in parameter spaces; a toy sketch of the constraint issue appears after this list.
arXiv Detail & Related papers (2020-02-24T03:29:39Z)
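As a footnote to the last entry above, the positive-definite constraint issue can be seen in one dimension: an additive gradient step on a precision parameter can push it below zero, while a multiplicative, exponential-map-style step (which agrees with it to first order) cannot. The sketch below is only a toy illustration of that failure mode and a generic fix; it is not the update proposed in that paper, and all numbers are made up.
```python
# Toy illustration (not the paper's algorithm) of the positive-definite constraint issue.
# A precision parameter lam must stay > 0. An additive step can violate this; a
# multiplicative (exponential-map) step, which agrees to first order, cannot.
import numpy as np

lam, grad, lr = 0.5, 2.0, 0.5               # made-up current precision, gradient, step size

additive = lam - lr * grad                  # -0.5: leaves the feasible region
retracted = lam * np.exp(-lr * grad / lam)  # ~0.068: always stays positive

print(additive, retracted)
```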
This list is automatically generated from the titles and abstracts of the papers on this site.