Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems
- URL: http://arxiv.org/abs/2202.06236v1
- Date: Sun, 13 Feb 2022 07:32:17 GMT
- Title: Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems
- Authors: Levon Nurbekyan, Wanzhou Lei and Yunan Yang
- Abstract summary: We propose an efficient method for computing natural gradient descent directions with respect to a generic metric in the state space.
Our technique relies on representing the natural gradient direction as a solution to a standard least-squares problem.
We can reliably compute several natural gradient descents, including the Wasserstein natural gradient, for a large-scale parameter space.
- Score: 1.2891210250935146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an efficient numerical method for computing natural gradient
descent directions with respect to a generic metric in the state space. Our
technique relies on representing the natural gradient direction as a solution
to a standard least-squares problem. Hence, instead of calculating, storing, or
inverting the information matrix directly, we apply efficient methods from
numerical linear algebra to solve this least-squares problem. We treat both
scenarios where the derivative of the state variable with respect to the
parameter is either explicitly known or implicitly given through constraints.
We apply the QR decomposition to solve the least-squares problem in the former
case and utilize the adjoint-state method to compute the natural gradient
descent direction in the latter case. As a result, we can reliably compute
several natural gradient descents, including the Wasserstein natural gradient,
for a large-scale parameter space with thousands of dimensions, which was
believed to be out of reach. Finally, our numerical results shed light on the
qualitative differences between the standard gradient descent method and various
natural gradient descent methods based on different metric spaces in
large-scale nonconvex optimization problems.
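To make the least-squares representation concrete, below is a minimal numpy sketch of the explicit-Jacobian case. It assumes the pullback-metric convention M(theta) = J^T G J, with J the Jacobian of the state with respect to the parameters and G = L L^T a Cholesky-factored state-space metric; the function names, dimensions, and toy metric are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def natural_gradient_qr(J, grad_state, L):
    """Natural gradient direction computed via a standard least-squares problem.

    J          : (m, n) Jacobian of the state w.r.t. the parameters.
    grad_state : (m,) gradient of the objective w.r.t. the state.
    L          : (m, m) Cholesky factor of the state-space metric G = L @ L.T.

    Solves min_eta || L.T @ J @ eta - inv(L) @ grad_state ||_2, whose normal
    equations are (J.T @ G @ J) @ eta = J.T @ grad_state, i.e. the natural
    gradient system, without forming, storing, or inverting J.T @ G @ J.
    """
    A = L.T @ J                              # metric-weighted Jacobian
    b = np.linalg.solve(L, grad_state)       # inv(L) @ grad_state
    Q, R = np.linalg.qr(A, mode="reduced")   # thin QR factorization
    return np.linalg.solve(R, Q.T @ b)       # least-squares solution

# Toy problem: n parameters, m state entries, random SPD metric.
rng = np.random.default_rng(0)
m, n = 200, 50
J = rng.standard_normal((m, n))
grad_state = rng.standard_normal(m)
B = rng.standard_normal((m, m)) / np.sqrt(m)
L = np.linalg.cholesky(B @ B.T + np.eye(m))  # factor of an SPD metric G
eta = natural_gradient_qr(J, grad_state, L)
# A descent update would then read: theta_new = theta - step * eta
```

With the Euclidean state-space metric (L equal to the identity) this reduces to the plain least-squares problem min_eta ||J @ eta - grad_state||_2; the abstract's constrained case, where J is only implicitly available, replaces the explicit Jacobian products with adjoint-state computations.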
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of smooth but possibly non-convex functions.
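As an illustration of the Gauss-Newton/Levenberg-Marquardt structure for such losses, here is a textbook Levenberg-Marquardt step on the square-root residuals r_i with l_i = r_i^2; it uses a fixed damping parameter rather than the adaptive stepsizes of the cited paper, and all names are illustrative assumptions.

```python
import numpy as np

def levenberg_marquardt_step(theta, residual_fn, jac_fn, damping=1e-3):
    """One textbook Levenberg-Marquardt step on residuals r(theta),
    where each non-negative loss is written as l_i = r_i**2.

    residual_fn(theta) -> (n,) residual vector r
    jac_fn(theta)      -> (n, d) Jacobian of r w.r.t. theta
    """
    r = residual_fn(theta)
    J = jac_fn(theta)
    # Solve (J^T J + damping * I) delta = J^T r and step against it.
    H = J.T @ J + damping * np.eye(theta.size)
    delta = np.linalg.solve(H, J.T @ r)
    return theta - delta

# Toy least-squares problem: fit y ~ X @ theta, so r(theta) = X @ theta - y.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.01 * rng.standard_normal(100)
theta = np.zeros(5)
for _ in range(10):
    theta = levenberg_marquardt_step(theta, lambda t: X @ t - y, lambda t: X)
```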
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Natural Gradient Methods: Perspectives, Efficient-Scalable Approximations, and Analysis [0.0]
Natural Gradient Descent is a second-order optimization method motivated by information geometry.
It makes use of the Fisher Information Matrix instead of the Hessian, which is typically used.
Being a second-order method makes it infeasible to use directly in problems with a huge number of parameters and large amounts of data.
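As a minimal illustration of preconditioning with the Fisher Information Matrix rather than the Hessian, the sketch below takes one damped natural-gradient step from per-example gradients using the empirical-Fisher approximation; the names and hyperparameter values are assumptions, and practical large-scale variants (e.g., K-FAC) avoid forming the matrix explicitly.

```python
import numpy as np

def natural_gradient_step(theta, per_example_grads, lr=0.1, damping=1e-3):
    """One damped natural-gradient step from an empirical-Fisher estimate.

    per_example_grads : (N, d) array, one gradient per data point.
    The empirical Fisher F ~ (1/N) sum_i g_i g_i^T plays the role of the
    curvature matrix, in place of the Hessian used by Newton-type methods.
    """
    G = np.asarray(per_example_grads)
    grad = G.mean(axis=0)                      # ordinary gradient
    F = (G.T @ G) / G.shape[0]                 # empirical Fisher matrix
    direction = np.linalg.solve(F + damping * np.eye(theta.size), grad)
    return theta - lr * direction              # precondition by F^{-1}

# Toy usage with random per-example gradients for a 10-dimensional parameter.
rng = np.random.default_rng(0)
theta = rng.standard_normal(10)
theta = natural_gradient_step(theta, rng.standard_normal((256, 10)))
```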
arXiv Detail & Related papers (2023-03-06T04:03:56Z) - Achieving High Accuracy with PINNs via Energy Natural Gradients [0.0]
We show that the update direction in function space resulting from the energy natural gradient corresponds to the Newton direction modulo an orthogonal projection onto the model's tangent space.
We demonstrate experimentally that energy natural gradient descent yields highly accurate solutions with errors several orders of magnitude smaller than what is obtained when training PINNs with standard gradient descent or Adam.
arXiv Detail & Related papers (2023-02-25T21:17:19Z) - MARS via LASSO [1.5199437137239338]
We propose and study a natural variant of the MARS method.
Our method is based on least squares estimation over a convex class of functions.
Our estimator can be computed via finite-dimensional convex optimization.
arXiv Detail & Related papers (2021-11-23T07:30:33Z) - Conditional gradient methods for stochastically constrained convex minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
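For reference, the sketch below shows the plain conditional-gradient (Frank-Wolfe) step that such methods build on, here over the probability simplex; it omits the constraint sampling, smoothing, and variance reduction of the cited paper, and all names are illustrative.

```python
import numpy as np

def frank_wolfe_simplex(grad_fn, x0, n_iters=100):
    """Basic conditional-gradient (Frank-Wolfe) loop on the probability simplex.

    Each iteration solves the linear subproblem min_{s in simplex} <grad, s>,
    whose minimizer is a vertex (one-hot vector), then takes a convex step.
    """
    x = x0.copy()
    for k in range(n_iters):
        g = grad_fn(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # linear minimization oracle on the simplex
        gamma = 2.0 / (k + 2.0)        # standard step-size schedule
        x = (1 - gamma) * x + gamma * s
    return x

# Example: project a point p onto the simplex by minimizing 0.5 * ||x - p||^2.
p = np.array([0.9, 0.4, -0.2])
x = frank_wolfe_simplex(lambda x: x - p, np.ones(3) / 3)
```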
arXiv Detail & Related papers (2020-07-07T21:26:35Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
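As a point of reference, here is a plain Gaussian sketch-and-solve baseline for L2-regularized least squares; the SRHT can be substituted for the Gaussian embedding, and the sketch size and names are illustrative assumptions rather than the effective-dimension-adaptive choice of the cited paper.

```python
import numpy as np

def sketched_ridge(A, b, lam, sketch_size, seed=None):
    """Gaussian sketch-and-solve for min_x ||A x - b||^2 + lam * ||x||^2.

    A Gaussian embedding S of shape (sketch_size, n_rows) compresses the data
    before solving the sketched ridge normal equations.
    """
    rng = np.random.default_rng(seed)
    m, d = A.shape
    S = rng.standard_normal((sketch_size, m)) / np.sqrt(sketch_size)
    SA, Sb = S @ A, S @ b
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

# Toy comparison against the exact ridge solution.
rng = np.random.default_rng(2)
A = rng.standard_normal((5000, 50))
b = A @ rng.standard_normal(50) + 0.1 * rng.standard_normal(5000)
x_sketch = sketched_ridge(A, b, lam=1.0, sketch_size=500, seed=3)
x_exact = np.linalg.solve(A.T @ A + np.eye(50), A.T @ b)
```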
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)