DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent
Method
- URL: http://arxiv.org/abs/2305.16284v4
- Date: Tue, 30 Jan 2024 02:02:06 GMT
- Title: DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent
Method
- Authors: Ahmed Khaled and Konstantin Mishchenko and Chi Jin
- Abstract summary: We prove that DoWG is efficient -- matching the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters -- and universal, automatically adapting to both smooth and nonsmooth problems.
DoWG maintains a new distance-based weighted version of the running average, which is crucial to achieve the desired properties.
To complement our theory, we show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks.
- Score: 31.891933360081342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new easy-to-implement parameter-free gradient-based
optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is
efficient -- matching the convergence rate of optimally tuned gradient descent
in convex optimization up to a logarithmic factor without tuning any
parameters, and universal -- automatically adapting to both smooth and
nonsmooth problems. While popular algorithms following the AdaGrad framework
compute a running average of the squared gradients to use for normalization,
DoWG maintains a new distance-based weighted version of the running average,
which is crucial to achieve the desired properties. To complement our theory,
we also show empirically that DoWG trains at the edge of stability, and
validate its effectiveness on practical machine learning tasks.
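To make the contrast above concrete, the sketch below implements a DoWG-style step as we read it from the paper: a running distance estimate r_bar_t = max(r_bar_{t-1}, ||x_t - x_0||), a distance-weighted sum v_t of r_bar_s^2 * ||g_s||^2 (instead of AdaGrad's plain sum of squared gradients), and the step size r_bar_t^2 / sqrt(v_t). It is a minimal illustration under that reading, not a reference implementation; the only input beyond x_0 is a small initial distance estimate r_eps.

```python
import numpy as np

def dowg(grad, x0, steps=1000, r_eps=1e-4):
    """Minimal sketch of a DoWG-style parameter-free gradient method.

    grad: callable returning the (sub)gradient at a point.
    r_eps: small initial distance estimate (the only quantity supplied besides x0).
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    r_bar = r_eps   # running estimate of the distance ||x_t - x_0||
    v = 0.0         # distance-weighted running sum of squared gradient norms
    for _ in range(steps):
        g = np.asarray(grad(x), dtype=float)
        r_bar = max(r_bar, np.linalg.norm(x - x0))   # update the distance estimate
        v += r_bar ** 2 * np.linalg.norm(g) ** 2     # weighted, not plain, accumulation
        if v == 0.0:
            continue                                 # zero gradient so far: nothing to do
        x = x - (r_bar ** 2 / np.sqrt(v)) * g        # step size r_bar^2 / sqrt(v)
    return x

# AdaGrad-norm, by comparison, would accumulate v += ||g||^2 and step with eta / sqrt(v)
# for a tuned eta; here the estimated distance plays the role of eta.
```

On a toy quadratic such as grad = lambda x: 2.0 * (x - 3.0), the iterates move toward the minimizer at 3 without any step-size tuning; only the small initial estimate r_eps is supplied.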
Related papers
- Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization.
Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective.
We prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
arXiv Detail & Related papers (2024-03-06T22:24:05Z)
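For reference, the two methods named in the directional-smoothness entry above have standard textbook forms; the sketch below shows those generic update rules (the Polyak step assumes the optimal value f_star is known), not the paper's path-dependent analysis.

```python
import numpy as np

def polyak_step(x, f, grad, f_star):
    """Classic Polyak step-size: eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2."""
    g = grad(x)
    eta = (f(x) - f_star) / (np.linalg.norm(g) ** 2 + 1e-12)
    return x - eta * g

def normalized_gd_step(x, grad, eta=0.1):
    """Normalized gradient descent: move a fixed length eta along -g / ||g||."""
    g = grad(x)
    return x - eta * g / (np.linalg.norm(g) + 1e-12)
```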
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability for local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- An Adaptive Gradient Method with Energy and Momentum [0.0]
We introduce a novel algorithm for gradient-based optimization of objective functions.
The method is simple to implement, computationally efficient, and well suited for large-scale machine learning problems.
arXiv Detail & Related papers (2022-03-23T04:48:38Z)
- Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks [82.61182037130405]
Stochastic gradient descent (SGD) is the main approach for training deep networks.
In this work, we compare Adam-based variants that exploit the difference between the present and the past gradients.
We test ensembles of networks and their fusion with a ResNet50 trained with SGD.
arXiv Detail & Related papers (2021-03-26T18:55:08Z)
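One well-known Adam-like variant built on the difference between the present and past gradients is DiffGrad, which damps the step with a friction coefficient computed from that difference. The sketch below is a generic illustration of this idea and is not claimed to be one of the exact variants or ensembles evaluated in the paper above.

```python
import numpy as np

def diffgrad_like_update(theta, g, g_prev, m, v, t,
                         lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-like step where a friction coefficient derived from the change
    in the gradient damps the update (DiffGrad-style sketch)."""
    m = beta1 * m + (1 - beta1) * g          # first moment, as in Adam
    v = beta2 * v + (1 - beta2) * g ** 2     # second moment, as in Adam
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    xi = 1.0 / (1.0 + np.exp(-np.abs(g - g_prev)))   # friction in (0.5, 1): small gradient change -> smaller step
    theta = theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```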
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
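The coordinate importance sampling mentioned in the zeroth-order entry above can be illustrated with a generic estimator that uses only function queries: sample coordinates non-uniformly and reweight central finite differences. This is a simplified sketch, not the proposed ZO method; the sampling distribution probs is a user-supplied assumption here.

```python
import numpy as np

def zo_coordinate_gradient(f, x, probs, n_samples=10, mu=1e-4, rng=None):
    """Zeroth-order gradient estimate from function values only.

    Coordinates are drawn according to `probs` (importance sampling); each sampled
    coordinate contributes an importance-weighted central finite difference.
    """
    rng = rng or np.random.default_rng()
    d = x.size
    g_hat = np.zeros(d)
    idx = rng.choice(d, size=n_samples, p=probs)
    for i in idx:
        e = np.zeros(d)
        e[i] = 1.0
        fd = (f(x + mu * e) - f(x - mu * e)) / (2 * mu)   # central finite difference
        g_hat[i] += fd / (n_samples * probs[i])           # reweighting keeps the estimate unbiased
    return g_hat
```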
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- Obtaining Adjustable Regularization for Free via Iterate Averaging [43.75491612671571]
Regularization for optimization is a crucial technique to avoid overfitting in machine learning.
We establish an averaging scheme that converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart.
Our approaches can be used for accelerated and preconditioned optimization methods as well.
arXiv Detail & Related papers (2020-08-15T15:28:05Z)
- Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications [5.660384137948734]
We show that the proposed algorithm converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z)
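As background for the preconditioned stochastic gradient MCMC entry above, a common sampler of this type is preconditioned SGLD with an RMSProp-style diagonal preconditioner. The sketch below shows that standard update (omitting the correction term involving the gradient of the preconditioner, which is one source of the controllable bias); it is not claimed to be the paper's exact algorithm.

```python
import numpy as np

def psgld_step(theta, grad_log_post, v, step=1e-3, alpha=0.99, eps=1e-5, rng=None):
    """One preconditioned SGLD step with an RMSProp-style diagonal preconditioner.

    grad_log_post: stochastic gradient of the log posterior at theta.
    The preconditioner-gradient correction term is omitted for brevity.
    """
    rng = rng or np.random.default_rng()
    g = grad_log_post(theta)
    v = alpha * v + (1 - alpha) * g ** 2          # running second moment of the gradient
    G = 1.0 / (np.sqrt(v) + eps)                  # diagonal preconditioner
    noise = rng.normal(size=theta.shape) * np.sqrt(step * G)
    theta = theta + 0.5 * step * G * g + noise    # ascend the log posterior plus injected noise
    return theta, v
```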
- An adaptive stochastic gradient-free approach for high-dimensional blackbox optimization [0.0]
We propose an adaptive stochastic gradient-free (ASGF) approach for high-dimensional nonsmooth blackbox optimization problems.
We illustrate the performance of this method on benchmark global optimization problems and learning tasks.
arXiv Detail & Related papers (2020-06-18T22:47:58Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to problems in which one variable is under a sparsity constraint.
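As a generic illustration of the coupled updates described in this Cogradient Descent entry (not the CoGD algorithm itself), the sketch below performs synchronous gradient steps on the two variables of a bilinear least-squares objective, with a soft-threshold step enforcing sparsity on one of them.

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of lam * ||.||_1 (promotes sparsity)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def bilinear_synchronous_descent(y, D, x, steps=200, lr=1e-2, lam=1e-2):
    """Synchronous gradient descent on the bilinear objective
    f(D, x) = 0.5 * ||y - D @ x||^2, with a sparsity-inducing step on x."""
    for _ in range(steps):
        r = D @ x - y                  # shared residual couples the two variables
        grad_D = np.outer(r, x)        # gradient w.r.t. D (x held fixed)
        grad_x = D.T @ r               # gradient w.r.t. x (D held fixed)
        D = D - lr * grad_D            # both updates use the same residual ("synchronous")
        x = soft_threshold(x - lr * grad_x, lr * lam)
    return D, x
```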
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence rate for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
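For readers unfamiliar with the terminology in the entry above: a differentiable function f with minimum value f^* satisfies the Polyak-Lojasiewicz (PL) condition with constant mu > 0 when the standard inequality below holds; "non-PL" objectives are those without such a guarantee.

```latex
% Polyak-Lojasiewicz (PL) inequality with constant \mu > 0:
\frac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr)
\quad \text{for all } x.
```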