Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective
- URL: http://arxiv.org/abs/2106.00092v1
- Date: Mon, 31 May 2021 20:30:25 GMT
- Title: Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective
- Authors: Kushal Chakrabarti, Nikhil Chopra
- Abstract summary: We propose a new fast optimizer, Generalized AdaGrad (G-AdaGrad), for solving non-convex machine learning problems.
Specifically, we adopt a state-space perspective for analyzing the convergence of gradient acceleration algorithms, namely G-AdaGrad and Adam.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accelerated gradient-based methods are being extensively used for solving
non-convex machine learning problems, especially when the data points are
abundant or the available data is distributed across several agents. Two of the
prominent accelerated gradient algorithms are AdaGrad and Adam. AdaGrad is the
simplest accelerated gradient method, which is particularly effective for
sparse data. Adam has been shown to perform favorably in deep learning problems
compared to other methods. In this paper, we propose a new fast optimizer,
Generalized AdaGrad (G-AdaGrad), for accelerating the solution of potentially
non-convex machine learning problems. Specifically, we adopt a state-space
perspective for analyzing the convergence of gradient acceleration algorithms,
namely G-AdaGrad and Adam, in machine learning. Our proposed state-space models
are governed by ordinary differential equations. We present simple convergence
proofs of these two algorithms in the deterministic settings with minimal
assumptions. Our analysis also provides intuition behind improving upon
AdaGrad's convergence rate. We provide empirical results on MNIST dataset to
reinforce our claims on the convergence and performance of G-AdaGrad and Adam.
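The paper casts these optimizers as continuous-time state-space models; in discrete time, the updates they generalize are the familiar AdaGrad and Adam rules. The sketch below shows both, with a free exponent `c` on the accumulated second moment as one reading of the generalization (the exact parameterization in the paper may differ; `c = 0.5` recovers standard AdaGrad's square-root denominator):

```python
import numpy as np

def g_adagrad_step(x, grad, v, lr=0.1, c=0.5, eps=1e-8):
    """One AdaGrad-style update with a generalized exponent c.

    c = 0.5 gives standard AdaGrad; treating c as a free parameter
    is our reading of the G-AdaGrad generalization, not necessarily
    the authors' exact update rule.
    """
    v = v + grad**2                     # accumulate squared gradients
    x = x - lr * grad / (v**c + eps)    # per-coordinate adaptive step
    return x, v

def adam_step(x, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update (Kingma & Ba, 2015); t starts at 1."""
    m = b1 * m + (1 - b1) * grad        # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad**2     # second-moment estimate
    m_hat = m / (1 - b1**t)             # bias corrections for zero init
    v_hat = v / (1 - b2**t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v
```

For example, iterating either step on the quadratic f(x) = x² (gradient 2x) drives x toward 0; the paper's ODE analysis studies exactly this kind of deterministic convergence.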
Related papers
- Towards Simple and Provable Parameter-Free Adaptive Gradient Methods [56.060918447252625]
We present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees.
We prove that AdaGrad++ achieves comparable convergence rates to AdaGrad in convex optimization without predefined learning rate assumptions. Similarly, Adam++ matches the convergence rate of Adam without relying on any conditions on the learning rates.
arXiv Detail & Related papers (2024-12-27T04:22:02Z)
- Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad [16.249992982986956]
This paper introduces a novel scale-invariant adaptive algorithm named KATE, which performs consistently on complex machine learning tasks.
We compare KATE to other state-of-the-art adaptive algorithms, Adam and AdaGrad, in numerical experiments on different problems.
arXiv Detail & Related papers (2024-03-05T04:35:59Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning [0.6526824510982802]
Adaptive gradient methods have become popular in optimizing deep neural networks.
Recent examples include AdaGrad and Adam.
We develop a generic framework for adaptive gradient methods.
arXiv Detail & Related papers (2022-06-04T17:55:33Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach for handling streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
- AdaX: Adaptive Gradient Descent with Exponential Long Term Memory [34.6432726391469]
We analyze a problem of Adam by examining its performance in simple non-convex machine learning tasks.
We propose a novel adaptive gradient method named AdaX to solve the problem.
AdaX outperforms Adam in various computer vision and natural language processing tasks.
arXiv Detail & Related papers (2020-04-21T03:46:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.