Omega: Optimistic EMA Gradients
- URL: http://arxiv.org/abs/2306.07905v2
- Date: Tue, 26 Mar 2024 00:21:41 GMT
- Title: Omega: Optimistic EMA Gradients
- Authors: Juan Ramirez, Rohan Sukumaran, Quentin Bertrand, Gauthier Gidel
- Abstract summary: We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating historic gradients in its update rule.
Although we do not provide convergence guarantees, our experiments on games show that Omega outperforms the optimistic gradient method when applied to linear players.
- Score: 17.11877451333674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an EMA of historic gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
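To make the idea concrete, below is a minimal NumPy sketch of an optimistic-like step on a noisy bilinear game in which the single previous gradient used by the optimistic method is replaced with an EMA of past gradients. The step size, the EMA coefficient `beta`, and the exact placement of the EMA in the correction term are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def optimistic_ema_step(x, y, gx, gy, state, lr=0.05, beta=0.9):
    """One optimistic-like step on min_x max_y f(x, y).

    The optimistic gradient method corrects with the single previous
    gradient (2*g_t - g_{t-1}); here that previous gradient is replaced
    by an EMA of past gradients. Coefficients are illustrative choices.
    """
    mx = state.get("mx", gx)           # EMA of past x-gradients
    my = state.get("my", gy)           # EMA of past y-gradients
    x_new = x - lr * (2 * gx - mx)     # descent step on x
    y_new = y + lr * (2 * gy - my)     # ascent step on y
    state["mx"] = beta * mx + (1 - beta) * gx
    state["my"] = beta * my + (1 - beta) * gy
    return x_new, y_new

# Noisy bilinear game f(x, y) = x * y with stochastic gradient oracles.
rng = np.random.default_rng(0)
x, y, state = 1.0, 1.0, {}
for _ in range(500):
    gx = y + 0.1 * rng.normal()        # noisy df/dx
    gy = x + 0.1 * rng.normal()        # noisy df/dy
    x, y = optimistic_ema_step(x, y, gx, gy, state)
print(x, y)  # inspect how far the iterates drift toward the equilibrium (0, 0)
```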
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
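As a rough, hedged illustration of that reformulation, the sketch below solves a scalar inner problem approximately, replaces the point solution with a Gaussian distribution (a purely illustrative choice of smooth distribution), and evaluates the outer loss as a Monte Carlo expectation over it; none of the specific objectives or constants come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def outer_loss(w):
    """Outer (validation-style) objective evaluated at an inner solution."""
    return (w - 2.0) ** 2

def smoothed_outer(lam, sigma=0.1, n_samples=64, inner_steps=50, lr=0.1):
    """Expected outer loss under a distribution over inner solutions.

    Inner problem: min_w 0.5 * lam * w**2 - w, solved approximately by
    gradient descent; the point solution is then smoothed into a Gaussian
    and the outer loss is averaged over samples from it.
    """
    w = 0.0
    for _ in range(inner_steps):
        w -= lr * (lam * w - 1.0)      # gradient of the inner loss
    samples = w + sigma * rng.normal(size=n_samples)
    return float(np.mean(outer_loss(samples)))

print(smoothed_outer(lam=0.4))
```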
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based adaption method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
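As a generic sketch of exponential (multiplicative) learning-rate adaptation, the snippet below grows the step size when consecutive gradients agree in direction and shrinks it when they disagree; the specific rule and the `grow`/`shrink` factors are assumptions for illustration, not necessarily ELRA's update.

```python
import numpy as np

def adaptive_gd(grad, x0, lr0=0.1, grow=1.2, shrink=0.5, steps=100):
    """Gradient descent with a multiplicatively adapted step size.

    If consecutive gradients point in a similar direction, the learning
    rate grows (exponentially over iterations); otherwise it shrinks.
    """
    x, lr, g_prev = np.asarray(x0, dtype=float), lr0, None
    for _ in range(steps):
        g = grad(x)
        if g_prev is not None:
            lr = lr * grow if np.dot(g, g_prev) > 0 else lr * shrink
        x = x - lr * g
        g_prev = g
    return x, lr

# Simple quadratic test problem with minimizer (1, -3).
grad = lambda x: 2.0 * (x - np.array([1.0, -3.0]))
print(adaptive_gd(grad, x0=[10.0, 10.0]))
```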
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Formal guarantees for heuristic optimization algorithms used in machine learning [6.978625807687497]
Stochastic Gradient Descent (SGD) and its variants have become the dominant methods for large-scale machine learning (ML) optimization problems.
We provide formal guarantees for a few convex optimization methods and propose improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z) - A Comparison of Various Classical Optimizers for a Variational Quantum Linear Solver [0.0]
Variational Hybrid Quantum Classical Algorithms (VHQCAs) are a class of quantum algorithms intended to run on noisy quantum devices.
These algorithms employ a parameterized quantum circuit (ansatz) and a quantum-classical feedback loop.
A classical device is used to optimize the parameters in order to minimize a cost function that can be computed far more efficiently on a quantum device.
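A minimal sketch of that quantum-classical feedback loop is shown below: a classical optimizer iteratively updates ansatz parameters to lower a cost. The cost here is a mock classical function standing in for a value estimated on a quantum device, and the finite-difference optimizer is an arbitrary illustrative choice.

```python
import numpy as np

def mock_quantum_cost(theta):
    # Stand-in for a cost estimated by running a parameterized circuit
    # (ansatz) on a quantum device; purely classical here.
    return 1.0 - np.prod(np.cos(theta / 2.0)) ** 2

def classical_feedback_loop(cost, theta0, lr=0.2, steps=200, eps=1e-3):
    """Classical outer loop updating the ansatz parameters."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        # Finite-difference gradient estimate of the (mock) quantum cost.
        grad = np.array([
            (cost(theta + eps * e) - cost(theta - eps * e)) / (2 * eps)
            for e in np.eye(len(theta))
        ])
        theta -= lr * grad
    return theta, cost(theta)

theta, final_cost = classical_feedback_loop(mock_quantum_cost, [2.0, 1.5])
print(theta, final_cost)
```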
arXiv Detail & Related papers (2021-06-16T10:40:00Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
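A hedged sketch of optimizing a mixture over base controllers: mixture weights are parameterized by a softmax and updated with a REINFORCE-style gradient estimate on a toy bandit-like environment. The environment, estimator, and constants are illustrative assumptions, not the paper's setting or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# M base controllers: each maps a state to a fixed action in this toy setup.
controllers = [lambda s: 0, lambda s: 1, lambda s: 2]
true_reward = np.array([0.1, 0.9, 0.3])    # unknown to the learner

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = np.zeros(len(controllers))          # logits of the mixture weights
lr = 0.5
for _ in range(2000):
    probs = softmax(theta)
    k = rng.choice(len(controllers), p=probs)          # sample a controller
    action = controllers[k](None)                      # roll it out
    r = true_reward[action] + 0.1 * rng.normal()       # noisy return
    grad_log = -probs                                  # gradient of log probs[k] w.r.t. theta
    grad_log[k] += 1.0
    theta += lr * r * grad_log                         # REINFORCE-style update
print(softmax(theta))  # weights should tend to concentrate on the best controller
```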
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - SMG: A Shuffling Gradient-Based Method with Momentum [25.389545522794172]
We combine two advanced ideas widely used in optimization for machine learning.
We develop a novel shuffling-based momentum technique.
Our tests have shown encouraging performance of the new algorithms.
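The sketch below combines epoch-wise reshuffling of the data with a momentum buffer on a least-squares problem; the particular way momentum is folded into the update is a generic choice for illustration, not necessarily the SMG rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Least-squares problem: minimize ||A w - b||^2 over shuffled mini-batches.
A = rng.normal(size=(256, 5))
w_true = rng.normal(size=5)
b = A @ w_true + 0.01 * rng.normal(size=256)

def shuffled_momentum_sgd(w, lr=0.05, beta=0.9, epochs=30, batch=32):
    v = np.zeros_like(w)
    for _ in range(epochs):
        order = rng.permutation(len(b))            # reshuffle every epoch
        for start in range(0, len(b), batch):
            idx = order[start:start + batch]
            g = 2 * A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)
            v = beta * v + (1 - beta) * g          # momentum buffer
            w = w - lr * v
    return w

print(np.linalg.norm(shuffled_momentum_sgd(np.zeros(5)) - w_true))
```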
arXiv Detail & Related papers (2020-11-24T04:12:35Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step for constructing self-tuning quadratics.
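A hedged sketch of how Hessian-vector products can feed a local quadratic model: the curvature along the gradient direction sets the step size. The finite-difference HVP and the exact-line-search step are simplifications for illustration, not the paper's filtering procedure.

```python
import numpy as np

def hvp(grad, x, v, eps=1e-5):
    """Hessian-vector product via finite differences of gradients."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

def quadratic_model_step(grad, x):
    """Step size chosen by a local quadratic model along the gradient."""
    g = grad(x)
    curvature = float(g @ hvp(grad, x, g))
    # Minimizing f(x) - t * g.g + 0.5 * t**2 * g.H.g over t gives:
    t = (g @ g) / curvature if curvature > 0 else 1e-2
    return x - t * g

# Quadratic test function f(x) = 0.5 * x' D x with D = diag(1, 10).
D = np.diag([1.0, 10.0])
grad = lambda x: D @ x

x = np.array([5.0, 5.0])
for _ in range(20):
    x = quadratic_model_step(grad, x)
print(x)  # should approach the minimizer at the origin
```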
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties [18.973116252065278]
We propose a novel method called Expectigrad, which adjusts step sizes according to a per-component unweighted mean of all historical gradients and computes a momentum term jointly between the numerator and denominator.
We prove that Expectigrad cannot diverge on any instance of the optimization problem known to cause Adam to diverge.
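One plausible reading of that description, sketched below: the denominator is the square root of an unweighted arithmetic mean of all historical squared gradients (not an EMA), and momentum is applied to the preconditioned ratio rather than to the raw gradient. The exact placement of the momentum term and all constants are assumptions, not the paper's precise rule.

```python
import numpy as np

def expectigrad_like(grad, x0, lr=0.1, beta=0.9, eps=1e-8, steps=500):
    x = np.asarray(x0, dtype=float)
    s = np.zeros_like(x)        # running sum of squared gradients
    m = np.zeros_like(x)        # momentum on the preconditioned update
    for t in range(1, steps + 1):
        g = grad(x)
        s += g * g
        denom = np.sqrt(s / t) + eps           # unweighted mean, not an EMA
        m = beta * m + (1 - beta) * (g / denom)
        x = x - lr * m
    return x

# Simple quadratic with minimizer (1, -2).
grad = lambda x: 2.0 * (x - np.array([1.0, -2.0]))
print(expectigrad_like(grad, x0=[0.0, 0.0]))
```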
arXiv Detail & Related papers (2020-10-03T13:34:27Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of an optimistic adaptive gradient algorithm for nonconcave min-max problems.
Our experiments empirically compare adaptive and non-adaptive gradient algorithms for GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence rates in expectation for adaptive gradient methods in nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods for optimizing nonconvex objectives.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)