Adaptive Gradient Methods with Local Guarantees
- URL: http://arxiv.org/abs/2203.01400v1
- Date: Wed, 2 Mar 2022 20:45:14 GMT
- Title: Adaptive Gradient Methods with Local Guarantees
- Authors: Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan
- Abstract summary: We propose an adaptive gradient method that has provable adaptive regret guarantees vs. the best local preconditioner.
We demonstrate the robustness of our method in automatically choosing the optimal learning rate schedule for popular benchmarking tasks in vision and language domains.
- Score: 48.980206926987606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adaptive gradient methods are the method of choice for optimization in
machine learning and used to train the largest deep models. In this paper we
study the problem of learning a local preconditioner, that can change as the
data is changing along the optimization trajectory. We propose an adaptive
gradient method that has provable adaptive regret guarantees vs. the best local
preconditioner. To derive this guarantee, we prove a new adaptive regret bound
in online learning that improves upon previous adaptive online learning
methods. We demonstrate the robustness of our method in automatically choosing
the optimal learning rate schedule for popular benchmarking tasks in vision and
language domains. Without the need to manually tune a learning rate schedule,
our method can, in a single run, achieve comparable and stable task accuracy as
a fine-tuned optimizer.
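For context, a sketch of the standard notion of adaptive regret from online learning, which the paper strengthens and instantiates with the best local preconditioner as the per-interval comparator (the exact statement in the paper may differ), is the worst-case regret over all contiguous intervals:
\[
\mathrm{AdaptiveRegret}_T \;=\; \max_{[r,s] \subseteq [1,T]} \Big( \sum_{t=r}^{s} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=r}^{s} f_t(x) \Big),
\]
where $f_t$ are the online loss functions and $\mathcal{K}$ is the comparator class. A small adaptive regret means the method competes with the best fixed comparator chosen in hindsight on every local window of the optimization trajectory, which is what allows the learned preconditioner to track changing data.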
Related papers
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free Optimization [14.009179786857802]
We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks.
While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent.
In this paper, we interpret adaptive gradient methods as steepest descent applied on parameter-scaled networks.
arXiv Detail & Related papers (2024-01-06T15:45:29Z)
- A Nonstochastic Control Approach to Optimization [26.744354103012448]
We show how recent methods from nonstochastic control can overcome the challenge of nonconvexity.
The resulting approach learns a method whose performance is comparable to that of the best method in hindsight from a given class of methods.
arXiv Detail & Related papers (2023-01-19T06:08:01Z)
- Differentially Private Adaptive Optimization with Delayed Preconditioners [44.190582378775694]
We explore techniques to estimate and adapt to the gradient geometry during training without auxiliary data.
Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP2).
Empirically, we explore DP2, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines.
arXiv Detail & Related papers (2022-12-01T06:59:30Z)
- BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization [3.541406632811038]
A gradient-based optimization approach that automatically adjusts the learning rate is proposed.
This approach could serve as an alternative way of setting the learning rate in the stochastic gradient descent (SGD) algorithm.
arXiv Detail & Related papers (2022-07-06T15:55:53Z)
- Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size [29.15132344744801]
We establish local convergence of stochastic gradient descent with adaptive step size for problems such as matrix inversion.
We show that these first-order optimization methods can achieve sub-linear or linear convergence.
arXiv Detail & Related papers (2021-12-30T00:50:30Z)
- SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desirable to design a universal framework for adaptive algorithms that solves general problems.
In particular, our novel framework provides adaptive methods with convergence support in the nonconvex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z)
- Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step for constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z)
- On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence of adaptive gradient methods in expectation for nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods for nonconvex optimization.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.