Preconditioning for Accelerated Gradient Descent Optimization and Regularization
- URL: http://arxiv.org/abs/2410.00232v1
- Date: Mon, 30 Sep 2024 20:58:39 GMT
- Title: Preconditioning for Accelerated Gradient Descent Optimization and Regularization
- Authors: Qiang Ye
- Abstract summary: We explain how preconditioning with AdaGrad, RMSProp, and Adam accelerates training.
We demonstrate how normalization methods accelerate training by improving Hessian conditioning.
- Score: 2.306205094107543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accelerated training algorithms, such as adaptive learning rates and various normalization methods, are widely used but not fully understood. When regularization is introduced, standard optimizers based on adaptive learning rates may not perform effectively. This raises the need for alternative regularization approaches and the question of how to properly combine regularization with preconditioning. In this paper, we address these challenges using the theory of preconditioning as follows: (1) We explain how preconditioning with AdaGrad, RMSProp, and Adam accelerates training; (2) We explore the interaction between regularization and preconditioning, outlining the different options for selecting the variables to regularize, and in particular we discuss how to implement this for gradient regularization; and (3) We demonstrate how normalization methods accelerate training by improving Hessian conditioning, and discuss how this perspective can lead to new preconditioning-based training algorithms. Our findings offer a unified mathematical framework for understanding various acceleration techniques and deriving appropriate regularization schemes.
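Points (1) and (2) of the abstract concern how a diagonal preconditioner rescales poorly conditioned directions and whether the regularization gradient should pass through that preconditioner. The sketch below is not the paper's algorithm; it is a minimal illustration assuming an RMSProp-style diagonal preconditioner and a simple weight-decay regularizer, with the function `preconditioned_step`, the `decoupled` flag, and the toy quadratic all invented for this example.

```python
import numpy as np

def preconditioned_step(w, grad, v, lr=0.05, beta2=0.999, eps=1e-8,
                        weight_decay=0.0, decoupled=True):
    """One RMSProp-style diagonally preconditioned update with optional weight decay."""
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # EMA of squared gradients
    if not decoupled:
        # Coupled L2 penalty: its gradient is rescaled by the preconditioner,
        # so the effective shrinkage differs across coordinates.
        grad = grad + weight_decay * w
    step = lr * grad / (np.sqrt(v) + eps)          # divide by D = diag(sqrt(v)) + eps
    if decoupled:
        # Decoupled weight decay (AdamW-style): the penalty bypasses the
        # preconditioner, shrinking every coordinate at the same rate.
        step = step + lr * weight_decay * w
    return w - step, v

# Toy quadratic 0.5 * w^T diag(1, 100) w, whose Hessian has condition number 100.
H = np.array([1.0, 100.0])
w, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(500):
    g = H * w                                      # gradient of the quadratic
    w, v = preconditioned_step(w, g, v, weight_decay=1e-2)
print(w)  # the preconditioner equalizes the effective step size across coordinates;
          # plain gradient descent at this learning rate would diverge along the stiff direction
```

Point (3), that normalization accelerates training by improving Hessian conditioning, can be illustrated in the same hedged spirit on toy data: for a linear least-squares loss the Hessian is X^T X / n, and standardizing the feature columns (as batch-normalization-like transforms do) can shrink its condition number by orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) * np.array([1.0, 100.0])   # badly scaled features
H_raw = X.T @ X / len(X)                                   # Hessian of 0.5*||Xw - y||^2 / n
X_std = (X - X.mean(axis=0)) / X.std(axis=0)               # standardize each feature column
H_std = X_std.T @ X_std / len(X_std)
print(np.linalg.cond(H_raw), np.linalg.cond(H_std))        # condition number drops from ~1e4 to ~1
```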
Related papers
- Discounted Adaptive Online Learning: Towards Better Regularization [5.5899168074961265]
We study online learning in adversarial nonstationary environments.
We propose an adaptive (i.e., instance optimal) algorithm that improves the widespread non-adaptive baseline.
We also consider the (Gibbs and Candes, 2021)-style online conformal prediction problem.
arXiv Detail & Related papers (2024-02-05T04:29:39Z)
- Conditional Deformable Image Registration with Spatially-Variant and Adaptive Regularization [2.3419031955865517]
We propose a learning-based registration approach based on a novel conditional spatially adaptive instance normalization (CSAIN).
Experiments show that our proposed method outperforms the baseline approaches while achieving spatially-variant and adaptive regularization.
arXiv Detail & Related papers (2023-03-19T16:12:06Z)
- Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping the privacy of training data on the clients.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad, to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs).
arXiv Detail & Related papers (2022-07-14T22:46:43Z)
- Toward Learning Robust and Invariant Representations with Alignment Regularization and Data Augmentation [76.85274970052762]
This paper is motivated by the proliferation of options for alignment regularization.
We evaluate the performance of several popular design choices along the dimensions of robustness and invariance.
We also formally analyze the behavior of alignment regularization to complement our empirical study under assumptions we consider realistic.
arXiv Detail & Related papers (2022-06-04T04:29:19Z)
- SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desirable to design a universal framework for adaptive algorithms that solves general problems.
In particular, our novel framework provides convergence analysis support for adaptive gradient methods under the non-convex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for adaptively choosing the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term and jointly updates the parameters and the learning rate.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
- Normalization Techniques in Training DNNs: Methodology, Analysis and Application [111.82265258916397]
Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs).
This paper reviews and comments on the past, present and future of normalization methods in the context of training.
arXiv Detail & Related papers (2020-09-27T13:06:52Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Stochastic batch size for adaptive regularization in deep network optimization [63.68104397173262]
We propose a first-order optimization algorithm that incorporates adaptive regularization and is applicable to machine learning problems in the deep learning framework.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z)