SANIA: Polyak-type Optimization Framework Leads to Scale Invariant
Stochastic Algorithms
- URL: http://arxiv.org/abs/2312.17369v1
- Date: Thu, 28 Dec 2023 21:28:08 GMT
- Title: SANIA: Polyak-type Optimization Framework Leads to Scale Invariant
Stochastic Algorithms
- Authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Robert Gower,
Martin Takáč
- Abstract summary: Techniques such as Adam, AdaGrad, and AdaHessian utilize a preconditioner that modifies the search direction by incorporating information about the curvature of the objective function.
This paper proposes SANIA to tackle these challenges.
- Score: 1.21748738176366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive optimization methods are widely recognized as among the most popular
approaches for training Deep Neural Networks (DNNs). Techniques such as Adam,
AdaGrad, and AdaHessian utilize a preconditioner that modifies the search
direction by incorporating information about the curvature of the objective
function. However, despite their adaptive characteristics, these methods still
require manual fine-tuning of the step-size. This, in turn, impacts the time
required to solve a particular problem. This paper presents an optimization
framework named SANIA to tackle these challenges. Beyond eliminating the need
for manual step-size hyperparameter settings, SANIA incorporates techniques to
address poorly scaled or ill-conditioned problems. We also explore several
preconditioning methods, including Hutchinson's method, which approximates the
Hessian diagonal of the loss function. We conclude with an extensive empirical
examination of the proposed techniques across classification tasks, covering
both convex and non-convex contexts.
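The abstract combines two ingredients: a Polyak-type step-size that removes the step-size hyperparameter, and a diagonal preconditioner estimated with Hutchinson's method to handle poorly scaled problems. The sketch below is illustrative only and is not the authors' implementation: the toy ill-conditioned quadratic, the helper names, and the assumption that the optimal value f* is zero are invented for the example.

```python
"""Illustrative sketch (not the paper's code): a stochastic Polyak-type step
combined with a Hutchinson diagonal-Hessian preconditioner.  The quadratic
test problem, helper names, and the assumption f* = 0 are made up for the
example."""
import numpy as np

rng = np.random.default_rng(0)

# Badly scaled quadratic f(x) = 0.5 * x^T A x with condition number ~ 1e4.
scales = np.logspace(0, 4, 20)
A = np.diag(scales)

def loss(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

def hvp(x, v):
    # Hessian-vector product; for this quadratic the Hessian is just A.
    return A @ v

def hutchinson_diag(x, n_samples=10):
    """Estimate diag(H) via E[z * (H z)] with Rademacher probes z."""
    d = np.zeros_like(x)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=x.shape)
        d += z * hvp(x, z)
    return d / n_samples

def preconditioned_polyak_step(x, f_star=0.0, eps=1e-8):
    """One Polyak-type step scaled by the inverse estimated Hessian diagonal."""
    g = grad(x)
    diag = np.abs(hutchinson_diag(x)) + eps      # keep the preconditioner positive
    pg = g / diag                                # preconditioned direction D^{-1} g
    gamma = (loss(x) - f_star) / (g @ pg + eps)  # Polyak step in the D^{-1} norm
    return x - gamma * pg

x = rng.normal(size=20)
for _ in range(50):
    x = preconditioned_polyak_step(x)
print(f"final loss: {loss(x):.3e}")
```

Loosely speaking, measuring the gradient in the D^{-1} norm in the denominator is what makes such an update robust to per-coordinate rescaling of the problem, which is the behaviour the abstract targets.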
Related papers
- Learning Joint Models of Prediction and Optimization [56.04498536842065]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving it.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by joint predictive models.
arXiv Detail & Related papers (2024-09-07T19:52:14Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Stochastic Gradient Descent with Preconditioned Polyak Step-size [1.3300175008796402]
Stochastic Gradient Descent with Polyak Step-size (SPS) is a method that offers an update rule which alleviates the need to manually fine-tune the learning rate.
In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
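For reference, the update this line of work builds on can be written roughly as follows; the notation is ours, with f_{i_k} the sampled loss, f_{i_k}^* a lower bound on its optimal value, and B_k a positive-definite preconditioner (for instance a Hutchinson-style diagonal estimate):

```latex
% Stochastic Polyak step-size (SPS):
x_{k+1} = x_k - \gamma_k \nabla f_{i_k}(x_k),
\qquad
\gamma_k = \frac{f_{i_k}(x_k) - f_{i_k}^*}{\|\nabla f_{i_k}(x_k)\|_2^2}

% Preconditioned variant (sketch): scale the step by B_k^{-1} and measure the
% gradient in the induced norm.
x_{k+1} = x_k - \gamma_k B_k^{-1} \nabla f_{i_k}(x_k),
\qquad
\gamma_k = \frac{f_{i_k}(x_k) - f_{i_k}^*}{\|\nabla f_{i_k}(x_k)\|_{B_k^{-1}}^2}
```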
arXiv Detail & Related papers (2023-10-03T14:36:05Z)
- Self-adaptive algorithms for quasiconvex programming and applications to machine learning [0.0]
We provide a self-adaptive step-size strategy that does not rely on line-search techniques, along with a generic approach that holds under mild assumptions.
The proposed method is verified by preliminary results from some computational examples.
To demonstrate the effectiveness of the proposed technique for large-scale problems, we apply it to several machine learning experiments.
arXiv Detail & Related papers (2022-12-13T05:30:29Z)
- An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms [19.41328109094503]
We propose and analyze several novel adaptive variants of the popular SAGA algorithm.
We establish their convergence guarantees under general settings.
We improve the analysis of SAGA to support non-Euclidean norms.
arXiv Detail & Related papers (2022-04-28T09:43:07Z)
- Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information [37.70729542263343]
We present a novel adaptive optimization algorithm for large-scale machine learning problems.
Our method dynamically adapts the direction and step-size.
Our methodology does not require tedious learning-rate tuning.
arXiv Detail & Related papers (2021-09-11T06:39:50Z)
- Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasingly large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also give insights into increasing the momentum in a stagewise manner, in accordance with a stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
- AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show that faster convergence than SARAH can be achieved when the local geometry permits.
This algorithm implicitly computes the step-size and efficiently estimates the local Lipschitz smoothness of the functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z)
- High Dimensional Level Set Estimation with Bayesian Neural Network [58.684954492439424]
This paper proposes novel methods to solve the high dimensional Level Set Estimation problems using Bayesian Neural Networks.
For each problem, we derive the corresponding theoretical information-based acquisition function to sample data points.
Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-17T23:21:53Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)