Related papers: Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic

Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic

URL: http://arxiv.org/abs/1910.08597v5
Date: Sat, 17 Feb 2024 00:18:21 GMT
Title: Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic
Authors: Matteo Sordello, Niccol\`o Dalmasso, Hangfeng He and Weijie Su
Abstract summary: SplitSGD is a new dynamic learning schedule for optimization. The method decreases the learning rate for better adaptation to the local geometry of the objective function. It essentially does not incur additional computational cost than standard SGD.
Score: 5.395127324484869
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is detected, that is, the iterates are likely to bounce at around a vicinity of a local minimum. The detection is performed by splitting the single thread into two and using the inner product of the gradients from the two threads as a measure of stationarity. Owing to this simple yet provably valid stationarity detection, SplitSGD is easy-to-implement and essentially does not incur additional computational cost than standard SGD. Through a series of extensive experiments, we show that this method is appropriate for both convex problems and training (non-convex) neural networks, with performance compared favorably to other stochastic optimization methods. Importantly, this method is observed to be very robust with a set of default parameters for a wide range of problems and, moreover, can yield better generalization performance than other adaptive gradient methods such as Adam.

Related papers

SVD-Preconditioned Gradient Descent Method for Solving Nonlinear Least Squares Problems [27.21342746802453]
This paper introduces a novel optimization algorithm designed for nonlinear least-squares problems.<n>The method is derived by preconditioning the gradient descent direction using the Singular Value Decomposition (SVD) of the Jacobian.<n>We establish the local linear convergence of the proposed method under standard regularity assumptions and prove global convergence for a modified version of the algorithm under suitable conditions.
arXiv Detail & Related papers (2026-02-07T18:53:00Z)
TRSVR: An Adaptive Stochastic Trust-Region Method with Variance Reduction [17.083793956698994]
We propose a trust method for unconstrained non-reduced optimization that incorporates variance-region (SVRG) to accelerate convergence.<n>The proposed algorithm relies solely on gradient information and does not require function value evaluations.
arXiv Detail & Related papers (2026-01-21T04:41:57Z)
Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW [2.028622227373579]
gradient-based descent (SGD) have long been central to training large language models (LLMs)<n>This paper proposes a conjugate subgradient method together with adaptive sampling specifically for training LLMs.
arXiv Detail & Related papers (2025-07-01T23:30:15Z)
The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines [1.738375118265695]
This paper proposes an innovative method specifically designed for kernel support vector machines (SVMs) It not only achieves faster iteration per iteration but also exhibits enhanced convergence when compared to conventional SFO techniques. Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods.
arXiv Detail & Related papers (2024-07-30T17:03:19Z)
Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses [5.052293146674794]
It is known that the standard descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam fail to converge if the learning rates do not converge to zero. In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates.
arXiv Detail & Related papers (2024-06-20T14:07:39Z)
Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets [48.1015832267945]
This research presents a method to meet requirements through the minimization objective function of the RPM algorithm. A branch-and-bound (BnB) algorithm is devised, which solely branches over the parameters, thereby boosting convergence rate. Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, when compared with prevailing state-of-the-art transformations.
arXiv Detail & Related papers (2024-05-14T13:28:57Z)
Bolstering Stochastic Gradient Descent with Model Building [0.0]
gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates. We propose an alternative approach to line search by using a new algorithm based on forward step model building. We show that the proposed algorithm achieves faster convergence and better generalization in well-known test problems.
arXiv Detail & Related papers (2021-11-13T06:54:36Z)
STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Minimax Problems [80.46370778277186]
Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks. We develop a communication-efficient distributed extragrad algorithm, LocalAdaSient, with an adaptive learning rate suitable for solving convex-concave minimax problem in the. Server model. We demonstrate its efficacy through several experiments in both the homogeneous and heterogeneous settings.
arXiv Detail & Related papers (2021-06-18T09:42:05Z)
High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level. We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity. We provide convergence guarantees for finite-sum minimization problems and show a faster convergence than SARAH can be achieved if local geometry permits. This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z)
Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
adversary-resilient distributed optimization, in which. machines can independently compute gradients, and cooperate. Our algorithm is based on a new concentration technique, and its sample complexity. It is very practical: it improves upon the performance of all prior methods when no. setting machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences. We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline. Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts. Several works have pursued rigorous analytical justification for studying non optimization problems. A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.