Ordering for Non-Replacement SGD
- URL: http://arxiv.org/abs/2306.15848v1
- Date: Wed, 28 Jun 2023 00:46:58 GMT
- Title: Ordering for Non-Replacement SGD
- Authors: Yuetong Xu and Baharan Mirzasoleiman
- Abstract summary: We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we combine the ordering with mini-batching and further apply it to more complex neural networks.
- Score: 7.11967773739707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One approach to reducing the run time and improving the efficiency of
machine learning is to improve the convergence rate of the optimization
algorithm used.
Shuffling is an algorithmic technique that is widely used in machine learning,
but it only began to receive theoretical attention in recent years. With
different convergence rates developed for random shuffling and incremental
gradient descent, we seek to find an ordering that can improve the convergence
rates for the non-replacement form of the algorithm. Based on existing bounds
of the distance between the optimal and current iterate, we derive an upper
bound that is dependent on the gradients at the beginning of the epoch. Through
analysis of the bound, we are able to develop optimal orderings for constant
and decreasing step sizes for strongly convex and convex functions. We further
test and verify our results through experiments on synthetic and real data
sets. In addition, we combine the ordering with mini-batching and further
apply it to more complex neural networks, with promising results.
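To make the ordering idea concrete, here is a minimal, illustrative sketch (not the paper's exact algorithm) of non-replacement SGD on a least-squares objective: at the start of each epoch, per-sample gradient norms are computed at the current iterate, and the epoch then visits every sample exactly once in that order. The descending sort direction, step size, and least-squares objective are assumptions for illustration.

```python
import numpy as np

def ordered_incremental_sgd(A, b, x0, lr=0.01, epochs=20, descending=True):
    """Non-replacement SGD on 0.5 * ||Ax - b||^2 where each epoch visits
    every sample exactly once, ordered by the norm of its per-sample
    gradient evaluated at the start of the epoch."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # Per-sample gradient: grad_i(x) = a_i * (a_i . x - b_i), so its
        # norm is |a_i . x - b_i| * ||a_i||.
        residuals = A @ x - b
        grad_norms = np.abs(residuals) * np.linalg.norm(A, axis=1)
        order = np.argsort(-grad_norms if descending else grad_norms)
        for i in order:
            x -= lr * A[i] * (A[i] @ x - b[i])
    return x

# Hypothetical usage on a small synthetic least-squares instance.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
b = A @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
x_hat = ordered_incremental_sgd(A, b, x0=np.zeros(5))
```

The same recipe extends to mini-batches by ordering batch gradient norms computed at the epoch start, in the spirit of the mini-batch extension the abstract mentions.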
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator) for finite-sum nonconvex optimization.
Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data (a generic single-loop estimator sketch follows this entry's link).
arXiv Detail & Related papers (2022-09-01T11:05:26Z)
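The SLEDGE construction itself is not spelled out in the summary above, so as a stand-in here is a minimal sketch of the recursive single-loop gradient-estimator family (SARAH-style) that single-loop methods of this kind build on; the least-squares objective, uniform sampling, and all hyperparameters are illustrative assumptions, and this is not SLEDGE itself.

```python
import numpy as np

def sarah_least_squares(A, b, x0, lr=0.01, epochs=10, seed=0):
    """SARAH-style single-loop recursive gradient estimator on
    0.5/n * ||Ax - b||^2: the estimator is refreshed with one full gradient
    per epoch, then updated as v_t = grad_i(x_t) - grad_i(x_{t-1}) + v_{t-1}."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = A.shape[0]
    for _ in range(epochs):
        v = A.T @ (A @ x - b) / n              # full gradient at the epoch start
        x_prev, x = x.copy(), x - lr * v
        for _ in range(n - 1):
            i = rng.integers(n)
            g_new = A[i] * (A[i] @ x - b[i])       # sample-i gradient at x_t
            g_old = A[i] * (A[i] @ x_prev - b[i])  # sample-i gradient at x_{t-1}
            v = g_new - g_old + v                  # recursive estimator update
            x_prev, x = x.copy(), x - lr * v
    return x

# Hypothetical usage on a small synthetic least-squares instance.
rng = np.random.default_rng(1)
A = rng.normal(size=(40, 4))
b = A @ rng.normal(size=4)
x_hat = sarah_least_squares(A, b, x0=np.zeros(4))
```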
- An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization [14.336473214524663]
A key component of the algorithm is randomness whose magnitude depends on the value of the objective function.
We prove the convergence of the algorithm with an algebraic rate, in both probability and the parameter space.
We present several numerical examples to demonstrate the efficiency and robustness of the algorithm; a minimal sketch of the objective-dependent noise idea follows this entry's link.
arXiv Detail & Related papers (2022-04-12T16:27:49Z)
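To make the objective-dependent randomness concrete, here is a minimal sketch of gradient descent with additive Gaussian noise whose standard deviation shrinks as f(x) approaches an assumed known lower bound f_min; the proportional noise schedule, the sqrt(lr) scaling, and the double-well example are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def noisy_gd(f, grad_f, x0, lr=0.01, steps=5000, f_min=0.0, noise_scale=0.5, seed=0):
    """Gradient descent with additive Gaussian noise whose standard deviation
    is proportional to f(x) - f_min, so exploration dies out near the
    global minimum (a lower bound f_min is assumed to be known)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        sigma = noise_scale * max(f(x) - f_min, 0.0)  # objective-dependent noise level
        x = x - lr * grad_f(x) + sigma * np.sqrt(lr) * rng.normal(size=x.shape)
    return x

# Hypothetical usage: a double well with its global minimum at x = 1 and a
# local minimum near x = -1; the iterate starts in the local basin.
f = lambda x: float(np.sum((x**2 - 1.0)**2 + 0.3 * (x - 1.0)**2))
grad = lambda x: 4.0 * x * (x**2 - 1.0) + 0.6 * (x - 1.0)
x_star = noisy_gd(f, grad, x0=np.array([-1.2]))
```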
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the joint updating of the model parameters and the learning rate as a single process (a generic learning-rate-adaptation sketch follows this entry's link).
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
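As a generic illustration of adapting the learning rate online during first-order descent, here is a hypergradient-style sketch (named plainly as a stand-in; it is not the Meta-Regularization update itself, and the quadratic example and all hyperparameters are assumptions).

```python
import numpy as np

def hypergradient_descent(grad_f, x0, lr0=0.01, meta_lr=1e-4, steps=500):
    """Gradient descent that adapts its own step size online: since
    x_t = x_{t-1} - lr * g_{t-1}, the derivative of f(x_t) with respect to
    lr is -g_t . g_{t-1}, so the step size itself takes a gradient step."""
    x = np.asarray(x0, dtype=float).copy()
    lr, g_prev = lr0, np.zeros_like(x)
    for _ in range(steps):
        g = grad_f(x)
        lr = max(lr + meta_lr * float(g @ g_prev), 1e-8)  # adapt the step size
        x = x - lr * g
        g_prev = g
    return x, lr

# Hypothetical usage: an ill-conditioned quadratic 0.5 * x^T diag(1, 10) x.
scales = np.array([1.0, 10.0])
x_hat, lr_final = hypergradient_descent(lambda x: scales * x, x0=np.array([1.0, 1.0]))
```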
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous convergence on the primal and dual sides.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters (a minimal sketch of the basic two-point estimator follows this entry's link).
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
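Here is a minimal sketch of the basic randomized two-point zero-order estimate plugged into projected gradient descent; the smoothing-kernel and higher-order-smoothness refinements of the paper are omitted, and the ball constraint, step size, and quadratic example are illustrative assumptions.

```python
import numpy as np

def zero_order_pgd(f, x0, radius=1.0, lr=0.05, h=1e-3, steps=2000, seed=0):
    """Projected gradient descent where the gradient is replaced by a
    randomized two-point zero-order estimate,
    g = d/(2h) * (f(x + h*u) - f(x - h*u)) * u with u uniform on the unit
    sphere, followed by projection onto a Euclidean ball."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for _ in range(steps):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                  # uniform direction on the sphere
        g = d / (2.0 * h) * (f(x + h * u) - f(x - h * u)) * u
        x -= lr * g
        norm = np.linalg.norm(x)
        if norm > radius:                       # project back onto the ball
            x *= radius / norm
    return x

# Hypothetical usage: strongly convex quadratic with its minimum inside the ball.
f = lambda x: 0.5 * np.sum((x - 0.3)**2)
x_hat = zero_order_pgd(f, x0=np.zeros(5))
```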
- Learning to Optimize Non-Rigid Tracking [54.94145312763044]
We employ learnable optimization to improve robustness and speed up solver convergence.
First, we upgrade the tracking objective by integrating an alignment data term on deep features which are learned end-to-end through a CNN.
Second, we bridge the gap between the preconditioning technique and learning method by introducing a ConditionNet which is trained to generate a preconditioner.
arXiv Detail & Related papers (2020-03-27T04:40:57Z)