Delay-adaptive step-sizes for asynchronous learning
- URL: http://arxiv.org/abs/2202.08550v1
- Date: Thu, 17 Feb 2022 09:51:22 GMT
- Title: Delay-adaptive step-sizes for asynchronous learning
- Authors: Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian and Mikael
Johansson
- Abstract summary: We show that it is possible to use learning rates that depend on the actual time-varying delays in the system.
For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.
- Score: 8.272788656521415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In scalable machine learning systems, model training is often parallelized
over multiple nodes that run without tight synchronization. Most analysis
results for the related asynchronous algorithms use an upper bound on the
information delays in the system to determine learning rates. Not only are such
bounds hard to obtain in advance, but they also result in unnecessarily slow
convergence. In this paper, we show that it is possible to use learning rates
that depend on the actual time-varying delays in the system. We develop general
convergence results for delay-adaptive asynchronous iterations and specialize
these to proximal incremental gradient descent and block-coordinate descent
algorithms. For each of these methods, we demonstrate how delays can be
measured on-line, present delay-adaptive step-size policies, and illustrate
their theoretical and practical advantages over the state-of-the-art.
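To make the idea concrete, the sketch below (a minimal illustration, not the paper's exact policy) simulates an asynchronous SGD loop in which the delay of each applied gradient is measured on-line as the gap between the current iteration counter and the iteration at which the gradient was read, and the step-size shrinks with that measured delay instead of with a worst-case delay bound. All names and the specific 1/(1+delay) rule are assumptions for illustration.

```python
import numpy as np

def delay_adaptive_step(gamma0, delay):
    """Hypothetical delay-adaptive rule: shrink the base step-size gamma0
    as the measured delay grows (illustrative only)."""
    return gamma0 / (1.0 + delay)

def async_sgd_with_measured_delays(grad_fn, x0, gamma0=0.1, iters=1000,
                                   num_workers=4, seed=0):
    """Simulated asynchronous SGD: each 'worker' holds a stale copy of the
    iterate; when its gradient arrives, the delay is measured on-line and
    the step-size adapts to it."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    # Each worker stores (iterate snapshot, iteration index it was read at).
    inflight = [(x.copy(), 0) for _ in range(num_workers)]
    for k in range(iters):
        w = rng.integers(num_workers)      # worker whose update arrives now
        x_read, k_read = inflight[w]
        delay = k - k_read                 # measured, time-varying delay
        g = grad_fn(x_read)                # gradient computed at a stale iterate
        x -= delay_adaptive_step(gamma0, delay) * g
        inflight[w] = (x.copy(), k)        # worker re-reads the fresh iterate
    return x

# Example: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_final = async_sgd_with_measured_delays(lambda x: x, x0=np.ones(5))
```

In a real system grad_fn would be a mini-batch stochastic gradient and the delays would come from actual worker timing; the only point of the sketch is that the learning rate reacts to the delay observed at each update rather than to an a priori upper bound on all delays.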
Related papers
- Queuing dynamics of asynchronous Federated Learning [15.26212962081762]
We study asynchronous federated learning mechanisms with nodes having potentially different computational speeds.
We propose a non-uniform sampling scheme for the central server that allows for lower delays with better complexity.
Our experiments clearly show a significant improvement of our method over current state-of-the-art asynchronous algorithms on an image classification problem.
arXiv Detail & Related papers (2024-02-12T18:32:35Z)
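For the non-uniform sampling idea in the federated-learning entry above, here is a minimal sketch of the general mechanism (the weighting rule and all names are assumptions for illustration, not the paper's specific scheme): clients with longer average compute times are polled less often, so their updates arrive less frequently and are less stale.

```python
import numpy as np

def sampling_probabilities(mean_compute_times, alpha=1.0):
    """Hypothetical rule: sample clients with probability inversely related
    to their average compute time; alpha controls how strongly slow clients
    are down-weighted."""
    w = 1.0 / np.asarray(mean_compute_times, dtype=float) ** alpha
    return w / w.sum()

# Example: four clients, the last one roughly ten times slower than the rest.
probs = sampling_probabilities([1.0, 1.2, 1.5, 10.0])
rng = np.random.default_rng(0)
next_client = rng.choice(len(probs), p=probs)  # client polled by the server
```

How the polling weights should actually be chosen, and how the induced bias is accounted for in aggregation, is the kind of question the queuing analysis above is concerned with.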
- Asynchronous Distributed Optimization with Delay-free Parameters [9.062164411594175]
This paper develops asynchronous versions of two distributed algorithms, Prox-DGD and DGD-ATC, for solving consensus optimization problems over undirected networks.
In contrast to alternatives, our algorithms can converge to the fixed point set of their synchronous counterparts using step-sizes that are independent of the delays.
arXiv Detail & Related papers (2023-12-11T16:33:38Z)
- Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent [63.43247232708004]
Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization.
Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm.
arXiv Detail & Related papers (2023-08-18T10:00:27Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under mild assumptions on the delays.
arXiv Detail & Related papers (2021-07-06T21:59:49Z)
- Learning Under Delayed Feedback: Implicitly Adapting to Gradient Delays [0.0]
We consider convex optimization problems, where several machines act asynchronously in parallel while sharing a common memory.
We propose a robust training method for the constrained setting and derive non-asymptotic convergence guarantees that do not depend on prior knowledge of update delays, objective smoothness, and variance.
arXiv Detail & Related papers (2021-06-23T09:36:36Z)
- Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach [6.187780920448871]
We propose a novel continuous-time framework to analyze asynchronous algorithms.
We describe a fully asynchronous decentralized algorithm to minimize the sum of smooth and strongly convex functions.
arXiv Detail & Related papers (2021-06-07T13:09:25Z)
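For intuition on what a continuous-time treatment of asynchrony can look like, delayed updates are often abstracted as a delayed gradient flow; the equation below is a generic example of that viewpoint (an assumption for exposition, not this paper's exact model).

```latex
% Generic delayed gradient flow: the state moves along the negative gradient
% evaluated at a delayed copy of itself (illustrative abstraction only).
\[
  \dot{x}(t) \;=\; -\gamma\, \nabla f\bigl(x(t - \tau(t))\bigr),
  \qquad \tau(t) \ge 0 .
\]
```

Here tau(t) is a time-varying information delay; tau(t) = 0 recovers the standard gradient flow, and conditions on tau(t) play the role that delay bounds or delay-adaptive step-sizes play in the discrete-time algorithms.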
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates [67.19481956584465]
It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and, in asynchronous implementations, on the gradient staleness.
We show that our results are tight and illustrate key findings in numerical experiments.
arXiv Detail & Related papers (2021-03-03T12:08:23Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.