Periodic Stochastic Gradient Descent with Momentum for Decentralized
Training
- URL: http://arxiv.org/abs/2008.10435v1
- Date: Mon, 24 Aug 2020 13:38:22 GMT
- Title: Periodic Stochastic Gradient Descent with Momentum for Decentralized
Training
- Authors: Hongchang Gao, Heng Huang
- Abstract summary: We propose a novel periodic decentralized momentum SGD method, which employs the momentum scheme and periodic communication for decentralized training.
We conduct extensive experiments to verify the performance of our two proposed methods, and both show superior performance over existing methods.
- Score: 114.36410688552579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized training has been actively studied in recent years. Although a
wide variety of methods have been proposed, the decentralized momentum SGD
method remains underexplored. In this paper, we propose a novel periodic
decentralized momentum SGD method, which employs the momentum scheme and
periodic communication for decentralized training. These two strategies,
together with the topology of the decentralized training system, make the
theoretical convergence analysis of our method challenging. We address this
problem and provide the condition under which our method achieves linear
speedup with respect to the number of workers. Furthermore, we introduce a
communication-efficient variant that reduces the communication cost of each
communication round, and we also provide the condition under which this
variant achieves linear speedup. To the best of our knowledge, both methods
are the first to achieve these theoretical results in their respective
settings. We conduct extensive experiments to verify the performance of our
two proposed methods, and both show superior performance over existing
methods.
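To make the two ingredients of the method concrete, the sketch below shows one plausible form of periodic decentralized momentum SGD: every worker runs local momentum SGD steps on its own data, and every `period` iterations the workers average their models with their neighbors through a doubly stochastic mixing matrix `W` that encodes the topology. The function names, default hyperparameters, ring topology, and the choice to also mix the momentum buffers are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def periodic_decentralized_momentum_sgd(grad_fn, x0, W, lr=0.1, beta=0.9,
                                        period=4, num_iters=100):
    """Illustrative sketch of periodic decentralized momentum SGD.

    grad_fn(x_i, i) should return a stochastic gradient computed on worker i's
    local data at parameters x_i; W is an n x n doubly stochastic mixing matrix
    encoding the communication topology (W[i, j] > 0 iff i and j are neighbors).
    """
    n = W.shape[0]
    x = np.tile(x0, (n, 1))      # each worker holds its own copy of the parameters
    u = np.zeros_like(x)         # per-worker momentum buffers

    for t in range(1, num_iters + 1):
        # Local momentum SGD step on every worker (run in parallel in practice).
        for i in range(n):
            g = grad_fn(x[i], i)             # stochastic gradient from local data
            u[i] = beta * u[i] + g           # heavy-ball momentum buffer
            x[i] = x[i] - lr * u[i]

        # Periodic communication: every `period` iterations, gossip-average
        # with neighbors according to the topology W.
        if t % period == 0:
            x = W @ x                        # mix parameters over the topology
            u = W @ u                        # (assumption) also mix momentum buffers

    return x.mean(axis=0)                    # report the averaged model


# Example: 4 workers on a ring topology, worker-local least-squares objectives.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 8, 5))           # worker-local data matrices
    b = rng.normal(size=(4, 8))

    def grad_fn(x, i):
        rows = rng.choice(8, size=2, replace=False)   # mini-batch of rows
        Ai, bi = A[i, rows], b[i, rows]
        return Ai.T @ (Ai @ x - bi) / len(rows)

    ring = np.array([[0.5, 0.25, 0.0, 0.25],
                     [0.25, 0.5, 0.25, 0.0],
                     [0.0, 0.25, 0.5, 0.25],
                     [0.25, 0.0, 0.25, 0.5]])
    x_hat = periodic_decentralized_momentum_sgd(grad_fn, np.zeros(5), ring,
                                                lr=0.05, period=4, num_iters=200)
    print(x_hat)
```

Setting `period=1` recovers per-iteration gossip averaging, while a larger `period` trades communication for extra local computation; the paper's analysis gives the condition under which this still achieves linear speedup, i.e., the iteration complexity to reach a target accuracy shrinks proportionally to the number of workers.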
Related papers
- From promise to practice: realizing high-performance decentralized training [8.955918346078935]
Decentralized training of deep neural networks has attracted significant attention for its theoretically superior scalability over synchronous data-parallel methods like All-Reduce.
This paper identifies three key factors that can lead to speedups over All-Reduce training and constructs a runtime model to determine when, how, and to what degree decentralization can yield shorter per-iteration runtimes.
arXiv Detail & Related papers (2024-10-15T19:04:56Z)
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying a Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Local Methods with Adaptivity via Scaling [38.99428012275441]
This paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods.
We consider the classical Local SGD method and enhance it with a scaling feature.
In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
arXiv Detail & Related papers (2024-06-02T19:50:05Z)
- Scalable Optimal Margin Distribution Machine [50.281535710689795]
Optimal margin Distribution Machine (ODM) is a newly proposed statistical learning framework rooted in the novel margin theory.
This paper proposes a scalable ODM, which can achieve nearly ten times speedup compared to the original ODM training method.
arXiv Detail & Related papers (2023-05-08T16:34:04Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Fast and Robust Sparsity Learning over Networks: A Decentralized Surrogate Median Regression Approach [10.850336820582678]
We propose a decentralized surrogate median regression (deSMR) method for efficiently solving the decentralized sparsity learning problem.
Our proposed algorithm enjoys a linear convergence rate with a simple implementation.
We also establish theoretical results for sparse support recovery.
arXiv Detail & Related papers (2022-02-11T08:16:01Z)
- A general framework for decentralized optimization with first-order methods [11.50057411285458]
Decentralized optimization to minimize a finite sum of functions over a network of nodes has been a significant focus in control and signal processing research.
The emergence of sophisticated computing and large-scale data science needs have led to a resurgence of activity in this area.
We discuss decentralized first-order gradient methods, which have found tremendous success in control, signal processing, and machine learning problems.
arXiv Detail & Related papers (2020-09-12T17:52:10Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
- Step-Ahead Error Feedback for Distributed Training with Compressed Gradient [99.42912552638168]
We show that a new "gradient mismatch" problem arises from the local error feedback in centralized distributed training.
We propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis.
arXiv Detail & Related papers (2020-08-13T11:21:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.