BlueFog: Make Decentralized Algorithms Practical for Optimization and
Deep Learning
- URL: http://arxiv.org/abs/2111.04287v1
- Date: Mon, 8 Nov 2021 06:06:39 GMT
- Title: BlueFog: Make Decentralized Algorithms Practical for Optimization and
Deep Learning
- Authors: Bicheng Ying, Kun Yuan, Hanbin Hu, Yiming Chen, Wotao Yin
- Abstract summary: We introduce BlueFog, a Python library for straightforward, high-performance implementations of decentralized algorithms.
Based on a unified abstraction of various communication operations, BlueFog offers intuitive interfaces to implement a spectrum of decentralized algorithms.
BlueFog reaches a much higher throughput and achieves an overall $1.2\times \sim 1.8\times$ speedup over Horovod, a state-of-the-art distributed deep learning package.
- Score: 29.427785235669358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A decentralized algorithm is a form of computation that achieves a global goal
through local dynamics that rely on low-cost communication between
directly-connected agents. On large-scale optimization tasks involving
distributed datasets, decentralized algorithms have shown strong, sometimes
superior, performance over distributed algorithms with a central node.
Recently, developing decentralized algorithms for deep learning has attracted
great attention. They are considered low-communication-overhead alternatives
to those using a parameter server or the Ring-Allreduce protocol. However, the
lack of an easy-to-use and efficient software package has kept most
decentralized algorithms merely on paper. To fill the gap, we introduce
BlueFog, a Python library for straightforward, high-performance implementations
of diverse decentralized algorithms. Based on a unified abstraction of various
communication operations, BlueFog offers intuitive interfaces to implement a
spectrum of decentralized algorithms, from those using a static, undirected
graph for synchronous operations to those using dynamic and directed graphs for
asynchronous operations. BlueFog also adopts several system-level acceleration
techniques to further optimize the performance on the deep learning tasks. On
mainstream DNN training tasks, BlueFog reaches a much higher throughput and
achieves an overall $1.2\times \sim 1.8\times$ speedup over Horovod, a
state-of-the-art distributed deep learning package based on Ring-Allreduce.
BlueFog is open source at https://github.com/Bluefog-Lib/bluefog.
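To make the idea concrete, the snippet below is a minimal, self-contained simulation of the style of algorithm BlueFog targets: decentralized gradient descent in which every agent averages only with its ring neighbors before taking a local gradient step. It is an illustrative sketch in plain NumPy under assumed losses and topology, not BlueFog's actual interface.
```python
# Illustrative sketch only (plain NumPy, NOT BlueFog's API): decentralized
# gradient descent on a ring of n agents. Agent i holds a private quadratic
# loss f_i(x) = 0.5 * (x - b_i)^2 and communicates only with its two ring
# neighbors; the global goal is to minimize (1/n) * sum_i f_i, whose
# minimizer is mean(b).
import numpy as np

n, steps = 8, 500
rng = np.random.default_rng(0)
b = rng.normal(size=n)                  # private data held by each agent
x = np.zeros(n)                         # x[i] is agent i's local model copy

# Doubly stochastic mixing matrix for the ring topology: row i has nonzero
# weights only for agent i and its two direct neighbors (low-cost communication).
W = np.zeros((n, n))
for i in range(n):
    W[i, [i, (i - 1) % n, (i + 1) % n]] = 1.0 / 3.0

for k in range(steps):
    lr = 1.0 / (k + 2)                  # diminishing step size
    x = W @ x - lr * (x - b)            # neighbor averaging + local gradient step

# With enough steps, every local model approaches mean(b).
print("local models  :", np.round(x, 3))
print("global optimum:", round(float(b.mean()), 3))
```
In a real decentralized run, each agent would form its own row of `W @ x` by exchanging iterates with its direct neighbors; that neighbor-averaging step is the kind of communication operation BlueFog's unified abstraction is designed to expose.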
Related papers
- A Communication and Computation Efficient Fully First-order Method for Decentralized Bilevel Optimization [16.020878731214083]
This paper introduces a fully first-order decentralized method for decentralized bilevel optimization, $\text{C}^2$DFB.
$\text{C}^2$DFB is both compute- and communication-efficient.
arXiv Detail & Related papers (2024-10-18T02:00:45Z)
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k\log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Exponential Graph is Provably Efficient for Decentralized Deep Training [30.817705471352614]
We study so-called exponential graphs, where every node is connected to $O(\log(n))$ neighbors and $n$ is the total number of nodes.
This work proves such graphs can lead to both fast communication and effective averaging simultaneously.
We also discover that a sequence of $\log(n)$ one-peer exponential graphs, in which each node communicates with a single neighbor per iteration, can together achieve exact averaging (a short numerical check of this claim appears after the related-papers list).
arXiv Detail & Related papers (2021-10-26T02:33:39Z)
- DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization [43.31016937305845]
Internet-of-things, networked sensing, autonomous systems and federated learning call for decentralized algorithms for finite-sum optimizations.
We develop a DEcentralized STochastic REcurSive method (DESTRESS) for nonconvex finite-sum optimization.
Detailed theoretical and numerical comparisons show that DESTRESS improves upon prior decentralized algorithms.
arXiv Detail & Related papers (2021-10-04T03:17:41Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Decentralized Deep Learning using Momentum-Accelerated Consensus [15.333413663982874]
We consider the problem of decentralized deep learning where multiple agents collaborate to learn from a distributed dataset.
We propose and analyze a novel decentralized deep learning algorithm where the agents interact over a fixed communication topology.
Our algorithm is based on the heavy-ball acceleration method used in gradient-based protocols.
arXiv Detail & Related papers (2020-10-21T17:39:52Z)
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance (a top-k sparsification sketch after the related-papers list illustrates one such scheme).
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
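The exact-averaging claim in the exponential-graph entry above can be checked numerically. The sketch below assumes the number of nodes is a power of two and that in round $t$ each node averages its value with the one received from the peer $2^t$ hops ahead on a ring; the node count and seed are arbitrary choices for the example.
```python
# Numerical check of the exact-averaging claim, assuming n is a power of two:
# in round t, every node averages its value with the value of the peer 2**t
# hops ahead on the ring, so each node receives from exactly one peer per round.
import numpy as np

n = 16                                    # number of nodes (power of two)
rng = np.random.default_rng(1)
x = rng.normal(size=n)                    # each node's local value
exact_average = x.mean()

for t in range(int(np.log2(n))):          # log2(n) one-peer rounds
    hop = 2 ** t
    # np.roll(x, -hop)[i] == x[(i + hop) % n], i.e. the value received from
    # the single peer assigned to node i in round t.
    x = 0.5 * (x + np.roll(x, -hop))

print("max deviation from exact average:", np.abs(x - exact_average).max())
```
After $\log_2(n)$ such one-peer rounds, every node holds the exact network-wide average, up to floating-point error.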
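As one concrete instance of the compression schemes mentioned in the sparse-communication entry, the sketch below implements top-k gradient sparsification: each worker keeps only its k largest-magnitude gradient entries before the gradients are averaged. It is a generic illustration, not the specific schemes or parameter settings studied in that paper.
```python
# Illustrative top-k gradient sparsification (generic example): each worker
# transmits only its k largest-magnitude gradient entries; the aggregator
# averages the resulting sparse vectors.
import numpy as np

def top_k_sparsify(grad: np.ndarray, k: int) -> np.ndarray:
    """Zero out all but the k largest-magnitude entries of grad."""
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of the top-k entries
    sparse[idx] = grad[idx]
    return sparse

rng = np.random.default_rng(0)
d, workers, k = 1000, 4, 50                         # model size, #workers, budget
grads = [rng.normal(size=d) for _ in range(workers)]

# Each worker sends ~k values instead of d; the aggregate approximates the
# true average gradient.
avg_sparse = sum(top_k_sparsify(g, k) for g in grads) / workers
avg_dense = sum(grads) / workers
print("relative error:", np.linalg.norm(avg_sparse - avg_dense) / np.linalg.norm(avg_dense))
```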