Throughput-Optimal Topology Design for Cross-Silo Federated Learning
- URL: http://arxiv.org/abs/2010.12229v2
- Date: Tue, 17 Nov 2020 19:04:14 GMT
- Title: Throughput-Optimal Topology Design for Cross-Silo Federated Learning
- Authors: Othmane Marfoq, Chuan Xu, Giovanni Neglia, Richard Vidal
- Abstract summary: Federated learning usually employs a client-server architecture where an orchestrator iteratively aggregates model updates from remote clients and pushes a refined model back to them.
This approach may be inefficient in cross-silo settings, as close-by data silos with high-speed access links may exchange information faster than with the orchestrator.
We propose practical algorithms that find a topology with the largest throughput or with provable throughput guarantees.
- Score: 13.922754427601493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning usually employs a client-server architecture where an
orchestrator iteratively aggregates model updates from remote clients and
pushes a refined model back to them. This approach may be inefficient in
cross-silo settings, as close-by data silos with high-speed access links may
exchange information faster than with the orchestrator, and the orchestrator
may become a communication bottleneck. In this paper we define the problem of
topology design for cross-silo federated learning using the theory of max-plus
linear systems to compute the system throughput---number of communication
rounds per time unit. We also propose practical algorithms that, under the
knowledge of measurable network characteristics, find a topology with the
largest throughput or with provable throughput guarantees. In realistic
Internet networks with 10 Gbps access links for silos, our algorithms speed up
training by factors of 9 and 1.5 in comparison to the master-slave architecture
and to state-of-the-art MATCHA, respectively. Speedups are even larger with
slower access links.
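The central quantity above is the system throughput: the number of communication rounds per time unit that a given overlay topology can sustain. The following is a minimal sketch (not the authors' code) of that idea: it estimates the cycle time of a candidate overlay as the maximum cycle mean of a timed graph, in the spirit of a max-plus linear system analysis, and takes throughput as its inverse. The ring topology, the per-silo compute time, the per-link delay, and the use of Karp's algorithm are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): estimate the throughput of a
# candidate cross-silo overlay as the inverse of the maximum cycle mean of
# its timed graph, in the spirit of a max-plus linear system analysis.
# The topology, compute time, link delay, and use of Karp's algorithm
# below are illustrative assumptions.
from math import inf

def max_cycle_mean(n, edges):
    """Karp's algorithm: maximum mean weight over all directed cycles.

    n     -- number of nodes labelled 0..n-1 (graph assumed strongly connected)
    edges -- list of (u, v, w): directed edge u -> v with weight w (seconds)
    """
    # D[k][v] = maximum weight of a walk with exactly k edges from node 0 to v
    D = [[-inf] * n for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] > -inf and D[k - 1][u] + w > D[k][v]:
                D[k][v] = D[k - 1][u] + w

    best = -inf
    for v in range(n):
        if D[n][v] == -inf:
            continue
        # Karp's formula: min over k of (D_n(v) - D_k(v)) / (n - k), max over v
        cand = min((D[n][v] - D[k][v]) / (n - k)
                   for k in range(n) if D[k][v] > -inf)
        best = max(best, cand)
    return best

# Toy example: 4 silos in a directed ring. Each edge weight models one
# silo-to-silo hop: local computation plus link delay (made-up numbers).
compute_s, delay_s = 0.8, 0.2                      # seconds
ring = [(i, (i + 1) % 4, compute_s + delay_s) for i in range(4)]

cycle_time = max_cycle_mean(4, ring)               # seconds per round
print("cycle time: %.2f s/round, throughput: %.2f rounds/s"
      % (cycle_time, 1.0 / cycle_time))
```

Under this simplified model, topology design amounts to searching among admissible overlays for the one with the smallest cycle time; the paper's algorithms perform such a search using measurable network characteristics.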
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings can be easily changed by better training networks in benchmarks.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Time Minimization in Hierarchical Federated Learning [11.678121177730718]
Federated learning is a modern decentralized machine learning technique where user equipments perform machine learning tasks locally and then upload the model parameters to a central server.
In this paper, we consider a 3-layer hierarchical federated learning system which involves model parameter exchanges between the cloud and edge servers.
arXiv Detail & Related papers (2022-10-07T13:53:20Z)
- Learning Connectivity-Maximizing Network Configurations [123.01665966032014]
We propose a supervised learning approach with a convolutional neural network (CNN) that learns to place communication agents from an expert.
We demonstrate the performance of our CNN on canonical line and ring topologies, 105k randomly generated test cases, and larger teams not seen during training.
After training, our system produces connected configurations 2 orders of magnitude faster than the optimization-based scheme for teams of 10-20 agents.
arXiv Detail & Related papers (2021-12-14T18:59:01Z)
- Predicting Bandwidth Utilization on Network Links Using Machine Learning [0.966840768820136]
We present a solution to predict the bandwidth utilization between different network links with very high accuracy.
A simulated network is created to collect data related to the performance of the network links on every interface.
We show that the proposed solution can be used in real time with a reaction managed by a Software-Defined Networking (SDN) platform.
arXiv Detail & Related papers (2021-12-04T19:47:41Z)
- Comfetch: Federated Learning of Large Networks on Constrained Clients via Sketching [28.990067638230254]
Federated learning (FL) is a popular paradigm for private and collaborative model training on the edge.
We propose a novel algorithm, Comfetch, which allows clients to train large networks using sketched representations of the global neural network.
arXiv Detail & Related papers (2021-09-17T04:48:42Z)
- Scheduling Policy and Power Allocation for Federated Learning in NOMA Based MEC [21.267954799102874]
Federated learning (FL) is a highly pursued machine learning technique that can train a model centrally while keeping data distributed.
We propose a new scheduling policy and power allocation scheme using non-orthogonal multiple access (NOMA) settings to maximize the weighted sum data rate.
Simulation results show that the proposed scheduling and power allocation scheme can help achieve a higher FL testing accuracy in NOMA based wireless networks.
arXiv Detail & Related papers (2020-06-21T23:07:41Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)