A flexible framework for communication-efficient machine learning: from HPC to IoT
- URL: http://arxiv.org/abs/2003.06377v2
- Date: Wed, 17 Jun 2020 07:58:19 GMT
- Title: A flexible framework for communication-efficient machine learning: from HPC to IoT
- Authors: Sarit Khirirat, Sindri Magnússon, Arda Aytekin, Mikael Johansson
- Abstract summary: Communication-efficiency is now needed in a variety of different system architectures.
We propose a flexible framework which adapts the compression level to the true gradient at each iteration.
Our framework is easy to adapt from one technology to the next by modeling how the communication cost depends on the compression level for the specific technology.
- Score: 13.300503079779952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing scale of machine learning tasks, it has become essential
to reduce the communication between computing nodes. Early work on gradient
compression focused on the bottleneck between CPUs and GPUs, but
communication-efficiency is now needed in a variety of different system
architectures, from high-performance clusters to energy-constrained IoT
devices. In current practice, compression levels are typically chosen before
training, and settings that work well for one task may be vastly
suboptimal for another dataset on another architecture. In this paper, we
propose a flexible framework which adapts the compression level to the true
gradient at each iteration, maximizing the improvement in the objective
function that is achieved per communicated bit. Our framework is easy to adapt
from one technology to the next by modeling how the communication cost depends
on the compression level for the specific technology. Theoretical results and
practical experiments indicate that the automatic tuning strategies
significantly increase communication efficiency on several state-of-the-art
compression schemes.
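As a rough illustration of the idea described in the abstract, the sketch below chooses a top-k sparsification level by maximizing an estimated decrease in the objective per communicated bit. The cost model (`payload_bits`), the smoothness-based descent estimate, and all function names are illustrative assumptions for this sketch, not the authors' actual criterion or code.

```python
import numpy as np

def compress_topk(g, k):
    """Top-k sparsification: keep the k largest-magnitude entries of g."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def payload_bits(k, g_dim, float_bits=32):
    """Hypothetical cost model: bits to send k (value, index) pairs."""
    index_bits = int(np.ceil(np.log2(g_dim)))
    return k * (float_bits + index_bits)

def choose_compression_level(g, L, step, candidate_ks):
    """Pick the sparsity level k that maximizes estimated descent per bit.

    The descent estimate comes from the standard L-smoothness bound
        f(x - step*Q(g)) <= f(x) - step*<g, Q(g)> + (L*step**2/2)*||Q(g)||**2,
    used here only as a stand-in for the paper's selection rule.
    """
    best_k, best_ratio = None, -np.inf
    for k in candidate_ks:
        q = compress_topk(g, k)
        descent = step * np.dot(g, q) - 0.5 * L * step**2 * np.dot(q, q)
        ratio = descent / payload_bits(k, g.size)
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return best_k

# Toy usage on a random gradient
rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
k_star = choose_compression_level(g, L=1.0, step=0.1,
                                  candidate_ks=[10, 100, 1_000, 10_000])
print("selected sparsity level:", k_star)
```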
Related papers
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN, a hybrid architecture that combines convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, and class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z) - Communication-Efficient Federated Learning through Adaptive Weight
Clustering and Server-Side Distillation [10.541541376305245]
Federated Learning (FL) is a promising technique for the collaborative training of deep neural networks across multiple devices.
FL is hindered by excessive communication costs due to repeated server-client communication during training.
We propose FedCompress, a novel approach that combines dynamic weight clustering and server-side knowledge distillation.
arXiv Detail & Related papers (2024-01-25T14:49:15Z) - Federated learning compression designed for lightweight communications [0.0]
Federated Learning (FL) is a promising distributed method for edge-level machine learning.
In this paper, we investigate the impact of compression techniques on FL for a typical image classification task.
arXiv Detail & Related papers (2023-10-23T08:36:21Z) - A Machine Learning Framework for Distributed Functional Compression over
Wireless Channels in IoT [13.385373310554327]
Together, the enormous data generated by IoT devices and state-of-the-art machine learning techniques will revolutionize cyber-physical systems.
Traditional cloud-based methods that focus on transferring data to a central location either for training or inference place enormous strain on network resources.
We develop, to the best of our knowledge, the first machine learning framework for distributed functional compression over both the Gaussian Multiple Access Channel (GMAC) and AWGN channels.
arXiv Detail & Related papers (2022-01-24T06:38:39Z) - Communication-Compressed Adaptive Gradient Method for Distributed
Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient compression strategy.
arXiv Detail & Related papers (2021-11-01T04:54:55Z) - ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
arXiv Detail & Related papers (2021-10-11T14:45:00Z) - CosSGD: Nonlinear Quantization for Communication-efficient Federated
Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z) - PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit (a minimal sketch of this power-iteration idea appears after this list).
arXiv Detail & Related papers (2020-08-04T09:14:52Z) - Ternary Compression for Communication-Efficient Federated Learning [17.97683428517896]
Federated learning provides a potential solution to privacy-preserving and secure machine learning.
We propose a ternary federated averaging protocol (T-FedAvg) to reduce the upstream and downstream communication of federated learning systems.
Our results show that the proposed T-FedAvg is effective in reducing communication costs and can even achieve slightly better performance on non-IID data.
arXiv Detail & Related papers (2020-03-07T11:55:34Z) - Structured Sparsification with Joint Optimization of Group Convolution
and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.