Compressed-VFL: Communication-Efficient Learning with Vertically
Partitioned Data
- URL: http://arxiv.org/abs/2206.08330v2
- Date: Tue, 28 Mar 2023 22:03:09 GMT
- Title: Compressed-VFL: Communication-Efficient Learning with Vertically
Partitioned Data
- Authors: Timothy Castiglia, Anirban Das, Shiqiang Wang, Stacy Patterson
- Abstract summary: We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data.
We show experimentally that compression can reduce communication by over $90\%$ without a significant decrease in accuracy relative to VFL without compression.
- Score: 15.85259386116784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Compressed Vertical Federated Learning (C-VFL) for
communication-efficient training on vertically partitioned data. In C-VFL, a
server and multiple parties collaboratively train a model on their respective
features utilizing several local iterations and sharing compressed intermediate
results periodically. Our work provides the first theoretical analysis of the
effect message compression has on distributed training over vertically
partitioned data. We prove convergence of non-convex objectives at a rate of
$O(\frac{1}{\sqrt{T}})$ when the compression error is bounded over the course
of training. We provide specific requirements for convergence with common
compression techniques, such as quantization and top-$k$ sparsification.
Finally, we experimentally show compression can reduce communication by over
$90\%$ without a significant decrease in accuracy over VFL without compression.
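For concreteness, here is a minimal, illustrative sketch (not the authors' code) of the two compressors named in the abstract, quantization and top-$k$ sparsification, applied to the kind of embedding message a party would send to the server in C-VFL. Function names, shapes, and the choices of $k$ and bit width are assumptions made for illustration.
```python
# Illustrative sketch of the two compressors the paper analyzes
# (quantization and top-k sparsification); not the authors' implementation.
import numpy as np

def top_k_sparsify(h: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of h and zero out the rest."""
    flat = h.ravel().copy()
    if k < flat.size:
        cutoff = np.partition(np.abs(flat), -k)[-k]   # k-th largest magnitude
        flat[np.abs(flat) < cutoff] = 0.0
    return flat.reshape(h.shape)

def uniform_quantize(h: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Deterministic uniform quantization of h to 2**num_bits levels."""
    lo, hi = float(h.min()), float(h.max())
    if hi == lo:
        return h.copy()
    scale = (hi - lo) / (2 ** num_bits - 1)
    return lo + np.round((h - lo) / scale) * scale

# Example: a batch of 32 embeddings of width 64 from one party (shapes are
# placeholders), compressed before the periodic exchange with the server.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(32, 64))
sparse_msg = top_k_sparsify(embeddings, k=200)            # send 200 nonzeros
quantized_msg = uniform_quantize(embeddings, num_bits=4)  # send 4-bit values
```
The paper's convergence result requires the error introduced by such compressors to stay bounded over training; this sketch only shows what the operators do to a single message.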
Related papers
- Communication-efficient Vertical Federated Learning via Compressed Error Feedback [24.32409923443071]
Communication overhead is a known bottleneck in federated learning (FL).
We propose EFVFL, which uses compressed error feedback to train models over vertically partitioned data (a minimal error-feedback sketch appears after this list).
EFVFL does not require a vanishing compression error for smooth nonconvex problems.
arXiv Detail & Related papers (2024-06-20T15:40:38Z)
- Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes [54.18186259484828]
In Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds.
We show strong evidence that variable-length codes are beneficial for compression in FL.
We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training [0.0]
Distributed data-parallel (DDP) training improves overall application throughput as multiple devices train on a subset of data and aggregate updates to produce a globally shared model.
GraVAC is a framework to dynamically adjust compression factor throughout training by evaluating model progress and assessing information loss associated with compression.
Compared to using a static compression factor, GraVAC reduces end-to-end training time for ResNet101, VGG16, and LSTM by 4.32x, 1.95x, and 6.67x, respectively.
arXiv Detail & Related papers (2023-05-20T14:25:17Z)
- DoCoFL: Downlink Compression for Cross-Device Federated Learning [12.363097878376644]
$\textsf{DoCoFL}$ is a new framework for downlink compression in the cross-device setting.
It offers significant bi-directional bandwidth reduction while achieving accuracy competitive with that of a baseline without any compression.
arXiv Detail & Related papers (2023-02-01T16:08:54Z)
- Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression [8.591088380355252]
We present Optimus-CC, a fast and scalable distributed training framework for large NLP models with aggressive communication compression.
We propose techniques to avoid the model quality drop that comes from the compression.
We demonstrate our solution on a GPU cluster and achieve superior speedup over baseline state-of-the-art solutions for distributed training without sacrificing model quality.
arXiv Detail & Related papers (2023-01-24T06:07:55Z)
- ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
arXiv Detail & Related papers (2021-10-11T14:45:00Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Compressed Communication for Distributed Training: Adaptive Methods and System [13.244482588437972]
Communication overhead severely hinders the scalability of distributed machine learning systems.
Recently, there has been a growing interest in using gradient compression to reduce the communication overhead.
In this paper, we first introduce a novel adaptive gradient method with gradient compression.
arXiv Detail & Related papers (2021-05-17T13:41:47Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD algorithm for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit (a rank-1 power-step sketch appears after this list).
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
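As referenced in the EFVFL entry above, here is a minimal sketch of the error-feedback idea (an illustration under assumptions, not the EFVFL implementation): the part of a message lost to compression in one round is stored and added back before compressing in the next round, so the error is corrected over time rather than required to vanish.
```python
# Minimal error-feedback compressor sketch (illustrative; not EFVFL's code).
# The residual lost to compression in one round is fed back into the next.
import numpy as np

class ErrorFeedbackCompressor:
    def __init__(self, k: int):
        self.k = k
        self.residual = None   # compression error carried to the next round

    def compress(self, msg: np.ndarray) -> np.ndarray:
        if self.residual is None:
            self.residual = np.zeros_like(msg)
        corrected = msg + self.residual            # add back last round's error
        compressed = self._top_k(corrected)        # any biased compressor works here
        self.residual = corrected - compressed     # remember what was lost
        return compressed

    def _top_k(self, x: np.ndarray) -> np.ndarray:
        flat = x.ravel().copy()
        if self.k < flat.size:
            cutoff = np.partition(np.abs(flat), -self.k)[-self.k]
            flat[np.abs(flat) < cutoff] = 0.0
        return flat.reshape(x.shape)

# Usage: compress the same kind of message round after round.
comp = ErrorFeedbackCompressor(k=100)
rng = np.random.default_rng(1)
for _ in range(3):
    sent = comp.compress(rng.normal(size=(32, 64)))
```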
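Likewise, as referenced in the PowerGossip entry, a rough sketch of a single rank-1 power-iteration step for compressing a model-difference matrix into two vectors; the matrix shape and the reuse of the query vector across rounds are illustrative assumptions, not the paper's exact algorithm.
```python
# Rough sketch of a rank-1 power-iteration compression step in the spirit of
# PowerGossip (illustrative only; not the authors' implementation).
import numpy as np

def rank1_power_step(delta: np.ndarray, q: np.ndarray):
    """One power step: return a unit left factor p and a refined right factor."""
    p = delta @ q
    norm = np.linalg.norm(p)
    if norm > 0.0:
        p = p / norm
    q_new = delta.T @ p            # reused as the query vector in the next round
    return p, q_new                # receiver reconstructs delta ~= outer(p, q_new)

# Usage: a 256 x 128 model difference is sent as two vectors (256 + 128 numbers).
rng = np.random.default_rng(2)
delta = rng.normal(size=(256, 128))
q = rng.normal(size=(128,))
p, q = rank1_power_step(delta, q)
approx = np.outer(p, q)            # low-rank approximation at the receiver
```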