Comet: A Communication-efficient and Performant Approximation for Private Transformer Inference
- URL: http://arxiv.org/abs/2405.17485v2
- Date: Sat, 7 Sep 2024 13:07:44 GMT
- Title: Comet: A Communication-efficient and Performant Approximation for Private Transformer Inference
- Authors: Xiangrui Xu, Qiao Zhang, Rui Ning, Chunsheng Xin, Hongyi Wu
- Abstract summary: We introduce Comet, a novel plug-in method that reduces communication cost without compromising inference performance.
We evaluate Comet on BERT and RoBERTa models with the GLUE benchmark datasets, showing up to 3.9$\times$ less communication and 3.5$\times$ speedups.
- Score: 16.328220661765744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prevalent use of Transformer-like models, exemplified by ChatGPT in modern language processing applications, underscores the critical need for private inference, which is essential for many cloud-based services reliant on such models. However, current privacy-preserving frameworks impose a significant communication burden, especially for the non-linear computations in Transformer models. In this paper, we introduce Comet, a novel plug-in method that effectively reduces the communication cost without compromising inference performance. We further introduce an efficient approximation method that eliminates the heavy communication otherwise needed to find a good initial approximation. We evaluate Comet on BERT and RoBERTa models with the GLUE benchmark datasets, showing up to 3.9$\times$ less communication and 3.5$\times$ speedups while maintaining competitive model performance compared to the prior art.
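The abstract leaves the protocol details to the paper, but the headline saving comes from approximating the Transformer's non-linear functions, whose exact evaluation dominates communication in secret-sharing-based private inference. A minimal plaintext sketch of that idea (the degree and interval are our illustrative choices, not Comet's actual protocol): replace GELU with a low-degree polynomial, which under additive secret sharing needs only multiplications and additions.

```python
import numpy as np

# GELU in the tanh form common in BERT-style models.
def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Fit a low-degree polynomial on a bounded interval. Under additive secret
# sharing, a polynomial costs a handful of multiplication rounds, far less
# communication than evaluating erf/exp exactly.
xs = np.linspace(-4.0, 4.0, 2001)
coeffs = np.polyfit(xs, gelu(xs), deg=6)   # degree 6 is an illustrative choice
approx = np.polyval(coeffs, xs)

print("max |GELU - poly| on [-4, 4]:", np.max(np.abs(gelu(xs) - approx)))
```

The quality of such an approximation hinges on a good starting point and fitting range, which is the "initial approximation" cost that the paper's second contribution targets.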
Related papers
- Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models [54.02863371927658]
Large Language Models (LLMs) have become indispensable in numerous real-world applications.
Ferret is the first first-order method to use shared randomness for scalable federated full-parameter tuning.
It achieves high computational efficiency, reduced communication overhead, and fast convergence.
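The summary does not say how shared randomness enters; one generic reading (our sketch, not necessarily Ferret's reconstruction scheme) is that a client sends only the coefficients of its update projected onto pseudo-random directions, and the server regenerates those directions from the shared seed instead of receiving the full update.

```python
import numpy as np

d, k, seed = 10_000, 256, 1234      # model dim, #directions, shared seed
u = np.random.default_rng(0).standard_normal(d)   # stand-in local update

# Client: k inner products against seed-derived directions
# (k floats sent instead of d).
dirs = np.random.default_rng(seed).standard_normal((k, d))
coeffs = dirs @ u

# Server: regenerates the same directions from the seed and forms an
# unbiased (but noisy) estimate of the update it never received.
dirs_srv = np.random.default_rng(seed).standard_normal((k, d))
u_hat = dirs_srv.T @ coeffs / k

cos = u @ u_hat / (np.linalg.norm(u) * np.linalg.norm(u_hat))
print(f"sent {k} floats instead of {d}; cosine(u, u_hat) = {cos:.3f}")
```

The estimate sharpens as k grows, and the communication saving is the d/k ratio.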
arXiv Detail & Related papers (2024-09-10T07:28:13Z)
- SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low Computational Overhead [75.87007729801304]
SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
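The summary says sparse structures are optimized with little communication; a plausible minimal pattern (our sketch, built around exchanging pruning thresholds rather than weights) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))          # one layer's weights on a client
tau = np.quantile(np.abs(W), 0.9, axis=1)   # per-filter pruning thresholds

# Only the 64 thresholds would travel to the server instead of 64*128
# weights; each client prunes locally against the aggregated thresholds.
mask = np.abs(W) >= tau[:, None]
print("floats communicated:", tau.size, "vs dense:", W.size)
print("kept fraction:", mask.mean())
```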
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
- Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization [15.73877955614998]
This paper presents a novel communication-efficient algorithm, DeComFL, which reduces the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$ by transmitting only a constant number of scalar values between clients and the server.
Empirical evaluations, encompassing both classic deep learning training and large language model fine-tuning, demonstrate significant reductions in communication overhead.
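The mechanism is concrete enough to sketch: with a zeroth-order oracle and a seed shared between client and server, each round ships a single finite-difference scalar, independent of the model dimension. A toy reproduction of that pattern (objective, dimensions, and step sizes are our choices):

```python
import numpy as np

d, seed = 20, 42                                    # tiny dim so convergence shows
target = np.random.default_rng(1).standard_normal(d)
loss = lambda w: 0.5 * np.sum((w - target) ** 2)    # toy objective

w, mu, lr = np.zeros(d), 1e-4, 0.045
for step in range(300):
    # Client and server derive the SAME perturbation from the shared seed,
    # so only one scalar per round (the finite difference) crosses the wire.
    z = np.random.default_rng(seed + step).standard_normal(d)
    g = (loss(w + mu * z) - loss(w - mu * z)) / (2 * mu)   # sent: 1 float
    w -= lr * g * z     # server rebuilds z from the seed and applies the step

print(f"loss: {loss(np.zeros(d)):.3f} -> {loss(w):.6f} after 300 scalar-only rounds")
```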
arXiv Detail & Related papers (2024-05-24T18:07:05Z)
- Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes [54.18186259484828]
In Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds.
We show strong evidence that variable-length codes are beneficial for compression in FL.
We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
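Why variable-length coding helps is easy to show: quantized model updates have a skewed symbol distribution, so an entropy code spends fewer bits on common symbols than any fixed-length code can. A toy comparison (the distribution and quantizer are our assumptions; Fed-CVLC additionally adapts the code length over rounds):

```python
import numpy as np

rng = np.random.default_rng(0)
update = rng.laplace(scale=0.1, size=100_000)   # heavy-tailed model update

# Uniform quantization to 16 levels, then compare a fixed-length code with
# the entropy bound a variable-length (e.g., Huffman) code can approach.
levels = 16
q = np.clip(np.round(update / 0.05), -levels // 2, levels // 2 - 1)
_, counts = np.unique(q, return_counts=True)
p = counts / counts.sum()
entropy = -(p * np.log2(p)).sum()

print(f"fixed-length:  {np.log2(levels):.2f} bits/param")
print(f"entropy bound: {entropy:.2f} bits/param")
```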
arXiv Detail & Related papers (2024-02-06T07:25:21Z)
- Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation [10.541541376305245]
Federated Learning (FL) is a promising technique for the collaborative training of deep neural networks across multiple devices.
FL is hindered by excessive communication costs due to repeated server-client communication during training.
We propose FedCompress, a novel approach that combines dynamic weight clustering and server-side knowledge distillation.
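Weight clustering is sketchable from the summary alone: replace each weight by one of k shared centroids, so a round transmits k floats plus log2(k)-bit indices. A minimal 1-D k-means illustration (sizes are ours; FedCompress makes the clustering dynamic and adds server-side distillation, both omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)   # flattened model weights
k = 16                            # cluster count = communication knob

# Tiny 1-D k-means (Lloyd iterations), initialized at quantiles.
centroids = np.quantile(w, np.linspace(0, 1, k))
for _ in range(10):
    idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
    for j in range(k):
        if np.any(idx == j):
            centroids[j] = w[idx == j].mean()

bits = k * 32 + w.size * np.log2(k)   # centroids + 4-bit indices
print(f"compressed: {bits / 8 / 1024:.1f} KiB vs dense: {w.size * 32 / 8 / 1024:.1f} KiB")
print("quantization MSE:", np.mean((w - centroids[idx]) ** 2))
```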
arXiv Detail & Related papers (2024-01-25T14:49:15Z)
- Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter [21.512817959760007]
Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources.
This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training.
However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck.
We propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients.
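The mechanism here is stated outright: the PLM stays frozen and only adapter weights are synchronized. A minimal sketch (parameter names and shapes are hypothetical):

```python
import numpy as np

# Hypothetical state dict: a frozen PLM layer plus a bottleneck adapter.
state = {
    "encoder.layer0.attention.weight":    np.zeros((1024, 1024)),
    "encoder.layer0.adapter.down.weight": np.zeros((1024, 64)),
    "encoder.layer0.adapter.up.weight":   np.zeros((64, 1024)),
}

# Only adapter tensors are exchanged between clients and the server.
to_send = {k: v for k, v in state.items() if ".adapter." in k}
total = sum(v.size for v in state.values())
sent = sum(v.size for v in to_send.values())
print(f"synchronized params: {sent}/{total} ({100 * sent / total:.1f}%)")
```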
arXiv Detail & Related papers (2023-05-21T12:48:38Z)
- Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
We study the limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective.
It is found that the communication gain from exploiting the correlation between worker nodes is significant for SignSGD.
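A toy version of the SignSGD setting makes the correlation point tangible: when workers' gradients share a common component, their sign vectors are highly redundant, which is exactly what joint (rate-distortion-optimal) coding can exploit. The noise model below is our assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
g_true = rng.standard_normal(1_000)                     # shared true gradient
grads = g_true + 0.5 * rng.standard_normal((9, 1_000))  # 9 correlated workers

# Each worker sends 1 bit/coordinate; the server majority-votes the signs.
votes = np.sign(grads)
agg = np.sign(votes.sum(axis=0))

agree = np.mean(agg == np.sign(g_true))
corr = np.mean(votes[0] == votes[1])
print(f"vote agreement with true sign: {agree:.2%}")
print(f"inter-worker sign match rate: {corr:.2%} (redundancy a joint code can exploit)")
```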
arXiv Detail & Related papers (2022-06-28T13:10:40Z)
- FedLite: A Scalable Approach for Federated Learning on Resource-constrained Clients [41.623518032533035]
In split learning, only a small part of the model is stored and trained on clients while the remaining large part of the model only stays at the servers.
This paper addresses the resulting communication overhead by compressing the additional exchanges with a novel clustering scheme accompanied by a gradient correction method.
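The clustering idea can be sketched generically: quantize the split-layer activations against a small learned codebook, so the client sends one index per vector plus the codebook instead of raw floats (the paper's gradient-correction step is omitted; sizes are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.standard_normal((256, 128))   # client-side activations at the cut
k = 32                                   # codebook size

# Tiny k-means codebook over activation vectors (Lloyd iterations).
codebook = acts[rng.choice(len(acts), k, replace=False)].copy()
for _ in range(10):
    d2 = ((acts[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(1)
    for j in range(k):
        if np.any(idx == j):
            codebook[j] = acts[idx == j].mean(0)

bits = k * 128 * 32 + len(acts) * np.log2(k)   # codebook + per-vector indices
print(f"compressed: {bits / 8:.0f} B vs raw: {acts.size * 32 / 8:.0f} B")
```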
arXiv Detail & Related papers (2022-01-28T00:09:53Z)
- ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
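The communication saving of progressive training follows from simple accounting: early rounds train and exchange only a shallow prefix of the model. A back-of-the-envelope sketch (block sizes and schedule are invented for illustration):

```python
import numpy as np

# Parameters per block and the number of active blocks in each phase.
block_params = np.array([1.0, 1.0, 2.0, 4.0]) * 1e6
schedule = [1, 2, 3, 4, 4, 4]   # grow the trained prefix over phases

full = block_params.sum() * len(schedule)             # always-full baseline
prog = sum(block_params[:n].sum() for n in schedule)  # progressive cost
print(f"two-way communication vs full-model training: {prog / full:.0%}")
```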
arXiv Detail & Related papers (2021-10-11T14:45:00Z)
- Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism [56.78673028601739]
We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training.
DCT reduces communication by at least $100\times$ and $20\times$ during data parallelism (DP) and model parallelism (MP), respectively.
It improves end-to-end training time for a state-of-the-art industrial recommender model by 37%, without any loss in performance.
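DCT's thresholds are tied to its hybrid data/model-parallel setup, but the generic pattern, sending only entries whose magnitude clears a threshold and retrying the rest later via error feedback, can be sketched directly (the threshold value and dimensions are ours):

```python
import numpy as np

class ThresholdCompressor:
    """Threshold sparsification with error feedback: entries below the
    threshold accumulate locally and are retried in later rounds."""
    def __init__(self, dim, tau):
        self.residual = np.zeros(dim)
        self.tau = tau

    def compress(self, grad):
        g = grad + self.residual
        mask = np.abs(g) > self.tau
        self.residual = np.where(mask, 0.0, g)   # remember what was dropped
        return np.flatnonzero(mask), g[mask]     # indices + values to send

rng = np.random.default_rng(0)
comp = ThresholdCompressor(100_000, tau=0.05)
idx, vals = comp.compress(0.02 * rng.standard_normal(100_000))
print(f"sent {idx.size} of 100000 entries ({idx.size / 1e5:.2%})")
```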
arXiv Detail & Related papers (2020-10-18T01:44:42Z)
- Distilled One-Shot Federated Learning [13.294757670979031]
We propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving comparable performance.
In just one round, each client distills their private dataset, sends the synthetic data (e.g. images or sentences) to the server, and collectively trains a global model.
With this weight-less and gradient-less design, the total communication cost of DOSFL is up to three orders of magnitude less than FedAvg.
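Dataset distillation for neural networks does not fit in a few lines, but a linear-regression analogue shows why a handful of synthetic points can replace a large private dataset: least squares depends on the data only through its sufficient statistics, which d synthetic points can reproduce exactly (DOSFL learns such data for real models; the exactness below is special to the linear case):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 8))                       # private inputs
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(10_000)

# Least squares sees the data only via (X^T X, X^T y), so 8 synthetic
# points carrying the same statistics yield the identical trained model.
lam, V = np.linalg.eigh(X.T @ X)
X_syn = np.sqrt(lam)[:, None] * V.T          # 8 synthetic inputs
y_syn = (V.T @ (X.T @ y)) / np.sqrt(lam)     # matching synthetic labels

w_real = np.linalg.lstsq(X, y, rcond=None)[0]
w_syn = np.linalg.lstsq(X_syn, y_syn, rcond=None)[0]
print("max |w_real - w_syn|:", np.max(np.abs(w_real - w_syn)))
```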
arXiv Detail & Related papers (2020-09-17T01:14:47Z)