FedTune: A Deep Dive into Efficient Federated Fine-Tuning with
Pre-trained Transformers
- URL: http://arxiv.org/abs/2211.08025v1
- Date: Tue, 15 Nov 2022 10:16:13 GMT
- Title: FedTune: A Deep Dive into Efficient Federated Fine-Tuning with
Pre-trained Transformers
- Authors: Jinyu Chen, Wenchao Xu, Song Guo, Junxiao Wang, Jie Zhang, Haozhao
Wang
- Abstract summary: Federated Learning (FL) is an emerging paradigm that enables distributed users to collaboratively and iteratively train machine learning models without sharing their private data.
Researchers are turning to pre-trained Transformers instead of traditional convolutional neural networks in FL to leverage their excellent transfer learning capabilities.
We show that fine-tuned Transformers achieve extraordinary performance in FL, and that lightweight fine-tuning enables fast convergence and low communication costs.
- Score: 16.465900409973656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) is an emerging paradigm that enables distributed
users to collaboratively and iteratively train machine learning models without
sharing their private data. Motivated by the effectiveness and robustness of
self-attention-based architectures, researchers are turning to using
pre-trained Transformers (i.e., foundation models) instead of traditional
convolutional neural networks in FL to leverage their excellent transfer
learning capabilities. Despite recent progress, the role that pre-trained
Transformer models play in FL remains unclear, namely, how to efficiently
fine-tune these pre-trained models in FL and how FL users could benefit from
this new paradigm. In this paper, we explore this issue and demonstrate that
fine-tuned Transformers achieve extraordinary performance in FL, and that
lightweight fine-tuning enables fast convergence and low communication costs.
Concretely, we conduct a rigorous empirical study of three
tuning methods (i.e., modifying the input, adding extra modules, and adjusting
the backbone) using two types of pre-trained models (i.e., vision-language
models and vision models) for FL. Our experiments show that 1) Fine-tuning the
bias term of the backbone performs best when relying on a strong pre-trained
model; 2) The vision-language model (e.g., CLIP) outperforms the pure vision
model (e.g., ViT) and is more robust in few-shot settings; 3) Compared to
pure local training, FL with pre-trained models achieves higher accuracy
because it alleviates the problem of over-fitting. We will release our code and
encourage further exploration of pre-trained Transformers and FL.
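To make the abstract's third tuning knob (adjusting the backbone) and finding 1 concrete, the sketch below shows one way to run a single federated round in which clients train and transmit only the bias terms of a frozen pre-trained backbone. It is a minimal FedAvg-style sketch with placeholder names (`global_model`, `client_loaders`), not the code the authors plan to release.

```python
import copy
import torch
import torch.nn.functional as F

def bias_only_round(global_model, client_loaders, lr=1e-3, local_steps=10):
    """One FedAvg-style round in which clients train and transmit only the bias
    terms of a frozen pre-trained backbone. `global_model` is any pre-trained
    torch.nn.Module and `client_loaders` is a list of per-client DataLoaders;
    both are placeholders, and this is a sketch rather than the FedTune code."""
    bias_names = [n for n, _ in global_model.named_parameters() if n.endswith("bias")]
    client_updates = []

    for loader in client_loaders:
        local = copy.deepcopy(global_model)
        # Freeze everything except the bias terms (BitFit-style tuning).
        for n, p in local.named_parameters():
            p.requires_grad = n in bias_names
        opt = torch.optim.SGD([p for p in local.parameters() if p.requires_grad], lr=lr)

        for step, (x, y) in enumerate(loader):
            if step >= local_steps:
                break
            loss = F.cross_entropy(local(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

        # Only the (tiny) bias tensors leave the client.
        client_updates.append({n: local.state_dict()[n].clone() for n in bias_names})

    # Server: average the bias terms and write them back into the global model.
    new_state = global_model.state_dict()
    for n in bias_names:
        new_state[n] = torch.stack([u[n] for u in client_updates]).mean(dim=0)
    global_model.load_state_dict(new_state)
    return global_model
```

Because only bias tensors cross the network each round, per-round communication is a small fraction of the full model size, which is consistent with the low communication costs reported in the abstract.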
Related papers
- Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning [33.88701368538447]
We propose an innovative model-based local training technique called "Local Superior Soups".
Our method enhances local training across different clients, encouraging the exploration of a connected low-loss basin.
We demonstrated its effectiveness and efficiency across diverse widely-used FL datasets.
arXiv Detail & Related papers (2024-10-31T06:20:17Z)
- Heterogeneous Federated Learning with Splited Language Model [22.65325348176366]
Federated Split Learning (FSL) is a promising distributed learning paradigm in practice.
In this paper, we harness Pre-trained Image Transformers (PITs) as the initial model, coined FedV, to accelerate the training process and improve model robustness.
We are the first to provide a systematic evaluation of FSL methods with PITs on real-world datasets, under varying levels of partial device participation and heterogeneous data splits.
arXiv Detail & Related papers (2024-03-24T07:33:08Z)
- A Survey on Efficient Federated Learning Methods for Foundation Model Training [62.473245910234304]
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients.
In the wake of Foundation Models (FM), the reality is different for many deep learning applications.
We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications.
arXiv Detail & Related papers (2024-01-09T10:22:23Z)
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [94.10861578387443]
We explore the inference process of two mainstream T2V model families, one based on transformers and one on diffusion models.
We propose a training-free and generalized pruning strategy called F3-Pruning to prune redundant temporal attention weights.
Extensive experiments on three datasets using a classic transformer-based model CogVideo and a typical diffusion-based model Tune-A-Video verify the effectiveness of F3-Pruning.
arXiv Detail & Related papers (2023-12-06T12:34:47Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
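For illustration, the up-scaling arithmetic summarized in this entry can be sketched as combining next-token log-probabilities from three models. The function below is a hedged sketch that assumes all three models share a tokenizer; it is not the paper's implementation.

```python
import torch

def eft_upscale_logprobs(logp_large_base, logp_small_ft, logp_small_base):
    """Emulated fine-tuning / LM up-scaling arithmetic (sketch): take the large
    base model's next-token log-probabilities and shift them by the behavioral
    delta between a small fine-tuned model and its small base counterpart.
    All three inputs are (vocab_size,) tensors from models assumed to share a
    tokenizer; this illustrates the idea, not the paper's implementation."""
    combined = logp_large_base + (logp_small_ft - logp_small_base)
    return torch.log_softmax(combined, dim=-1)  # renormalize before sampling
```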
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Guiding The Last Layer in Federated Learning with Pre-Trained Models [18.382057374270143]
Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data.
We show that fitting a classification head using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals.
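A minimal sketch of the NCM head fitting described in this entry: each class prototype is the mean feature vector of that class, computed in a single pass over frozen-backbone features and used directly as a row of the classification head. The feature tensors and names below are placeholders, not the paper's code.

```python
import torch

@torch.no_grad()
def ncm_head(features, labels, num_classes):
    """Fit a classification head from Nearest Class Means (sketch): each row of
    the returned weight matrix is the mean feature vector of one class, so
    `features @ head.T` scores classes by similarity to their prototype.
    `features` is an (N, D) tensor from a frozen backbone, `labels` an (N,)
    tensor of class indices; both are placeholders here."""
    dim = features.shape[1]
    head = torch.zeros(num_classes, dim)
    for c in range(num_classes):
        head[c] = features[labels == c].mean(dim=0)
    return torch.nn.functional.normalize(head, dim=1)
```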
arXiv Detail & Related papers (2023-06-06T18:02:02Z)
- FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning [18.357577491590686]
We propose the Federated Opportunistic Block Dropout (FedOBD) approach to train large-scale neural networks.
FedOBD decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks.
Experiments show that FedOBD reduces the overall communication overhead by more than 88% compared to the best performing baseline approach.
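For intuition only, the sketch below groups a model's parameters into blocks, keeps the blocks that changed most during local training, and int8-quantizes them for upload. The block partition and drop criterion are illustrative placeholders, not FedOBD's semantic blocks or its actual importance measure.

```python
import torch

def select_and_quantize_blocks(model, previous_state, keep_ratio=0.5):
    """Group parameters into blocks by top-level module name, rank blocks by
    how much they changed during local training, and int8-quantize only the
    highest-ranked blocks for upload. Illustrative placeholder logic, not
    FedOBD's semantic blocks or its actual importance measure."""
    state = model.state_dict()
    blocks = {}
    for name in state:
        blocks.setdefault(name.split(".")[0], []).append(name)

    # Score each block by the mean absolute change of its tensors this round.
    scores = {
        block: float(torch.stack(
            [(state[n].float() - previous_state[n].float()).abs().mean()
             for n in names]).mean())
        for block, names in blocks.items()
    }
    num_keep = max(1, int(len(scores) * keep_ratio))
    kept = sorted(scores, key=scores.get, reverse=True)[:num_keep]

    payload = {}
    for block in kept:
        for n in blocks[block]:
            t = state[n].float()
            scale = t.abs().max().clamp(min=1e-12) / 127.0
            payload[n] = (torch.round(t / scale).to(torch.int8), scale)
    return payload  # only ~keep_ratio of the blocks leave the client, in int8
```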
arXiv Detail & Related papers (2022-08-10T06:36:49Z)
- On the Importance and Applicability of Pre-Training for Federated Learning [28.238484580662785]
We conduct a systematic study to explore pre-training for federated learning.
We find that pre-training can not only improve FL, but also close its accuracy gap to its centralized-learning counterpart.
We conclude our paper with an attempt to understand the effect of pre-training on FL.
arXiv Detail & Related papers (2022-06-23T06:02:33Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from "Vision-friendly Transformer".
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Over-the-Air Federated Learning from Heterogeneous Data [107.05618009955094]
Federated learning (FL) is a framework for distributed learning of centralized models.
We develop a Convergent OTA FL (COTAF) algorithm which enhances the common local stochastic gradient descent (SGD) FL algorithm.
We numerically show that the precoding induced by COTAF notably improves the convergence rate and the accuracy of models trained via OTA FL.
arXiv Detail & Related papers (2020-09-27T08:28:25Z)
- UVeQFed: Universal Vector Quantization for Federated Learning [179.06583469293386]
Federated learning (FL) is an emerging approach to train machine learning models without requiring users to share their possibly private labeled data.
In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model.
We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only a minimum distortion.
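A minimal sketch of the client-to-server path described in this entry, with a simple uniform scalar quantizer standing in for UVeQFed's universal vector quantizer; the array shapes and names are placeholders.

```python
import numpy as np

def quantize_update(update, num_bits=4):
    """Uniform scalar quantization of a flattened model update (a simple
    stand-in for UVeQFed's universal vector quantizer): transmit integer
    codes plus a per-client scale instead of full-precision values."""
    max_code = 2 ** (num_bits - 1) - 1
    scale = max(np.max(np.abs(update)) / max_code, 1e-12)
    codes = np.round(update / scale).astype(np.int8)
    return codes, scale

def server_aggregate(global_weights, client_updates):
    """Server side: dequantize each client's update and average into the model."""
    decoded = [codes.astype(np.float32) * scale for codes, scale in client_updates]
    return global_weights + np.mean(decoded, axis=0)

# Example: two clients send 4-bit updates for a 3-parameter model.
w = np.zeros(3, dtype=np.float32)
updates = [quantize_update(np.array([0.2, -0.1, 0.05])),
           quantize_update(np.array([0.1, 0.0, -0.05]))]
w = server_aggregate(w, updates)
```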
arXiv Detail & Related papers (2020-06-05T07:10:22Z)