FedTune: A Deep Dive into Efficient Federated Fine-Tuning with
Pre-trained Transformers
- URL: http://arxiv.org/abs/2211.08025v1
- Date: Tue, 15 Nov 2022 10:16:13 GMT
- Title: FedTune: A Deep Dive into Efficient Federated Fine-Tuning with
Pre-trained Transformers
- Authors: Jinyu Chen, Wenchao Xu, Song Guo, Junxiao Wang, Jie Zhang, Haozhao
Wang
- Abstract summary: Federated Learning (FL) is an emerging paradigm that enables distributed users to collaboratively and iteratively train machine learning models without sharing their private data.
Researchers are turning to pre-trained Transformers instead of traditional convolutional neural networks in FL to leverage their excellent transfer learning capabilities.
We show that fine-tuned Transformers achieve extraordinary performance in FL, and that lightweight fine-tuning enables fast convergence and low communication costs.
- Score: 16.465900409973656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) is an emerging paradigm that enables distributed
users to collaboratively and iteratively train machine learning models without
sharing their private data. Motivated by the effectiveness and robustness of
self-attention-based architectures, researchers are turning to using
pre-trained Transformers (i.e., foundation models) instead of traditional
convolutional neural networks in FL to leverage their excellent transfer
learning capabilities. Despite recent progress, the role that pre-trained
Transformer models play in FL remains unclear, namely, how to efficiently
fine-tune these pre-trained models in FL and how FL users could benefit from
this new paradigm. In this paper, we explore this issue and demonstrate that
fine-tuned Transformers achieve extraordinary performance in FL, and that
lightweight fine-tuning enables fast convergence and low communication costs.
Concretely, we conduct a rigorous empirical study of three
tuning methods (i.e., modifying the input, adding extra modules, and adjusting
the backbone) using two types of pre-trained models (i.e., vision-language
models and vision models) for FL. Our experiments show that 1) Fine-tuning the
bias term of the backbone performs best when relying on a strong pre-trained
model; 2) The vision-language model (e.g., CLIP) outperforms the pure vision
model (e.g., ViT) and is more robust in few-shot settings; 3) Compared to
pure local training, FL with pre-trained models achieves higher accuracy
because it alleviates the problem of over-fitting. We will release our code and
encourage further exploration of pre-trained Transformers and FL.
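To make the abstract's third tuning knob (adjusting the backbone) and finding 1 concrete, the sketch below shows one way to run a single federated round in which clients train and transmit only the bias terms of a frozen pre-trained backbone. It is a minimal FedAvg-style sketch with placeholder names (`global_model`, `client_loaders`), not the code the authors plan to release.

```python
import copy
import torch
import torch.nn.functional as F

def bias_only_round(global_model, client_loaders, lr=1e-3, local_steps=10):
    """One FedAvg-style round in which clients train and transmit only the bias
    terms of a frozen pre-trained backbone. `global_model` is any pre-trained
    torch.nn.Module and `client_loaders` is a list of per-client DataLoaders;
    both are placeholders, and this is a sketch rather than the FedTune code."""
    bias_names = [n for n, _ in global_model.named_parameters() if n.endswith("bias")]
    client_updates = []

    for loader in client_loaders:
        local = copy.deepcopy(global_model)
        # Freeze everything except the bias terms (BitFit-style tuning).
        for n, p in local.named_parameters():
            p.requires_grad = n in bias_names
        opt = torch.optim.SGD([p for p in local.parameters() if p.requires_grad], lr=lr)

        for step, (x, y) in enumerate(loader):
            if step >= local_steps:
                break
            loss = F.cross_entropy(local(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

        # Only the (tiny) bias tensors leave the client.
        client_updates.append({n: local.state_dict()[n].clone() for n in bias_names})

    # Server: average the bias terms and write them back into the global model.
    new_state = global_model.state_dict()
    for n in bias_names:
        new_state[n] = torch.stack([u[n] for u in client_updates]).mean(dim=0)
    global_model.load_state_dict(new_state)
    return global_model
```

Because only bias tensors cross the network each round, per-round communication is a small fraction of the full model size, which is consistent with the low communication costs reported in the abstract.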
Related papers
- Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning [33.88701368538447]
We propose an innovative model-based local training technique called "Local Superior Soups".
Our method enhances local training across different clients, encouraging the exploration of a connected low-loss basin.
We demonstrated its effectiveness and efficiency across diverse widely-used FL datasets.
arXiv Detail & Related papers (2024-10-31T06:20:17Z)
- Heterogeneous Federated Learning with Splited Language Model [22.65325348176366]
Federated Split Learning (FSL) is a promising distributed learning paradigm in practice.
In this paper, we harness Pre-trained Image Transformers (PITs) as the initial model, coined FedV, to accelerate the training process and improve model robustness.
We are the first to provide a systematic evaluation of FSL methods with PITs on real-world datasets, under varying levels of partial device participation and heterogeneous data splits.
arXiv Detail & Related papers (2024-03-24T07:33:08Z)
- A Survey on Efficient Federated Learning Methods for Foundation Model Training [62.473245910234304]
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients.
In the wake of Foundation Models (FM), the reality is different for many deep learning applications.
We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications.
arXiv Detail & Related papers (2024-01-09T10:22:23Z)
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [94.10861578387443]
We explore the inference process of two mainstream T2V model families, one based on transformers and one on diffusion models.
We propose a training-free and generalized pruning strategy called F3-Pruning to prune redundant temporal attention weights.
Extensive experiments on three datasets using a classic transformer-based model CogVideo and a typical diffusion-based model Tune-A-Video verify the effectiveness of F3-Pruning.
arXiv Detail & Related papers (2023-12-06T12:34:47Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
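For illustration, the up-scaling arithmetic summarized in this entry can be sketched as combining next-token log-probabilities from three models. The function below is a hedged sketch that assumes all three models share a tokenizer; it is not the paper's implementation.

```python
import torch

def eft_upscale_logprobs(logp_large_base, logp_small_ft, logp_small_base):
    """Emulated fine-tuning / LM up-scaling arithmetic (sketch): take the large
    base model's next-token log-probabilities and shift them by the behavioral
    delta between a small fine-tuned model and its small base counterpart.
    All three inputs are (vocab_size,) tensors from models assumed to share a
    tokenizer; this illustrates the idea, not the paper's implementation."""
    combined = logp_large_base + (logp_small_ft - logp_small_base)
    return torch.log_softmax(combined, dim=-1)  # renormalize before sampling
```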
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Guiding The Last Layer in Federated Learning with Pre-Trained Models [18.382057374270143]
Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data.
We show that fitting a classification head using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals.
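A minimal sketch of the NCM head fitting described in this entry: each class prototype is the mean feature vector of that class, computed in a single pass over frozen-backbone features and used directly as a row of the classification head. The feature tensors and names below are placeholders, not the paper's code.

```python
import torch

@torch.no_grad()
def ncm_head(features, labels, num_classes):
    """Fit a classification head from Nearest Class Means (sketch): each row of
    the returned weight matrix is the mean feature vector of one class, so
    `features @ head.T` scores classes by similarity to their prototype.
    `features` is an (N, D) tensor from a frozen backbone, `labels` an (N,)
    tensor of class indices; both are placeholders here."""
    dim = features.shape[1]
    head = torch.zeros(num_classes, dim)
    for c in range(num_classes):
        head[c] = features[labels == c].mean(dim=0)
    return torch.nn.functional.normalize(head, dim=1)
```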
arXiv Detail & Related papers (2023-06-06T18:02:02Z)
- FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning [18.357577491590686]
We propose the Federated Opportunistic Block Dropout (FedOBD) approach to train large-scale neural networks.
FedOBD decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks.
Experiments show that FedOBD reduces the overall communication overhead by more than 88% compared to the best performing baseline approach.
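For intuition only, the sketch below groups a model's parameters into blocks, keeps the blocks that changed most during local training, and int8-quantizes them for upload. The block partition and drop criterion are illustrative placeholders, not FedOBD's semantic blocks or its actual importance measure.

```python
import torch

def select_and_quantize_blocks(model, previous_state, keep_ratio=0.5):
    """Group parameters into blocks by top-level module name, rank blocks by
    how much they changed during local training, and int8-quantize only the
    highest-ranked blocks for upload. Illustrative placeholder logic, not
    FedOBD's semantic blocks or its actual importance measure."""
    state = model.state_dict()
    blocks = {}
    for name in state:
        blocks.setdefault(name.split(".")[0], []).append(name)

    # Score each block by the mean absolute change of its tensors this round.
    scores = {
        block: float(torch.stack(
            [(state[n].float() - previous_state[n].float()).abs().mean()
             for n in names]).mean())
        for block, names in blocks.items()
    }
    num_keep = max(1, int(len(scores) * keep_ratio))
    kept = sorted(scores, key=scores.get, reverse=True)[:num_keep]

    payload = {}
    for block in kept:
        for n in blocks[block]:
            t = state[n].float()
            scale = t.abs().max().clamp(min=1e-12) / 127.0
            payload[n] = (torch.round(t / scale).to(torch.int8), scale)
    return payload  # only ~keep_ratio of the blocks leave the client, in int8
```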
arXiv Detail & Related papers (2022-08-10T06:36:49Z)
- On the Importance and Applicability of Pre-Training for Federated Learning [28.238484580662785]
We conduct a systematic study to explore pre-training for federated learning.
We find that pre-training can not only improve FL, but also close its accuracy gap to its centralized-learning counterpart.
We conclude our paper with an attempt to understand the effect of pre-training on FL.
arXiv Detail & Related papers (2022-06-23T06:02:33Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from "Vision-friendly Transformer".
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Over-the-Air Federated Learning from Heterogeneous Data [107.05618009955094]
Federated learning (FL) is a framework for distributed learning of centralized models.
We develop a Convergent OTA FL (COTAF) algorithm which enhances the common local stochastic gradient descent (SGD) FL algorithm.
We numerically show that the precoding induced by COTAF notably improves the convergence rate and the accuracy of models trained via OTA FL.
arXiv Detail & Related papers (2020-09-27T08:28:25Z)
- UVeQFed: Universal Vector Quantization for Federated Learning [179.06583469293386]
Federated learning (FL) is an emerging approach to train machine learning models without requiring users to share their possibly private labeled data.
In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model.
We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only a minimum distortion.
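A minimal sketch of the client-to-server path described in this entry, with a simple uniform scalar quantizer standing in for UVeQFed's universal vector quantizer; the array shapes and names are placeholders.

```python
import numpy as np

def quantize_update(update, num_bits=4):
    """Uniform scalar quantization of a flattened model update (a simple
    stand-in for UVeQFed's universal vector quantizer): transmit integer
    codes plus a per-client scale instead of full-precision values."""
    max_code = 2 ** (num_bits - 1) - 1
    scale = max(np.max(np.abs(update)) / max_code, 1e-12)
    codes = np.round(update / scale).astype(np.int8)
    return codes, scale

def server_aggregate(global_weights, client_updates):
    """Server side: dequantize each client's update and average into the model."""
    decoded = [codes.astype(np.float32) * scale for codes, scale in client_updates]
    return global_weights + np.mean(decoded, axis=0)

# Example: two clients send 4-bit updates for a 3-parameter model.
w = np.zeros(3, dtype=np.float32)
updates = [quantize_update(np.array([0.2, -0.1, 0.05])),
           quantize_update(np.array([0.1, 0.0, -0.05]))]
w = server_aggregate(w, updates)
```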
arXiv Detail & Related papers (2020-06-05T07:10:22Z)