Federated Fine-Tuning of Foundation Models via Probabilistic Masking
- URL: http://arxiv.org/abs/2311.17299v1
- Date: Wed, 29 Nov 2023 01:10:39 GMT
- Title: Federated Fine-Tuning of Foundation Models via Probabilistic Masking
- Authors: Vasileios Tsouvalas, Yuki Asano, Aaqib Saeed
- Abstract summary: Foundation Models (FMs) have revolutionized machine learning with their adaptability and high performance across tasks.
Their integration into Federated Learning (FL) is challenging due to substantial communication overhead from their extensive parameterization.
We present DeltaMask, a novel method that efficiently fine-tunes FMs in FL at an ultra-low bitrate, well below 1 bit-per-parameter (bpp).
- Score: 11.192113661738764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation Models (FMs) have revolutionized machine learning with their
adaptability and high performance across tasks; yet, their integration into
Federated Learning (FL) is challenging due to substantial communication
overhead from their extensive parameterization. Current communication-efficient
FL strategies, such as gradient compression, reduce bitrates to around $1$
bit-per-parameter (bpp). However, these approaches fail to harness the
characteristics of FMs, with their large number of parameters still posing a
challenge to communication efficiency, even at these bitrate regimes. In this
work, we present DeltaMask, a novel method that efficiently fine-tunes FMs in
FL at an ultra-low bitrate, well below 1 bpp. DeltaMask employs stochastic
masking to detect highly effective subnetworks within FMs and leverages the
stochasticity and sparsity of client masks to compress updates into a compact
grayscale image using probabilistic filters, deviating from traditional
weight-training approaches. Our comprehensive evaluations across 8 datasets and
5 pre-trained models of various network architectures demonstrate that
DeltaMask achieves bitrates as low as 0.09 bpp, improving communication
efficiency while maintaining FM performance.
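To make the idea concrete, below is a minimal, illustrative sketch of probabilistic masking for federated fine-tuning: the pre-trained weights stay frozen, only per-weight mask probabilities are trained, and a client transmits a binary mask (about 1 bit per parameter before any further compression). The class and function names here are hypothetical, and DeltaMask's probabilistic-filter compression of the masks into a grayscale image is not reproduced.

```python
# Illustrative sketch of probabilistic masking; not DeltaMask's actual implementation.
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    """Frozen pre-trained weights; only per-weight mask scores are trained."""

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor):
        super().__init__()
        self.register_buffer("weight", weight)                 # frozen FM weights
        self.register_buffer("bias", bias)
        self.scores = nn.Parameter(torch.zeros_like(weight))   # mask logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.scores)
        mask = torch.bernoulli(probs)                           # stochastic binary mask
        mask = mask + probs - probs.detach()                    # straight-through estimator
        return nn.functional.linear(x, self.weight * mask, self.bias)


def client_update(layer: MaskedLinear) -> torch.Tensor:
    """What a client transmits: a binary mask, ~1 bit per parameter
    before any further (e.g., probabilistic-filter) compression."""
    with torch.no_grad():
        return (torch.sigmoid(layer.scores) > 0.5).to(torch.uint8)


layer = MaskedLinear(torch.randn(8, 16), torch.zeros(8))
out = layer(torch.randn(4, 16))
payload = client_update(layer)   # the communicated update
```

Because the sampled masks are sparse and stochastic, the binary payload above compresses well; DeltaMask's filter-based encoding is what pushes the rate well below 1 bpp.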
Related papers
- FedFT: Improving Communication Performance for Federated Learning with Frequency Space Transformation [0.361593752383807]
We introduce FedFT (federated frequency-space transformation), a simple yet effective methodology for communicating model parameters in a Federated Learning setting.
FedFT uses Discrete Cosine Transform (DCT) to represent model parameters in frequency space, enabling efficient compression and reducing communication overhead.
We demonstrate the generalisability of the FedFT methodology on four datasets using comparative studies with three state-of-the-art FL baselines.
arXiv Detail & Related papers (2024-09-08T23:05:35Z)
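As a rough illustration of the frequency-space idea in the FedFT entry above, the sketch below transforms a flat parameter vector with a DCT and keeps only a fraction of the low-frequency coefficients. The function names and the simple truncation rule are assumptions, not FedFT's actual pipeline.

```python
# Generic DCT round-trip for compressing a parameter vector (not FedFT's exact method).
import numpy as np
from scipy.fft import dct, idct


def compress(params: np.ndarray, keep_ratio: float = 0.1) -> np.ndarray:
    """Keep only the lowest-frequency DCT coefficients of a flat parameter vector."""
    coeffs = dct(params, norm="ortho")
    k = max(1, int(keep_ratio * coeffs.size))
    return coeffs[:k]                    # only k coefficients are transmitted


def decompress(kept: np.ndarray, full_size: int) -> np.ndarray:
    coeffs = np.zeros(full_size)
    coeffs[: kept.size] = kept
    return idct(coeffs, norm="ortho")    # approximate reconstruction on the server


rng = np.random.default_rng(0)
delta = rng.normal(size=4096)
approx = decompress(compress(delta, keep_ratio=0.1), delta.size)
```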
- Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning [29.727339562140653]
Current data compression methods, such as sparsification in Federated Averaging (FedAvg), effectively enhance the communication efficiency of Federated Learning (FL).
These methods encounter challenges such as the straggler problem and diminished model performance due to heterogeneous bandwidth and non-IID data.
We introduce a bandwidth-aware compression framework for FL, aimed at improving communication efficiency while mitigating the problems associated with non-IID data.
arXiv Detail & Related papers (2024-08-27T02:28:27Z)
- FedPFT: Federated Proxy Fine-Tuning of Foundation Models [55.58899993272904]
Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) emerges as a promising strategy for protecting data privacy and valuable FMs.
Existing methods fine-tune FMs by allocating sub-FMs to clients in FL, leading to suboptimal performance due to insufficient tuning and inevitable accumulation of gradient errors.
We propose Federated Proxy Fine-Tuning (FedPFT), a novel method that enhances FM adaptation to downstream tasks through FL via two key modules.
arXiv Detail & Related papers (2024-04-17T16:30:06Z)
- Parametric Feature Transfer: One-shot Federated Learning with Foundation Models [14.97955440815159]
In one-shot federated learning, clients collaboratively train a global model in a single round of communication.
This paper introduces FedPFT, a methodology that harnesses the transferability of foundation models to enhance both accuracy and communication efficiency in one-shot FL.
arXiv Detail & Related papers (2024-02-02T19:34:46Z)
- Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning [9.198799314774437]
Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data.
Foundation models (FMs) have gained traction in the artificial intelligence (AI) community due to their exceptional performance across various tasks.
We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges.
arXiv Detail & Related papers (2023-09-30T04:31:53Z)
- Hierarchical Over-the-Air FedGradNorm [50.756991828015316]
Multi-task learning (MTL) is a learning paradigm to learn multiple related tasks simultaneously with a single shared network.
We propose hierarchical over-the-air (HOTA) PFL with a dynamic weighting strategy which we call HOTA-FedGradNorm.
arXiv Detail & Related papers (2022-12-14T18:54:46Z)
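The entry above mentions a dynamic weighting strategy; the toy sketch below shows one common GradNorm-style heuristic, rebalancing task weights so that per-task gradient norms move toward their mean. It is a simplified stand-in and does not model the paper's hierarchical over-the-air aggregation.

```python
# Simplified GradNorm-style reweighting: tasks with dominant gradients are damped
# so all tasks train at comparable rates (toy version, not HOTA-FedGradNorm itself).
import numpy as np


def update_task_weights(weights: np.ndarray, grad_norms: np.ndarray,
                        alpha: float = 0.5) -> np.ndarray:
    """Damp tasks whose gradient norms are above the mean, boost the rest."""
    mean_norm = grad_norms.mean()
    new_w = weights * (mean_norm / (grad_norms + 1e-12)) ** alpha
    return new_w * len(new_w) / new_w.sum()   # keep weights summing to the task count


w = np.ones(3)
w = update_task_weights(w, grad_norms=np.array([5.0, 1.0, 0.5]))
```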
- HFedMS: Heterogeneous Federated Learning with Memorable Data Semantics in Industrial Metaverse [49.1501082763252]
This paper presents HFedMS for incorporating practical FL into the emerging Industrial Metaverse.
It reduces data heterogeneity through dynamic grouping and training mode conversion.
Then, it compensates for the forgotten knowledge by fusing compressed historical data semantics.
Experiments have been conducted on the streamed non-i.i.d. FEMNIST dataset using 368 simulated devices.
arXiv Detail & Related papers (2022-11-07T04:33:24Z)
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better [88.28293442298015]
Federated learning (FL) enables distribution of machine learning workloads from the cloud to resource-limited edge devices.
We develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST).
FedDST is a dynamic process that extracts and trains sparse sub-networks from the target full network.
arXiv Detail & Related papers (2021-12-18T02:26:38Z)
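To illustrate the sparse sub-network idea in the FedDST entry above, the sketch below performs one generic prune-and-regrow step: drop the smallest-magnitude active weights and regrow connections where gradient magnitudes are largest. The function name and schedule are illustrative; FedDST's cross-client mask aggregation is not shown.

```python
# Toy prune-and-regrow step in the spirit of dynamic sparse training.
import numpy as np


def adjust_mask(weights: np.ndarray, grads: np.ndarray, mask: np.ndarray,
                adjust_frac: float = 0.1) -> np.ndarray:
    n_adjust = max(1, int(adjust_frac * mask.sum()))

    active = np.flatnonzero(mask)
    drop = active[np.argsort(np.abs(weights[active]))[:n_adjust]]     # prune small weights

    inactive = np.flatnonzero(mask == 0)
    grow = inactive[np.argsort(-np.abs(grads[inactive]))[:n_adjust]]  # regrow by gradient

    new_mask = mask.copy()
    new_mask[drop] = 0
    new_mask[grow] = 1
    return new_mask


rng = np.random.default_rng(0)
w, g = rng.normal(size=100), rng.normal(size=100)
mask = (rng.random(100) < 0.2).astype(np.int8)
mask = adjust_mask(w, g, mask)
```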
- Low-Latency Federated Learning over Wireless Channels with Differential Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z)
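For the differential-privacy constraint mentioned in the last entry, a standard client-side recipe is to clip the model update's L2 norm and add Gaussian noise before upload. The sketch below shows only that generic recipe; it does not implement the paper's wireless-latency optimization.

```python
# Generic DP-style client update: clip the L2 norm, then add Gaussian noise.
import numpy as np


def privatize_update(update: np.ndarray, clip_norm: float = 1.0,
                     noise_multiplier: float = 1.0, seed=None) -> np.ndarray:
    """Clip the update to a fixed L2 norm (bounding sensitivity), then add noise."""
    rng = np.random.default_rng(seed)
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return update * scale + noise


delta = np.random.default_rng(1).normal(size=1000)
private_delta = privatize_update(delta, clip_norm=1.0, noise_multiplier=0.8)
```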
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.