Personalized Collaborative Fine-Tuning for On-Device Large Language Models
- URL: http://arxiv.org/abs/2404.09753v1
- Date: Mon, 15 Apr 2024 12:54:31 GMT
- Title: Personalized Collaborative Fine-Tuning for On-Device Large Language Models
- Authors: Nicolas Wagner, Dongyang Fan, Martin Jaggi
- Abstract summary: We explore on-device self-supervised collaborative fine-tuning of large language models with limited local data availability.
We introduce three distinct trust-weighted gradient aggregation schemes: weight similarity-based, prediction similarity-based and validation performance-based.
Our protocols, driven by prediction and performance metrics, surpass both FedAvg and local fine-tuning methods.
- Score: 33.68104398807581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore on-device self-supervised collaborative fine-tuning of large language models with limited local data availability. Taking inspiration from the collaborative learning community, we introduce three distinct trust-weighted gradient aggregation schemes: weight similarity-based, prediction similarity-based and validation performance-based. To minimize communication overhead, we integrate Low-Rank Adaptation (LoRA) and only exchange LoRA weight updates. Our protocols, driven by prediction and performance metrics, surpass both FedAvg and local fine-tuning methods, which is particularly evident in realistic scenarios with more diverse local data distributions. The results underscore the effectiveness of our approach in addressing heterogeneity and scarcity within local datasets.
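As a concrete illustration, below is a minimal sketch of how a prediction-similarity-based trust weighting could drive the aggregation of exchanged LoRA updates. The shared probe batch, the symmetric-KL similarity, and all function names are assumptions made for exposition, not the paper's exact protocol.

```python
import numpy as np

def prediction_similarity(probs_i, probs_j, eps=1e-12):
    """Similarity of two clients' next-token distributions on a shared probe
    batch, via negative symmetric KL divergence (an illustrative choice)."""
    kl_ij = np.sum(probs_i * (np.log(probs_i + eps) - np.log(probs_j + eps)), axis=-1)
    kl_ji = np.sum(probs_j * (np.log(probs_j + eps) - np.log(probs_i + eps)), axis=-1)
    return float(np.exp(-np.mean(kl_ij + kl_ji)))

def aggregate_lora_updates(own_idx, lora_updates, probe_probs):
    """Build client `own_idx`'s personalized update as a trust-weighted average
    of all clients' exchanged LoRA updates (flattened A/B factors)."""
    sims = np.array([prediction_similarity(probe_probs[own_idx], p)
                     for p in probe_probs])
    trust = sims / sims.sum()              # normalize similarities into trust weights
    return trust @ np.stack(lora_updates)  # (n_clients,) @ (n_clients, n_params)
```

Because each client computes its own trust vector, every device ends up with a differently weighted, personalized aggregate rather than the single global average of FedAvg.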
Related papers
- Dual-Personalizing Adapter for Federated Foundation Models [35.863585349109385]
We propose a new setting, termed test-time personalization, which concentrates on the targeted local task while also covering other tasks that exhibit test-time distribution shifts.
Specifically, a dual-personalizing adapter architecture (FedDPA) is proposed, comprising a global adapter for addressing test-time distribution shifts and a local adapter for personalization.
The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks.
arXiv Detail & Related papers (2024-03-28T08:19:33Z)
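A minimal sketch of the dual-adapter idea above, assuming a frozen backbone layer with two LoRA-style adapters mixed by a fixed weight alpha; the class name, ranks, and fixed mixing are assumptions, and the paper's actual combination mechanism may differ (e.g., it may be learned per instance).

```python
import torch
import torch.nn as nn

class DualAdapterLinear(nn.Module):
    """Frozen base linear layer plus a global and a local low-rank adapter."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 0.5):
        super().__init__()
        self.base = base.requires_grad_(False)  # shared backbone stays frozen
        d_in, d_out = base.in_features, base.out_features
        self.global_A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.global_B = nn.Parameter(torch.zeros(rank, d_out))
        self.local_A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.local_B = nn.Parameter(torch.zeros(rank, d_out))
        self.alpha = alpha  # assumed fixed mixing weight

    def forward(self, x):
        delta_g = (x @ self.global_A) @ self.global_B  # federated, shared adapter
        delta_l = (x @ self.local_A) @ self.local_B    # client-private adapter
        return self.base(x) + self.alpha * delta_g + (1 - self.alpha) * delta_l
```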
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation [7.052566906745796]
FedLPA is a layer-wise posterior aggregation method for federated learning.
We show that FedLPA significantly improves learning performance over state-of-the-art methods across several metrics.
arXiv Detail & Related papers (2023-09-30T10:51:27Z)
- FedDisco: Federated Learning with Discrepancy-Aware Collaboration [41.828780724903744]
We propose a novel aggregation method, Federated Learning with Discrepancy-Aware Collaboration (FedDisco).
Our FedDisco outperforms several state-of-the-art methods and can be easily combined with many existing methods to further enhance performance.
arXiv Detail & Related papers (2023-05-30T17:20:51Z)
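In the same discrepancy-aware spirit, here is a hedged sketch of how aggregation weights could combine dataset size with a label-distribution discrepancy term, so that clients whose local distribution strays further from a reference receive less weight; the weighting rule, constants, and metric below are illustrative rather than FedDisco's published formula.

```python
import numpy as np

def disco_weights(n_samples, local_dists, global_dist, a=0.5, b=0.1):
    """Illustrative discrepancy-aware aggregation weights.

    n_samples:   per-client dataset sizes
    local_dists: per-client label distributions, shape (n_clients, n_classes)
    global_dist: reference label distribution (e.g., uniform or global)
    """
    n = np.asarray(n_samples, dtype=float)
    n = n / n.sum()                                        # size-based weights
    d = np.linalg.norm(local_dists - global_dist, axis=1)  # L2 discrepancy
    w = np.maximum(n - a * d + b, 0.0)                     # clip negatives to zero
    return w / w.sum()
```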
- FedAgg: Adaptive Federated Learning with Aggregated Gradients [1.5653612447564105]
Federated Learning (FL) has emerged as a pivotal paradigm in distributed model training.
We propose an adaptive learning rate iterative algorithm that concerns the divergence between local and average parameters.
We provide a robust convergence guarantee for our proposed algorithm and ensure its wide applicability.
arXiv Detail & Related papers (2023-03-28T08:07:28Z)
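One plausible reading of the adaptive-learning-rate idea above: damp the local step size as the local parameters drift away from the aggregated average. The specific decay rule below is an assumption for illustration, not the paper's formula.

```python
import numpy as np

def adaptive_lr(base_lr, local_params, avg_params, alpha=1.0):
    """Shrink the local learning rate as the divergence between the local
    model and the average (aggregated) model grows (illustrative rule)."""
    divergence = np.linalg.norm(local_params - avg_params)
    return base_lr / (1.0 + alpha * divergence)
```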
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape of the original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
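A rough sketch of the loss-landscape matching described above: each client optimizes a small synthetic batch so that the model's loss on it tracks the loss on real local data under random weight perturbations. The perturbation scheme, optimizer, and function names are assumptions; FedDM's actual construction may differ in detail.

```python
import torch

def fit_synthetic_batch(model, loss_fn, real_x, real_y, syn_x, syn_y,
                        steps=100, lr=0.1, noise=0.01):
    """Adjust syn_x so the loss on (syn_x, syn_y) matches the loss on real
    data at randomly perturbed weight vectors (landscape-matching sketch)."""
    syn_x = syn_x.clone().requires_grad_(True)
    opt = torch.optim.Adam([syn_x], lr=lr)
    base = [p.detach().clone() for p in model.parameters()]
    for _ in range(steps):
        with torch.no_grad():  # sample a nearby point in weight space
            for p, b in zip(model.parameters(), base):
                p.copy_(b + noise * torch.randn_like(b))
        real_loss = loss_fn(model(real_x), real_y).detach()
        syn_loss = loss_fn(model(syn_x), syn_y)
        opt.zero_grad()
        ((syn_loss - real_loss) ** 2).backward()
        opt.step()
    with torch.no_grad():  # restore the original weights
        for p, b in zip(model.parameters(), base):
            p.copy_(b)
    return syn_x.detach()
```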
- Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
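For reference, a plausible form of the GTV objective mentioned above, written over a client graph G = (V, E); the notation is assumed for exposition.

```latex
% Each client i trains its own model w_i; edge weights A_{ij} couple
% neighboring clients, and phi is a penalty such as a norm.
\min_{\{w_i\}_{i \in V}} \; \sum_{i \in V} L_i(w_i)
  \;+\; \lambda \sum_{(i,j) \in E} A_{ij}\, \phi\!\left(w_i - w_j\right)
```

Larger \lambda pools information across the graph toward a shared model, while \lambda -> 0 recovers purely local training.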
- Analysis and Optimal Edge Assignment For Hierarchical Federated Learning on Non-IID Data [43.32085029569374]
Federated learning algorithms aim to leverage distributed and diverse data stored at users' devices to learn a global phenomenon.
In cases where participants' data are strongly skewed (i.e., non-IID), local models can overfit the local data, leading to a low-performing global model.
We propose a hierarchical learning system that performs Federated Gradient Descent on the user-edge layer and Federated Averaging on the edge-cloud layer.
arXiv Detail & Related papers (2020-12-10T12:18:13Z)
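A minimal sketch of the two-tier scheme above: per-step gradient averaging at the user-edge layer and round-level FedAvg-style model averaging at the edge-cloud layer. Function names and the size-based weighting are illustrative.

```python
import numpy as np

def edge_round(w_edge, user_grads, lr=0.1):
    """User-edge layer: Federated Gradient Descent -- the edge server
    averages raw user gradients each step and updates its shared model."""
    return w_edge - lr * np.mean(user_grads, axis=0)

def cloud_round(edge_models, edge_sizes):
    """Edge-cloud layer: FedAvg -- the cloud takes a data-size-weighted
    average of edge models once per communication round."""
    sizes = np.asarray(edge_sizes, dtype=float)
    return (sizes / sizes.sum()) @ np.stack(edge_models)
```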
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of the pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)