Enhancing Data Quality in Federated Fine-Tuning of Foundation Models
- URL: http://arxiv.org/abs/2403.04529v1
- Date: Thu, 7 Mar 2024 14:28:04 GMT
- Title: Enhancing Data Quality in Federated Fine-Tuning of Foundation Models
- Authors: Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang
- Abstract summary: We propose a data quality control pipeline for federated fine-tuning of foundation models.
This pipeline computes scores reflecting the quality of training data and determines a global threshold for a unified standard.
Experiments show that the proposed quality control pipeline improves the effectiveness and reliability of model training, leading to better performance.
- Score: 54.757324343062734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the current landscape of foundation model training, there is a significant
reliance on public domain data, which is nearing exhaustion according to recent
research. To further scale up, it is crucial to incorporate collaboration among
multiple specialized and high-quality private domain data sources. However, the
challenge of training models locally without sharing private data presents
numerous obstacles in data quality control. To tackle this issue, we propose a
data quality control pipeline for federated fine-tuning of foundation models.
This pipeline computes scores reflecting the quality of training data and
determines a global threshold for a unified standard, aiming for improved
global performance. Our experiments show that the proposed quality control
pipeline improves the effectiveness and reliability of model training,
leading to better performance.
Related papers
- Synthetic Data Aided Federated Learning Using Foundation Models [4.666380225768727]
We propose Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models (DPSDA-FL)
Our experimental results have shown that DPSDA-FL can improve class recall and classification accuracy of the global model by up to 26% and 9%, respectively, in FL settings with non-IID data.
arXiv Detail & Related papers (2024-07-06T20:31:43Z)
- An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on the effectiveness of utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- FedFN: Feature Normalization for Alleviating Data Heterogeneity Problem in Federated Learning [29.626725039794383]
We introduce Federated Averaging with Feature Normalization Update (FedFN), a straightforward learning method.
We demonstrate the superior performance of FedFN through extensive experiments, even when applied to pretrained ResNet18.
arXiv Detail & Related papers (2023-11-22T09:37:33Z)
- Leveraging Foundation Models to Improve Lightweight Clients in Federated Learning [16.684749528240587]
Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data.
FL faces a significant challenge in the form of heterogeneous data distributions among clients, which leads to a reduction in performance and robustness.
We introduce foundation model distillation to assist in the federated training of lightweight client models and increase their performance under heterogeneous data settings while keeping inference costs low.
arXiv Detail & Related papers (2023-11-14T19:10:56Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- Federated Multilingual Models for Medical Transcript Analysis [11.877236847857336]
We present a federated learning system for training a large-scale multi-lingual model.
None of the training data is ever transmitted to any central location.
We show that the global model performance can be further improved by a training step performed locally.
arXiv Detail & Related papers (2022-11-04T01:07:54Z)
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
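The $\rho$-gap in the last entry relates training-data density to control performance; its exact definition is tied to the paper's control setting and is not reproduced here. As a loose, hypothetical one-dimensional illustration of a density-style data-quality measure, one can use the distance from a query point to its nearest training sample (the function name below is illustrative, not from the paper):

```python
def nearest_sample_distance(x, train_points):
    """Density proxy for training-data coverage at query point x:
    the distance to the closest training sample. Smaller values mean
    denser coverage and, heuristically, better-supported predictions at x."""
    if not train_points:
        raise ValueError("training set must be non-empty")
    return min(abs(x - p) for p in train_points)
```

A controller can then be evaluated, or trusted, only in regions where this proxy stays below a chosen bound, mirroring the idea of tying data density to performance guarantees.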
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.