FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models
- URL: http://arxiv.org/abs/2406.04845v1
- Date: Fri, 7 Jun 2024 11:19:30 GMT
- Title: FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models
- Authors: Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen
- Abstract summary: Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM).
There are currently no realistic datasets and benchmarks for FedLLM.
We propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics.
- Score: 48.484485609995986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM). Following this training paradigm, the community has invested substantial effort in diverse aspects, including frameworks, performance, and privacy. However, an unpleasant fact is that there are currently no realistic datasets or benchmarks for FedLLM; previous works all rely on artificially constructed datasets and fail to capture properties of real-world scenarios. To address this, we propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics, offering a comprehensive testbed for the FedLLM community. FedLLM-Bench encompasses three datasets (e.g., a user-annotated multilingual dataset) for federated instruction tuning and one dataset (e.g., a user-annotated preference dataset) for federated preference alignment, with client counts ranging from 38 to 747. Our datasets incorporate several representative forms of diversity: language, quality, quantity, instruction, length, embedding, and preference, capturing properties of real-world scenarios. Based on FedLLM-Bench, we conduct experiments on all datasets to benchmark existing FL methods and provide empirical insights (e.g., on multilingual collaboration). We believe FedLLM-Bench can benefit the FedLLM community by reducing required effort, providing a practical testbed, and promoting fair comparisons. Code and datasets are available at https://github.com/rui-ye/FedLLM-Bench.
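To make the training paradigm concrete, here is a minimal sketch of federated fine-tuning with FedAvg-style aggregation, weighting clients by local dataset size. The names and the toy least-squares objective are illustrative stand-ins, not the FedLLM-Bench API; in the benchmark itself the local step would be LLM instruction tuning.

```python
# Minimal FedAvg sketch (illustrative only; not the FedLLM-Bench API).
import numpy as np

def fedavg(client_params, client_sizes):
    """Aggregate client parameters, weighted by local dataset size."""
    w = np.array(client_sizes, dtype=float)
    w /= w.sum()
    return np.sum([wi * p for wi, p in zip(w, client_params)], axis=0)

def local_update(global_params, data, lr=1e-2, steps=10):
    """Toy least-squares stand-in for local instruction tuning."""
    X, y = data
    params = global_params.copy()
    for _ in range(steps):
        params -= lr * X.T @ (X @ params - y) / len(y)
    return params

rng = np.random.default_rng(0)
dim, n_clients = 8, 5
# A handful of clients with heterogeneous local dataset sizes.
datasets = [(rng.normal(size=(n, dim)), rng.normal(size=n))
            for n in rng.integers(20, 200, size=n_clients)]
global_params = np.zeros(dim)
for _ in range(3):  # a few communication rounds
    local = [local_update(global_params, d) for d in datasets]
    global_params = fedavg(local, [len(y) for _, y in datasets])
```

The same loop underlies federated preference alignment; only the local objective changes.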
Related papers
- LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data [3.66486428341988]
Multimodal Deep Learning enhances decision-making by integrating diverse information sources, such as texts, images, audio, and videos.
To develop trustworthy multimodal approaches, it is essential to understand how uncertainty impacts these models.
We propose LUMA, a unique benchmark dataset, featuring audio, image, and textual data from 50 classes, for learning from uncertain and multimodal data.
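Purely as an illustration of the kind of uncertainty quantification such a benchmark enables (this is not LUMA's code), the sketch below computes predictive entropy from an ensemble of stochastic forward passes, in the spirit of Monte Carlo dropout; random logits stand in for a real multimodal classifier.

```python
# Predictive entropy from stochastic forward passes (toy illustration).
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_entropy(logit_samples):
    """logit_samples: (n_passes, n_classes) logits from stochastic passes."""
    mean_probs = softmax(logit_samples).mean(axis=0)
    return -np.sum(mean_probs * np.log(mean_probs + 1e-12))

rng = np.random.default_rng(0)
confident = rng.normal([5.0, 0.0, 0.0], 0.1, size=(20, 3))  # low entropy
ambiguous = rng.normal([0.0, 0.0, 0.0], 2.0, size=(20, 3))  # high entropy
print(predictive_entropy(confident), predictive_entropy(ambiguous))
```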
arXiv Detail & Related papers (2024-06-14T09:22:07Z)
- A Universal Metric of Dataset Similarity for Cross-silo Federated Learning [0.0]
Federated learning is increasingly used in domains such as healthcare to facilitate model training without data-sharing.
In this paper, we propose a novel metric for assessing dataset similarity.
We show that our metric exhibits a robust and interpretable relationship with model performance and can be calculated in a privacy-preserving manner.
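The paper's metric itself is not reproduced here. As a generic stand-in for the idea, the sketch below scores cross-silo similarity with Jensen-Shannon divergence over aggregate label counts, so only summary statistics (not raw records) leave each silo; the silo names and counts are invented.

```python
# Dataset similarity from aggregate label counts (generic stand-in,
# not the paper's metric).
import numpy as np

def js_divergence(p, q, eps=1e-12):
    p = np.asarray(p, float); p = p / p.sum()
    q = np.asarray(q, float); q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

hospital_a = [120, 30, 5]    # per-class counts: shareable aggregates
hospital_b = [110, 35, 8]
hospital_c = [5, 10, 200]
print(js_divergence(hospital_a, hospital_b))  # similar silos -> near 0
print(js_divergence(hospital_a, hospital_c))  # dissimilar -> larger
```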
arXiv Detail & Related papers (2024-04-29T15:08:24Z)
- OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning [44.200613313936024]
Large language models (LLMs) have demonstrated tremendous success across various fields.
In this paper, we offer a potential next step for contemporary LLM training on underutilized distributed private data via federated learning (FL).
We build a concise, integrated, and research-friendly framework/codebase, named OpenFedLLM.
It covers federated instruction tuning for enhancing instruction-following capability, federated value alignment for aligning with human values, and 7 representative FL algorithms.
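Without assuming OpenFedLLM's actual interfaces, the sketch below illustrates the flavor of one representative FL algorithm such a framework can host: a FedProx-style local update, whose proximal term pulls each client's parameters back toward the current global model. A toy least-squares loss stands in for the LLM fine-tuning objective.

```python
# FedProx-style local update (toy stand-in for an LLM fine-tuning loss).
import numpy as np

def fedprox_local_update(global_params, X, y, mu=0.1, lr=1e-2, steps=20):
    params = global_params.copy()
    for _ in range(steps):
        grad = X.T @ (X @ params - y) / len(y)    # task gradient
        grad += mu * (params - global_params)     # proximal pull to global
        params -= lr * grad
    return params

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 8)), rng.normal(size=50)
local = fedprox_local_update(np.zeros(8), X, y)
```

Larger `mu` keeps heterogeneous clients closer to the global model, which is the knob such algorithms expose for non-IID data.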
arXiv Detail & Related papers (2024-02-10T13:50:11Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen in the infoVerse space consistently outperform strong baselines.
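A simplified rendering of the idea (the feature choices here are ours, not necessarily the paper's): compute several model-driven meta-features per sample, then select samples in that multidimensional space rather than by a single score.

```python
# Per-sample meta-information matrix (simplified; features are assumed).
import numpy as np

def meta_features(probs):
    """probs: (n_samples, n_classes) predicted class probabilities."""
    p_sorted = np.sort(probs, axis=1)[:, ::-1]
    confidence = p_sorted[:, 0]                   # top-1 probability
    margin = p_sorted[:, 0] - p_sorted[:, 1]      # top-1 minus top-2
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.stack([confidence, margin, entropy], axis=1)

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=100)
M = meta_features(probs)                  # (100, 3) meta-information
query = np.argsort(M[:, 2])[::-1][:10]    # e.g., query 10 most ambiguous
```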
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- DataComp: In search of the next generation of multimodal datasets [179.79323076587255]
DataComp is a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl.
Our benchmark consists of multiple compute scales spanning four orders of magnitude.
In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet.
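One widely used baseline in this setting is CLIP-score filtering: keep only the pairs whose image and text embeddings agree. The sketch below shows the mechanism with random embeddings as stand-ins; it is not DataComp's reference implementation.

```python
# CLIP-score filtering sketch (random embeddings stand in for CLIP's).
import numpy as np

def clip_score_filter(img_emb, txt_emb, keep_frac=0.3):
    """Keep indices of the top `keep_frac` pairs by cosine similarity."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    scores = np.sum(img * txt, axis=1)            # per-pair cosine score
    cutoff = np.quantile(scores, 1 - keep_frac)
    return np.where(scores >= cutoff)[0]

rng = np.random.default_rng(0)
kept = clip_score_filter(rng.normal(size=(1000, 512)),
                         rng.normal(size=(1000, 512)))
```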
arXiv Detail & Related papers (2023-04-27T11:37:18Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
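One plausible reading of the anchor-text trick, sketched with toy data below (the pipeline details are assumptions, not the paper's): different anchor texts linking to the same Wikipedia page are treated as coreferent mentions, yielding cheap positive training pairs.

```python
# Bootstrapping coreference pairs from Wikipedia anchors (toy sketch).
from collections import defaultdict
from itertools import combinations

# (anchor_text, target_page) pairs as extracted from a Wikipedia dump
anchors = [
    ("Obama", "Barack_Obama"), ("the former president", "Barack_Obama"),
    ("Barack Obama", "Barack_Obama"),
    ("Paris", "Paris"), ("the French capital", "Paris"),
]

mentions = defaultdict(set)
for text, page in anchors:
    mentions[page].add(text)

# Distinct anchor texts for the same page become positive mention pairs.
positive_pairs = [pair for texts in mentions.values()
                  for pair in combinations(sorted(texts), 2)]
print(positive_pairs)
```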
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings [51.09574369310246]
Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models.
We propose a novel cross-silo dataset suite focused on healthcare, FLamby, to bridge the gap between theory and practice of cross-silo FL.
Our flexible and modular suite allows researchers to easily download datasets, reproduce results, and re-use the different components for their research.
arXiv Detail & Related papers (2022-10-10T12:17:30Z)
- FedScale: Benchmarking Model and System Performance of Federated Learning [4.1617240682257925]
FedScale is a set of challenging and realistic benchmark datasets for federated learning (FL) research.
FedScale is open-source with permissive licenses and actively maintained.
arXiv Detail & Related papers (2021-05-24T15:55:27Z)
- Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.
Our algorithm harnesses the distributed computational power across clients to perform many local updates on the low-dimensional local parameters for each update of the shared representation.
This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
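A minimal numerical sketch of this alternating scheme (linear maps stand in for the network; hyperparameters are assumed, not the paper's): each client takes many cheap gradient steps on its private low-dimensional head, then contributes one update of the shared representation, which the server averages.

```python
# FedRep-style alternating updates with linear maps (illustrative).
import numpy as np

rng = np.random.default_rng(0)
d, k, n_clients = 10, 2, 4                # input dim, representation dim
B = 0.1 * rng.normal(size=(d, k))         # shared representation
heads = [0.1 * rng.normal(size=k) for _ in range(n_clients)]
data = [(rng.normal(size=(50, d)), rng.normal(size=50))
        for _ in range(n_clients)]

for _ in range(20):                       # communication rounds
    new_Bs = []
    for i, (X, y) in enumerate(data):
        Z = X @ B                         # B is fixed during head steps
        for _ in range(10):               # many cheap local head steps
            heads[i] -= 0.05 * Z.T @ (Z @ heads[i] - y) / len(y)
        err = X @ B @ heads[i] - y        # then one step on shared B
        new_Bs.append(B - 0.05 * np.outer(X.T @ err / len(y), heads[i]))
    B = np.mean(new_Bs, axis=0)           # server averages B only
```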
arXiv Detail & Related papers (2021-02-14T05:36:25Z)