FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
- URL: http://arxiv.org/abs/2506.02961v1
- Date: Tue, 03 Jun 2025 14:54:12 GMT
- Title: FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
- Authors: Yan Gao, Massimo Roberto Scamarcia, Javier Fernandez-Marques, Mohammad Naseri, Chong Shen Ng, Dimitris Stripelis, Zexi Li, Tao Shen, Jiamu Bai, Daoyuan Chen, Zikai Zhang, Rui Hu, InSeo Song, Lee KangYoon, Hong Jia, Ting Dang, Junyan Wang, Zheyuan Liu, Daniel Janes Beutel, Lingjuan Lyu, Nicholas D. Lane,
- Abstract summary: Large Language Models (LLMs) have achieved state-of-the-art results across diverse domains, yet their development remains reliant on vast amounts of publicly available data. This work lays the foundation for developing privacy-preserving, domain-specialized LLMs for real-world applications.
- Score: 43.62847972139202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have achieved state-of-the-art results across diverse domains, yet their development remains reliant on vast amounts of publicly available data, raising concerns about data scarcity and the lack of access to domain-specific, sensitive information. Federated Learning (FL) presents a compelling framework to address these challenges by enabling decentralized fine-tuning of pre-trained LLMs without sharing raw data. However, the compatibility and performance of pre-trained LLMs in FL settings remain largely underexplored. We introduce the FlowerTune LLM Leaderboard, a first-of-its-kind benchmarking suite designed to evaluate federated fine-tuning of LLMs across four diverse domains: general NLP, finance, medical, and coding. Each domain includes federated instruction-tuning datasets and domain-specific evaluation metrics. Our results, obtained through a collaborative, open-source, and community-driven approach, provide the first comprehensive comparison across 26 pre-trained LLMs with different aggregation and fine-tuning strategies under federated settings, offering actionable insights into model performance, resource constraints, and domain adaptation. This work lays the foundation for developing privacy-preserving, domain-specialized LLMs for real-world applications.
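To make the federated fine-tuning setting concrete, the sketch below shows one server-side FedAvg aggregation step over LoRA-style adapter updates from four hypothetical domain clients (general NLP, finance, medical, coding). The adapter shapes, client dataset sizes, and example counts are illustrative assumptions, not details taken from the FlowerTune leaderboard's implementation.

```python
# Illustrative sketch only (not the FlowerTune codebase): a server-side
# FedAvg step over LoRA-style adapter updates from four domain clients,
# weighted by each client's local dataset size.
import numpy as np


def fedavg(client_updates, client_sizes):
    """Average per-parameter adapter tensors, weighted by dataset size."""
    total = sum(client_sizes)
    return {
        name: sum(
            (size / total) * update[name]
            for update, size in zip(client_updates, client_sizes)
        )
        for name in client_updates[0]
    }


# Hypothetical adapter shapes, domain clients, and dataset sizes.
rng = np.random.default_rng(0)
domains = ["general_nlp", "finance", "medical", "coding"]
shapes = {"lora_A": (8, 4096), "lora_B": (4096, 8)}
updates = [{k: rng.normal(size=s) for k, s in shapes.items()} for _ in domains]
sizes = [12000, 5000, 8000, 6000]  # made-up example counts

global_adapter = fedavg(updates, sizes)
print({name: tensor.shape for name, tensor in global_adapter.items()})
```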
Related papers
- Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration [13.824354003574843]
Bohdi is a synthetic-data-only heterogeneous Large Language Model (LLM) fusion framework.
By organizing knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation.
Bohdi significantly outperforms existing baselines on multiple target LLMs.
arXiv Detail & Related papers (2025-06-04T17:01:38Z)
- Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets.
This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z)
- FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data [56.08867996209236]
Fine-tuning Multimodal Large Language Models (MLLMs) with Federated Learning (FL) allows for expanding the training data scope by including private data sources.
We introduce a benchmark to evaluate the performance of federated fine-tuning of MLLMs across various multimodal heterogeneous scenarios.
We develop a general FedMLLM framework that integrates classic FL methods alongside two modality-agnostic strategies.
arXiv Detail & Related papers (2024-11-22T04:09:23Z)
- Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
- Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data [53.70870879858533]
We introduce a Federated Domain-specific Knowledge Transfer (FDKT) framework.
It enables domain-specific knowledge transfer from LLMs to SLMs while preserving clients' data privacy.
The proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5% with a privacy budget of less than 10.
arXiv Detail & Related papers (2024-05-23T06:14:35Z)
- General LLMs as Instructors for Domain-Specific LLMs: A Sequential Fusion Method to Integrate Extraction and Editing [12.017822691367705]
We introduce a Sequential Fusion method to integrate knowledge from complex contexts into Large Language Models (LLMs).
Using our method, domain-specific LLMs achieved a 71.7% accuracy (an average gain of 39.1%) in question-answering tasks.
These findings underscore the effectiveness and flexibility of our approach in FDoR-UL across various domains.
arXiv Detail & Related papers (2024-03-23T06:03:36Z)
- OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning [44.200613313936024]
Large language models (LLMs) have demonstrated tremendous success across various fields.
In this paper, we offer a potential next step for contemporary LLM training on underutilized distributed private data via federated learning (FL).
We build a concise, integrated, and research-friendly framework/codebase, named OpenFedLLM.
It covers federated instruction tuning for enhancing instruction-following capability, federated value alignment for aligning with human values, and 7 representative FL algorithms.
arXiv Detail & Related papers (2024-02-10T13:50:11Z)
- EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [67.8302955948861]
Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable performance on various NLP tasks.
Applying these models to specific domains still poses significant challenges, such as a lack of domain knowledge.
We focus on domain-specific continual pre-training of LLMs, using the e-commerce domain as an exemplar.
arXiv Detail & Related papers (2023-12-25T11:31:47Z)
- FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses the challenges of federated fine-tuning of LLMs and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
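The FS-LLM entry above highlights parameter-efficient fine-tuning in federated settings; the minimal sketch below (a hand-rolled LoRA-style adapter, not FS-LLM's actual API) illustrates why such methods reduce per-round communication: only the low-rank adapter matrices remain trainable, so a client would exchange roughly 65K parameters instead of the frozen layer's ~16.8M. The rank and layer size are illustrative assumptions.

```python
# Minimal, hand-rolled LoRA-style adapter on one linear layer; the rank and
# layer size are illustrative assumptions. In an FL round, a client would
# transmit only the trainable (adapter) parameters, not the frozen base weights.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        # Base projection plus the low-rank update x @ A^T @ B^T.
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T


layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"parameters a client would communicate: {trainable} of {total}")
```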
- Integration of Large Language Models and Federated Learning [58.9876604258949]
We propose a research framework, dividing the fusion of LLMs and FL into three parts.
We first provide a review of the current state of research in the domain of LLMs combined with FL, including their typical applications.
We then discuss the practical applications of the combination of LLMs and FL in critical scenarios such as healthcare, finance, and education.
arXiv Detail & Related papers (2023-07-18T02:09:14Z)