Related papers: On the Convergence of Zeroth-Order Federated Tuning for Large Language Models

On the Convergence of Zeroth-Order Federated Tuning for Large Language Models

URL: http://arxiv.org/abs/2402.05926v3
Date: Mon, 17 Jun 2024 16:00:59 GMT
Title: On the Convergence of Zeroth-Order Federated Tuning for Large Language Models
Authors: Zhenqing Ling, Daoyuan Chen, Liuyi Yao, Yaliang Li, Ying Shen,
Abstract summary: Federated Learning and Large Language Models (LLMs) are ushering in a new era in privacy-preserving natural language processing. Memory-efficient Zeroth-Order Optimization is a synergy we term as FedMeZO. Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs.
Score: 36.277423093218275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The confluence of Federated Learning (FL) and Large Language Models (LLMs) is ushering in a new era in privacy-preserving natural language processing. However, the intensive memory requirements for fine-tuning LLMs pose significant challenges, especially when deploying on clients with limited computational resources. To circumvent this, we explore the novel integration of Memory-efficient Zeroth-Order Optimization within a federated setting, a synergy we term as FedMeZO. Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs, tackling key questions regarding the influence of large parameter spaces on optimization behavior, the establishment of convergence properties, and the identification of critical parameters for convergence to inform personalized federated strategies. Our extensive empirical evidence supports the theory, showing that FedMeZO not only converges faster than traditional first-order methods such as FedAvg but also significantly reduces GPU memory usage during training to levels comparable to those during inference. Moreover, the proposed personalized FL strategy that is built upon the theoretical insights to customize the client-wise learning rate can effectively accelerate loss reduction. We hope our work can help to bridge theoretical and practical aspects of federated fine-tuning for LLMs, thereby stimulating further advancements and research in this area.

Related papers

Understanding LoRA as Knowledge Memory: An Empirical Analysis [20.53732426953178]
This work investigates a parametric approach using Low-Rank Adaptation (LoRA) as a modular knowledge memory.<n>We bridge this gap through the first systematic empirical study mapping the design space of LoRA-based memory.<n>Our findings position LoRA as the complementary axis of memory alongside RAG and ICL, offering distinct advantages.
arXiv Detail & Related papers (2026-03-01T13:28:57Z)
FedKRSO: Communication and Memory Efficient Federated Fine-Tuning of Large Language Models [14.208669882584482]
Fine-tuning of large language models (LLMs) is essential to adapt them to domain-specific tasks.<n> Federated Learning (FL) is gaining popularity in FL fine-tuning, but remains challenging due to the high cost of transmitting full model parameters.<n>This paper proposes FedKRSO, a novel method that enables communication and memory efficient FFT of LLMs in federated settings.
arXiv Detail & Related papers (2026-02-03T02:39:33Z)
In-Context Probing for Membership Inference in Fine-Tuned Language Models [14.590625376049955]
Membership inference attacks (MIAs) pose a critical privacy threat to fine-tuned large language models (LLMs)<n>We propose ICP-MIA, a novel MIA framework grounded in the theory of training dynamics.<n>ICP-MIA significantly outperforms prior black-box MIAs, particularly at low false positive rates.
arXiv Detail & Related papers (2025-12-18T08:26:26Z)
Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks [63.541114376141735]
Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios.<n>However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks.<n>We propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism.
arXiv Detail & Related papers (2025-11-04T15:14:58Z)
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections [65.36449542323277]
We present a unified theoretical framework bridgingSupervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training.<n>We propose a simple yet effective learning rate reduction approach that yields significant performance improvements.
arXiv Detail & Related papers (2025-06-15T05:42:29Z)
Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets. This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z)
Exploring Gradient Subspaces: Addressing and Overcoming LoRA's Limitations in Federated Fine-Tuning of Large Language Models [19.533062623518674]
This paper critically analyzes the convergence and performance guarantees of popular FL frameworks utilizing Low-Rank Adaptation (LoRA) We demonstrate that direct weight averaging outperforms LoRA-based strategies, leading to superior performance for fine-tuned models. Our findings show that GaLore along with direct-weight aggregation is a more effective approach, outperforming federated LoRA methods like FlexLoRA and FFA-LoRA across both text and image modalities.
arXiv Detail & Related papers (2024-10-30T15:23:44Z)
Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information [6.767885381740953]
Federated learning has emerged as a distributed optimization paradigm. We propose a novel modified framework wherein each client locally performs a perturbed gradient step. We show that our algorithm speeds convergence up to a margin of 30 global rounds compared with FedAvg.
arXiv Detail & Related papers (2024-10-07T23:14:05Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead [75.87007729801304]
SpaFL: a communication-efficient FL framework is proposed to optimize sparse model structures with low computational overhead. Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
Personalized Wireless Federated Learning for Large Language Models [75.22457544349668]
Large language models (LLMs) have driven profound transformations in wireless networks.<n>Within wireless environments, the training of LLMs faces significant challenges related to security and privacy.<n>This paper presents a systematic analysis of the training stages of LLMs in wireless networks, including pre-training, instruction tuning, and alignment tuning.
arXiv Detail & Related papers (2024-04-20T02:30:21Z)
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution. We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
Fed-LAMB: Layerwise and Dimensionwise Locally Adaptive Optimization Algorithm [24.42828071396353]
In the emerging paradigm of federated learning (FL), large amount of clients, such as mobile devices, are used to train on their respective data. Due to the low bandwidth, decentralized optimization methods need to shift the computation burden from those clients to those servers. We present Fed-LAMB, a novel learning method based on a layerwise, deep neural networks.
arXiv Detail & Related papers (2021-10-01T16:54:31Z)
Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named as FLO. Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.