SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices
- URL: http://arxiv.org/abs/2503.18986v1
- Date: Sun, 23 Mar 2025 08:03:44 GMT
- Title: SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices
- Authors: Jian Ma, Xinchen Lyu, Jun Jiang, Qimei Cui, Haipeng Yao, Xiaofeng Tao
- Abstract summary: Fine-tuning large language models (LLMs) on private, on-device data can empower tailored personalized AI agents. This paper proposes SplitFrozen, a split learning framework that enables efficient fine-tuning on resource-constrained edge devices. Experiments on GPT-2 with the MRPC, MNLI-matched, and SST-2 datasets demonstrate that SplitFrozen outperforms FedLoRA and SplitLoRA by 69.4% in model accuracy under extremely imbalanced data.
- Score: 15.790762116995845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning large language models (LLMs) on private, on-device data can empower tailored personalized AI agents. However, fine-tuning LLMs on resource-constrained edge devices faces significant challenges, including excessive computation overhead, device heterogeneity, and data imbalance. This paper proposes SplitFrozen, a split learning framework that enables efficient LLM fine-tuning by strategically freezing device-side model layers while centralizing parameter-efficient fine-tuning on the server. Our framework partitions LLMs into device-side frozen layers and server-side fine-tuning layers, where heterogeneous resource-constrained devices execute only forward propagation. To minimize server-side training costs, we integrate Low-Rank Adaptation (LoRA) into the server-side layers. A pipeline parallelism strategy further optimizes training efficiency by decoupling device-server computations and leveraging decomposed backward propagation. Experiments on GPT-2 with the MRPC, MNLI-matched, and SST-2 datasets demonstrate that SplitFrozen outperforms FedLoRA and SplitLoRA by 69.4% in model accuracy under extremely imbalanced data, while reducing device-side computation by up to 86.8% and total training time by 50.2%. Experiments also validate the scalability of SplitFrozen on a content-generation task using the Llama-3.2 model on the GSM8K dataset.
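To make the partitioning concrete, here is a minimal PyTorch sketch of the SplitFrozen idea, not the authors' implementation: the first layers run frozen on the device in forward-only mode, while the remaining layers run on the server with LoRA adapters as the only trainable parameters. The toy layer stack, cut point, and LoRA rank are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + scale * (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze the pre-trained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def split_model(stack: nn.Sequential, cut: int):
    """Device side: frozen, forward-only. Server side: LoRA on every Linear."""
    device_side = stack[:cut]
    for p in device_side.parameters():
        p.requires_grad = False
    server_side = nn.Sequential(
        *(LoRALinear(m) if isinstance(m, nn.Linear) else m for m in stack[cut:])
    )
    return device_side, server_side

# A toy 8-layer stack standing in for a transformer's layers.
stack = nn.Sequential(*(nn.Linear(64, 64) for _ in range(8)))
device_side, server_side = split_model(stack, cut=5)

x = torch.randn(4, 64)
with torch.no_grad():                            # the device only runs forward
    smashed = device_side(x)                     # activations sent to the server
out = server_side(smashed)
out.pow(2).mean().backward()                     # only LoRA params receive gradients
```

In the full framework the activations at the cut would cross the network, and pipeline parallelism would overlap device-side forward passes with server-side training; the sketch shows only the freeze/fine-tune boundary.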
Related papers
- ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture [13.00865517063611]
We introduce ReinDSplit, a reinforcement learning framework that dynamically tailors split points for each device. A Q-learning agent acts as an adaptive orchestrator, balancing workloads and latency thresholds across devices. We evaluate ReinDSplit on three insect classification datasets using ResNet18, GoogleNet, and MobileNetV2.
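As a rough illustration of that orchestration idea, the toy Q-learning loop below picks a per-device split point that trades device compute against communication; the latency model, device speeds, and hyperparameters are invented for the example and are not from the paper.

```python
import random

N_DEVICES, N_SPLITS = 3, 6
device_speed = [1.0, 0.5, 0.2]             # hypothetical relative compute power
Q = [[0.0] * N_SPLITS for _ in range(N_DEVICES)]

def latency(dev: int, split: int) -> float:
    # A deeper split means more on-device compute but less data to ship.
    device_cost = (split + 1) / device_speed[dev]
    comm_cost = (N_SPLITS - split) * 0.5
    return device_cost + comm_cost

alpha, eps = 0.1, 0.2                      # bandit-style one-step updates
for step in range(5000):
    dev = random.randrange(N_DEVICES)
    a = random.randrange(N_SPLITS) if random.random() < eps \
        else max(range(N_SPLITS), key=lambda s: Q[dev][s])
    r = -latency(dev, a)                   # reward = negative latency
    Q[dev][a] += alpha * (r - Q[dev][a])

# Learned split point per device: slower devices prefer shallower splits.
print([max(range(N_SPLITS), key=lambda s: Q[dev][s]) for dev in range(N_DEVICES)])
```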
arXiv Detail & Related papers (2025-06-16T19:18:56Z) - Adaptive Deadline and Batch Layered Synchronized Federated Learning [66.93447103966439]
Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner. We propose ADEL-FL, a novel framework that jointly optimizes per-round deadlines and user-specific batch sizes for layer-wise aggregation.
arXiv Detail & Related papers (2025-05-29T19:59:18Z) - HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models [30.345920952847752]
Large language models (LLMs) have achieved remarkable breakthroughs, revolutionizing the natural language processing domain and beyond. Fine-tuning these models with private data for diverse downstream tasks has become mainstream, yet their immense parameter sizes make this challenging. We propose HSplitLoRA, a framework built on split learning (SL) and low-rank adaptation (LoRA) fine-tuning, for efficiently fine-tuning LLMs on heterogeneous client devices.
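A hypothetical helper in the same spirit: assign smaller LoRA ranks to weaker clients so each client's trainable footprint matches its budget. The budget rule, candidate ranks, and function name below are assumptions, not the paper's algorithm.

```python
def pick_rank(flops_budget: float, candidate_ranks=(2, 4, 8, 16)) -> int:
    """Pick the largest LoRA rank a client can afford under its compute budget."""
    # Trainable LoRA parameters grow linearly in the rank, so cap the rank
    # by the fraction of a reference budget the client can afford.
    reference = 1.0                        # budget that affords the largest rank
    affordable = [r for r in candidate_ranks
                  if r / max(candidate_ranks) <= flops_budget / reference]
    return max(affordable) if affordable else min(candidate_ranks)

print([pick_rank(b) for b in (0.1, 0.4, 1.0)])   # -> [2, 4, 16]
```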
arXiv Detail & Related papers (2025-05-05T17:09:19Z) - Efficient Deployment of Large Language Models on Resource-constrained Devices [12.644230479753476]
It is necessary to fine-tune Large Language Models (LLMs) on resource-constrained devices for various downstream tasks. FedSpine is a framework that combines Parameter-Efficient Fine-Tuning (PEFT) with structured pruning for efficient deployment of LLMs on resource-constrained devices. We show that FedSpine can speed up fine-tuning by 1.4× and improve final accuracy by 0.4%-4.5% under the same sparsity level compared to other baselines.
arXiv Detail & Related papers (2025-01-05T04:38:11Z) - Federated Split Learning with Model Pruning and Gradient Quantization in Wireless Networks [7.439160287320074]
Federated split learning (FedSL) implements collaborative training across the edge devices and the server through model splitting.
We propose a lightweight FedSL scheme that further alleviates the training burden on resource-constrained edge devices through model pruning and gradient quantization (a toy quantizer sketch follows this summary).
We conduct theoretical analysis to quantify the convergence performance of the proposed scheme.
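The gradient-quantization half of such a scheme can be sketched as an unbiased uniform stochastic quantizer; this is a generic quantizer for illustration, not necessarily the paper's exact design.

```python
import torch

def quantize(g: torch.Tensor, bits: int = 4):
    """Uniform stochastic quantization; returns level indices plus a scale."""
    levels = 2 ** bits - 1
    scale = g.abs().max().clamp(min=1e-12)
    normalized = (g / scale + 1) / 2 * levels       # map [-scale, scale] -> [0, levels]
    low = normalized.floor()
    # Round up with probability equal to the fractional part (unbiased).
    idx = (low + (torch.rand_like(g) < normalized - low)).clamp(0, levels)
    return idx.to(torch.uint8), scale               # a few bits per value + one scalar

def dequantize(idx: torch.Tensor, scale: torch.Tensor, bits: int = 4):
    levels = 2 ** bits - 1
    return (idx.float() / levels * 2 - 1) * scale

g = torch.randn(10000)
idx, s = quantize(g)
print((dequantize(idx, s) - g).abs().mean())        # small, zero-mean error
```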
arXiv Detail & Related papers (2024-12-09T11:43:03Z) - Split Federated Learning Over Heterogeneous Edge Devices: Algorithm and Optimization [7.013344179232109]
Split Learning (SL) is a promising collaborative machine learning approach, enabling resource-constrained devices to train models without sharing raw data.
Current SL algorithms face limitations in training efficiency and suffer from prolonged latency.
We propose the Heterogeneous Split Federated Learning framework, which allows resource-constrained clients to train their personalized client-side models in parallel.
arXiv Detail & Related papers (2024-11-21T07:46:01Z) - Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines [17.539008562641303]
Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers.
The next frontier is LLM personalization, where a foundation model can be fine-tuned with user/task-specific data.
Fine-tuning on resource-constrained edge devices presents significant challenges due to substantial memory and computational demands.
arXiv Detail & Related papers (2024-09-23T20:14:09Z) - Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
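Because backpropagation produces gradients from the last layer backward, a straggler cut off mid-backprop still holds valid gradients for its deepest layers. The toy per-layer aggregation below illustrates that idea; the straggler pattern and tensor shapes are invented for the example.

```python
import torch

n_layers, n_clients = 6, 4
# layers_done[c] = how many gradients (counted from the top) client c finished.
layers_done = [6, 4, 2, 1]                  # hypothetical straggler pattern
grads = [[torch.randn(8, 8) for _ in range(n_layers)] for _ in range(n_clients)]

global_update = []
for l in range(n_layers):
    # Client c holds layer l's gradient iff l is among its last `layers_done[c]` layers.
    contributors = [grads[c][l] for c in range(n_clients)
                    if l >= n_layers - layers_done[c]]
    if contributors:
        global_update.append(torch.stack(contributors).mean(0))
    else:
        global_update.append(torch.zeros(8, 8))     # no client reached this layer
```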
arXiv Detail & Related papers (2024-03-27T09:14:36Z) - Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting increasing attention as a way to collaboratively train a machine learning model without transferring raw data. FL generally relies on a parameter server and a large number of edge devices throughout model training. We propose TEASQ-Fed, which lets edge devices participate in training asynchronously by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z) - Distributed Inference and Fine-tuning of Large Language Models Over The Internet [91.00270820533272]
Large language models (LLMs) are useful in many NLP tasks and become more capable with size.
These models require high-end hardware, making them inaccessible to most researchers.
We develop fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput.
arXiv Detail & Related papers (2023-12-13T18:52:49Z) - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
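A minimal sketch of why seed-only communication can suffice, assuming a two-point zeroth-order estimator in the spirit of FedKSeed: each update is fully determined by a (seed, scalar) pair, so any party holding the initial weights can replay the log to reconstruct the model. The objective, perturbation scale, and seed-pool size are illustrative.

```python
import torch

w0 = torch.randn(1000)                      # shared initial (flattened) weights
w = w0.clone()
SEED_POOL = list(range(16))                 # the finite set of candidate seeds
eps, lr = 1e-3, 1e-2

def loss(params):                           # stand-in objective
    return (params ** 2).sum()

def perturbation(seed, shape):
    g = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=g)  # reproducible direction from a seed

history = []                                # everything a client ever transmits
for step in range(200):
    seed = SEED_POOL[step % len(SEED_POOL)]
    z = perturbation(seed, w.shape)
    # Two-point zeroth-order estimate of the directional derivative along z.
    g_hat = ((loss(w + eps * z) - loss(w - eps * z)) / (2 * eps)).item()
    w -= lr * g_hat * z
    history.append((seed, g_hat))           # seeds + scalars, no tensors

# Any party holding w0 replays the (seed, scalar) log to resynchronize.
w_replay = w0.clone()
for seed, g_hat in history:
    w_replay -= lr * g_hat * perturbation(seed, w_replay.shape)
assert torch.allclose(w, w_replay)
```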
arXiv Detail & Related papers (2023-12-11T13:03:21Z) - Federated Split Learning with Only Positive Labels for resource-constrained IoT environment [4.055662817794178]
Distributed collaborative machine learning (DCML) is a promising method in the Internet of Things (IoT) domain for training deep learning models.
Splitfed learning (SFL) is the most suitable approach for efficient training and testing when devices have limited computational capabilities. We show that split federated learning with only positive labels (SFPL) outperforms SFL when resource-constrained IoT devices have only positively labeled data.
arXiv Detail & Related papers (2023-07-25T05:33:06Z) - Adaptive Federated Pruning in Hierarchical Wireless Networks [69.6417645730093]
Federated Learning (FL) is a privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets.
In this paper, we introduce model pruning for hierarchical federated learning (HFL) in wireless networks to reduce the neural network scale. We show that our proposed HFL with model pruning achieves learning accuracy similar to HFL without pruning while reducing communication cost by about 50 percent.
arXiv Detail & Related papers (2023-05-15T22:04:49Z) - Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks [44.37047471448793]
In this paper, we advocate integrating the edge computing paradigm with parallel split learning (PSL). We propose an innovative PSL framework, namely efficient parallel split learning (EPSL), to accelerate model training.
We show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy.
arXiv Detail & Related papers (2023-03-26T16:09:48Z) - Predictive GAN-powered Multi-Objective Optimization for Hybrid Federated Split Learning [56.125720497163684]
We propose a hybrid federated split learning framework in wireless networks.
We design a parallel computing scheme for model splitting without label sharing, and theoretically analyze the influence of the delayed gradient caused by the scheme on the convergence speed.
arXiv Detail & Related papers (2022-09-02T10:29:56Z)