PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning
- URL: http://arxiv.org/abs/2507.01216v1
- Date: Tue, 01 Jul 2025 22:27:21 GMT
- Title: PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning
- Authors: Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Hao Wang, Xiaoqi Qi, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan
- Abstract summary: PAE MobiLLM is a privacy-aware and efficient LLM FT method which can be deployed on the mobile device via server-assisted additive side-tuning. It integrates activation caching on the server side, which allows the server to reuse historical activations and saves the mobile device from repeatedly computing forward passes for recurring data samples. Finally, PAE MobiLLM introduces the additive adapter side-network design, which makes the server train the adapter modules based on device-defined prediction differences.
- Score: 23.15414219447242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a huge gap between numerous intriguing applications fostered by on-device large language model (LLM) fine-tuning (FT) from fresh mobile data and the limited resources of a mobile device. While existing server-assisted methods (e.g., split learning or side-tuning) may enable LLM FT on the local mobile device, they suffer from heavy communication burdens of activation transmissions, and may disclose data, labels or fine-tuned models to the server. To address those issues, we develop PAE MobiLLM, a privacy-aware and efficient LLM FT method which can be deployed on the mobile device via server-assisted additive side-tuning. To further accelerate FT convergence and improve computing efficiency, PAE MobiLLM integrates activation caching on the server side, which allows the server to reuse historical activations and saves the mobile device from repeatedly computing forward passes for the recurring data samples. Besides, to reduce communication cost, PAE MobiLLM develops a one-token (i.e., ``pivot'' token) activation shortcut that transmits only a single activation dimension instead of full activation matrices to guide the side network tuning. Last but not least, PAE MobiLLM introduces the additive adapter side-network design which makes the server train the adapter modules based on device-defined prediction differences rather than raw ground-truth labels. In this way, the server can only assist device-defined side-network computing, and learn nothing about data, labels or fine-tuned models.
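The additive mechanism described above can be made concrete with a small sketch. Below, the device runs only frozen forward passes and ships a single pivot-token activation per layer, while the server fits lightweight adapters to a device-defined prediction difference, so raw labels never leave the device. This is a minimal illustration assuming a PyTorch-style setup; the module shapes, the choice of token 0 as the pivot, and the MSE objective are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

d_model, n_layers, n_classes = 768, 4, 2

# Device side: frozen pretrained backbone (never updated, never sent anywhere).
backbone = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True) for _ in range(n_layers)])
head = nn.Linear(d_model, n_classes)
for p in list(backbone.parameters()) + list(head.parameters()):
    p.requires_grad_(False)

# Server side: lightweight additive adapters, one per backbone layer.
adapters = nn.ModuleList(
    [nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, d_model)) for _ in range(n_layers)])
side_head = nn.Linear(d_model, n_classes)
opt = torch.optim.Adam(list(adapters.parameters()) + list(side_head.parameters()), lr=1e-3)

def device_forward(x):
    """Device: one frozen forward pass; keep only the pivot-token activation per layer."""
    pivots, h = [], x
    for layer in backbone:
        h = layer(h)
        pivots.append(h[:, 0, :].detach())   # one token's vector, not the full activation matrix
    return pivots, head(h[:, 0, :])          # frozen backbone prediction

def device_residual(backbone_logits, labels):
    """Device: the prediction difference the server trains on; labels never leave the device."""
    return nn.functional.one_hot(labels, n_classes).float() - backbone_logits.softmax(-1)

def server_step(pivots, residual):
    """Server: fit the additive side network to the device-defined difference."""
    s = torch.zeros_like(pivots[0])
    for adapter, p in zip(adapters, pivots):
        s = s + adapter(p)                   # additive side network over pivot activations
    loss = nn.functional.mse_loss(side_head(s), residual)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x, y = torch.randn(8, 16, d_model), torch.randint(0, n_classes, (8,))
pivots, logits = device_forward(x)
print(server_step(pivots, device_residual(logits, y)))
```

At inference time, the device would add the server-trained side output to its frozen backbone prediction, which is the additive formulation the abstract describes.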
Related papers
- MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning [45.49178219392948]
Large Language Model (LLM) fine-tuning at mobile devices poses great challenges due to extremely high memory requirements and slow training speeds.
We propose MobiLLM to enable memory-efficient transformer LLM fine-tuning on a mobile device via server-assisted side-tuning.
arXiv Detail & Related papers (2025-02-27T07:58:02Z) - WDMoE: Wireless Distributed Mixture of Experts for Large Language Models [68.45482959423323]
Large Language Models (LLMs) have achieved significant success in various natural language processing tasks.
We propose a wireless distributed Mixture of Experts (WDMoE) architecture to enable collaborative deployment of LLMs across edge servers at the base station (BS) and mobile devices in wireless networks.
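A toy sketch of the collaborative expert placement WDMoE describes, with heavy assumptions: a single gating layer weights four experts, two imagined at the base station and two on the device; the real routing, attention placement, and wireless scheduling are not modeled here.

```python
import torch
import torch.nn as nn

d = 256
gate = nn.Linear(d, 4)                                   # 4 experts in total
bs_experts = nn.ModuleList([nn.Linear(d, d) for _ in range(2)])      # at the base station
device_experts = nn.ModuleList([nn.Linear(d, d) for _ in range(2)])  # on the mobile device
experts = list(bs_experts) + list(device_experts)

def moe_forward(x):
    w = gate(x).softmax(-1)                              # routing weights per input
    # Each expert's output is weighted and summed; in deployment the two groups
    # would run on different machines and exchange only expert outputs.
    return sum(w[..., i:i + 1] * e(x) for i, e in enumerate(experts))

print(moe_forward(torch.randn(8, d)).shape)              # torch.Size([8, 256])
```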
arXiv Detail & Related papers (2024-11-11T02:48:00Z) - PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms [11.87161637895978]
We introduce our lightweight, all-in-one automated benchmarking framework that allows users to evaluate large language models on mobile devices.
We provide a benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities.
arXiv Detail & Related papers (2024-10-05T03:37:07Z) - FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model [48.33280660752336]
Large language models (LLMs) show amazing performance on many domain-specific tasks after fine-tuning with some appropriate data.
Much domain-specific data is privately distributed across multiple owners.
We introduce FedBiOT, a resource-efficient LLM fine-tuning approach to federated learning.
arXiv Detail & Related papers (2024-06-25T16:45:47Z) - When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment [100.58938424441027]
We propose a split learning system for AI agents in 6G networks leveraging the collaboration between mobile devices and edge servers.
We introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context.
arXiv Detail & Related papers (2024-01-15T15:20:59Z) - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
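The seed pool is what shrinks the payload: if both sides hold the same candidate seeds, a client can report (seed id, scalar) pairs and the server regenerates every perturbation locally. A minimal sketch in that spirit, with the toy objective, pool size, and step sizes as assumptions:

```python
import torch

theta = torch.randn(10_000)                 # flattened model parameters
def loss(p):                                # stand-in objective
    return (p ** 2).mean()

K, eps, lr = 4, 1e-3, 1e-2
seeds = list(range(K))                      # finite seed pool known to both sides

def perturbation(seed, n):
    g = torch.Generator().manual_seed(seed)
    return torch.randn(n, generator=g)      # deterministically regenerable from the seed

# Client: estimate directional derivatives with two perturbed forward passes per seed.
report = []
for s in seeds:
    z = perturbation(s, theta.numel())
    d = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    report.append((s, d.item()))            # transmit only a seed id and one scalar

# Server: regenerate each perturbation from its seed and apply the update.
for s, d in report:
    theta -= lr * d * perturbation(s, theta.numel())
print(loss(theta).item())
```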
arXiv Detail & Related papers (2023-12-11T13:03:21Z) - Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training [18.526329975259483]
Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks.
It is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets.
We propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices.
arXiv Detail & Related papers (2023-11-22T13:20:59Z) - Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile [0.0]
This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity.
The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as little as 4GB of memory.
Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions features.
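Quantization is the key lever for the 4GB figure. As a rough illustration, symmetric per-tensor int8 (an assumed scheme, not necessarily the article's recipe) stores each weight in one byte plus a single scale:

```python
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                    # map the widest weight to the int8 range
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale                                  # 1 byte per weight plus one scale factor

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale                         # approximate original weights at use time

w = torch.randn(4096, 4096)
q, s = quantize_int8(w)
print((dequantize(q, s) - w).abs().max().item())     # small reconstruction error
print(q.element_size(), "byte/weight vs", w.element_size())
```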
arXiv Detail & Related papers (2023-09-29T16:30:49Z) - FwdLLM: Efficient FedLLM using Forward Gradient [8.520892692833293]
This work introduces FwdLLM, an innovative FL protocol designed to enhance the FedLLM efficiency.
FwdLLM employs backpropagation (BP)-free training methods, requiring devices only to execute ``perturbed inferences''.
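A forward gradient can be sketched in a few lines: a forward-mode directional derivative along a random direction replaces backpropagation entirely. The toy objective and one direction per step are illustrative assumptions:

```python
import torch
from torch.func import jvp

w = torch.randn(512)
def loss(p):                                   # stand-in for a model forward pass
    return ((p - 1.0) ** 2).sum()

lr = 1e-3
for _ in range(100):
    v = torch.randn_like(w)                    # random tangent direction
    _, d = jvp(loss, (w,), (v,))               # directional derivative; no backward pass
    w = w - lr * d * v                         # forward gradient estimate: (grad . v) v
print(loss(w).item())                          # loss decreases without any call to backward()
```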
arXiv Detail & Related papers (2023-08-26T14:36:30Z) - Accelerating Asynchronous Federated Learning Convergence via Opportunistic Mobile Relaying [3.802258033231335]
We study the impact of mobility on the convergence performance of asynchronous Federated Learning (FL) algorithms.
The study shows that, by exploiting mobility, clients can indirectly communicate with the server through another client serving as a relay.
We propose a new FL algorithm, called FedMobile, that incorporates opportunistic relaying and addresses key questions such as when and how to relay.
arXiv Detail & Related papers (2022-06-09T19:23:20Z) - Joint Superposition Coding and Training for Federated Learning over Multi-Width Neural Networks [52.93232352968347]
This paper aims to integrate two synergetic technologies, federated learning (FL) and width-adjustable slimmable neural networks (SNNs).
FL preserves data privacy by exchanging the locally trained models of mobile devices. Applying SNNs in FL is, however, non-trivial, particularly under wireless connections with time-varying channel conditions.
We propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models.
arXiv Detail & Related papers (2021-12-05T11:17:17Z)