Related papers: PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs

PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs

URL: http://arxiv.org/abs/2407.01031v1
Date: Mon, 1 Jul 2024 07:26:56 GMT
Title: PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs
Authors: Dan Peng, Zhihui Fu, Jun Wang,
Abstract summary: On mobile devices, the wealth of valuable, non-public data generated daily holds great promise for locally fine-tuning personalized LLMs. We propose employing derivative-free optimization techniques to enable on-device fine-tuning of LLM, even on memory-limited mobile devices. Empirical results demonstrate that the RoBERTa-large model and OPT-1.3B can be fine-tuned locally on the OPPO Reno 6 smartphone using around 4GB and 6.5GB of memory respectively.
Score: 5.063806958859058
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advancements in large language models (LLMs) have indeed showcased their impressive capabilities. On mobile devices, the wealth of valuable, non-public data generated daily holds great promise for locally fine-tuning personalized LLMs, while maintaining privacy through on-device processing. However, the constraints of mobile device resources pose challenges to direct on-device LLM fine-tuning, mainly due to the memory-intensive nature of derivative-based optimization required for saving gradients and optimizer states. To tackle this, we propose employing derivative-free optimization techniques to enable on-device fine-tuning of LLM, even on memory-limited mobile devices. Empirical results demonstrate that the RoBERTa-large model and OPT-1.3B can be fine-tuned locally on the OPPO Reno 6 smartphone using around 4GB and 6.5GB of memory respectively, using derivative-free optimization techniques. This highlights the feasibility of on-device LLM fine-tuning on mobile devices, paving the way for personalized LLMs on resource-constrained devices while safeguarding data privacy.

Related papers

Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency [20.904706759529237]
Large Language Models (LLMs) are increasingly being integrated into various applications and services running on billions of mobile devices.<n>Currently, deploying LLMs on resource-limited mobile devices faces a significant challenge due to their high demand for computation, memory, and ultimately energy.
arXiv Detail & Related papers (2025-07-02T20:47:40Z)
Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection [5.174560360759384]
Personalization of Large Language Models (LLMs) is important in practical applications to accommodate the individual needs of different mobile users. We present XPerT, a new technique that ensure proper selection of such already personalized LLMs based on explainability about how they were being fine-tuned. Experiment results show that XPerT reduces the costs of on-device LLM personalization by 83%, and improves its data efficiency by 51%.
arXiv Detail & Related papers (2025-04-15T17:38:06Z)
Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices [16.500721672193762]
Large Language Models (LLMs) can be adapted to specialized domains using private datasets stored on resource-constrained edge devices. We propose Prada, a privacy-preserving and efficient black-box LLM adaptation system using private on-device datasets. Prada achieves performance comparable to centralized fine-tuning methods while significantly reducing computational overhead by up to 60% and communication costs by up to 80%.
arXiv Detail & Related papers (2025-03-19T06:38:51Z)
Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices [5.926813659185372]
Small-size large language models (LLMs) can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models. Only small-size LLMs can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models.
arXiv Detail & Related papers (2025-03-10T16:27:17Z)
MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning [45.49178219392948]
Large Language Model (LLM) fine-tuning at mobile devices poses great challenges due to extremely high memory requirements and slow training speeds. We propose MobiLLM to enable memory-efficient transformer LLM fine-tuning on a mobile device via server-assisted side-tuning.
arXiv Detail & Related papers (2025-02-27T07:58:02Z)
SlimLM: An Efficient Small Language Model for On-Device Document Assistance [60.971107009492606]
We present SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices. SlimLM is pre-trained on SlimPajama-627B and fine-tuned on DocAssist. We evaluate SlimLM against existing SLMs, showing comparable or superior performance.
arXiv Detail & Related papers (2024-11-15T04:44:34Z)
Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference [20.666893617591136]
We propose Crayon, a novel approach for on-device LLM customization. We develop a device-server hybrid inference strategy, which deftly allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server.
arXiv Detail & Related papers (2024-06-11T07:00:08Z)
Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity [66.67596152389591]
Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models. In this study, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO. Our results demonstrate that fine-tuning 0.1% sensitive parameters in the LLM with ZO can outperform the full ZO fine-tuning performance.
arXiv Detail & Related papers (2024-06-05T04:07:35Z)
On the Compressibility of Quantized Large Language Models [13.443384050034922]
Large Language Models (LLMs) are deployed on edge or mobile devices to offer enhanced data privacy and real-time processing capabilities. LLMs may still be too big to fit entirely into the limited memory of edge or mobile devices and have to be partially loaded from the storage to complete the inference. We study applying data compression techniques to reduce data movement and thus speed up the inference of quantized LLM on memory-constrained devices.
arXiv Detail & Related papers (2024-03-03T03:27:07Z)
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z)
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN) At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. FedKSeed employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
PrivateLoRA For Efficient Privacy Preserving LLM [20.750808913757396]
We propose a novel Large Language Model (LLM) service paradigm that distributes privacy-sensitive computation on edge devices and shared in the cloud. Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations. Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models.
arXiv Detail & Related papers (2023-11-23T14:36:30Z)
Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training [18.526329975259483]
Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. It is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. We propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices.
arXiv Detail & Related papers (2023-11-22T13:20:59Z)
Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile [0.0]
This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity. The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as low as 4GB of memory. Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions features.
arXiv Detail & Related papers (2023-09-29T16:30:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.