Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model
- URL: http://arxiv.org/abs/2508.21313v1
- Date: Fri, 29 Aug 2025 02:33:13 GMT
- Title: Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model
- Authors: Zhaofeng Zhong, Wei Yuan, Liang Qu, Tong Chen, Hao Wang, Xiangyu Zhao, Hongzhi Yin,
- Abstract summary: CDCDA-PLM is a framework for deploying personalized on-device language models on user devices with support from a powerful cloud-based LLM. Using both real and synthetic data, a personalized on-device language model (LM) is fine-tuned via parameter-efficient fine-tuning (PEFT) modules.
- Score: 43.13807038270687
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the advancement of large language models (LLMs), significant progress has been achieved in various Natural Language Processing (NLP) tasks. However, existing LLMs still face two major challenges that hinder their broader adoption: (1) their responses tend to be generic and lack personalization tailored to individual users, and (2) they rely heavily on cloud infrastructure due to intensive computational requirements, leading to dependence on stable network connectivity and noticeable response delays. Recent research has predominantly focused on either developing cloud-based personalized LLMs or exploring the on-device deployment of general-purpose LLMs. However, few studies have addressed both limitations simultaneously by investigating personalized on-device language models. To bridge this gap, we propose CDCDA-PLM, a framework for deploying personalized on-device language models on user devices with support from a powerful cloud-based LLM. Specifically, CDCDA-PLM leverages the server-side LLM's strong generalization capabilities to augment users' limited personal data, mitigating the issue of data scarcity. Using both real and synthetic data, a personalized on-device language model (LM) is fine-tuned via parameter-efficient fine-tuning (PEFT) modules and deployed on the user's local device, enabling queries to be processed without depending on cloud-based LLMs. This approach eliminates reliance on network stability and ensures high response speeds. Experimental results across six tasks in a widely used personalization benchmark demonstrate the effectiveness of CDCDA-PLM.
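As a concrete illustration of the on-device fine-tuning step, below is a minimal sketch of LoRA-based PEFT over a mix of real and cloud-augmented data, assuming the Hugging Face transformers and peft libraries; the backbone ("gpt2" as a stand-in), the placeholder data, and the hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: fine-tune a small on-device LM with a LoRA adapter (PEFT) on
# real user records plus synthetic records augmented by a cloud LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in backbone
base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16))

# Real on-device records plus cloud-generated synthetic ones (placeholders;
# in CDCDA-PLM the synthetic examples come from the server-side LLM).
real_data = ["user query -> personalized reply"]
synthetic_data = ["cloud-augmented query -> personalized reply"]
optim = torch.optim.AdamW(model.parameters(), lr=2e-4)

for text in real_data + synthetic_data:
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
```

Because only the small LoRA matrices are trained, this step stays within the memory and compute budget of a mobile device while the base weights remain frozen.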
Related papers
- Synera: Synergistic LLM Serving across Device and Cloud at Scale [8.533983798094683]
Large Language Models (LLMs) are becoming key components in various mobile operating systems. Their deployment suffers from a set of performance challenges, especially degraded generation quality and prolonged latency. This paper proposes Synera, a device-cloud synergistic LLM serving system that applies an efficient SLM-LLM synergistic mechanism.
arXiv Detail & Related papers (2025-10-17T04:31:50Z)
- Cloud-Device Collaborative Agents for Sequential Recommendation [36.05863003744828]
Large language models (LLMs) have enabled agent-based recommendation systems with strong semantic understanding and flexible reasoning capabilities. LLMs offer powerful personalization, but they often suffer from privacy concerns, limited access to real-time signals, and scalability bottlenecks. We propose a novel Cloud-Device collaborative framework for sequential Recommendation, powered by dual agents.
arXiv Detail & Related papers (2025-09-01T15:28:11Z)
- CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering [68.91862701376155]
CoSteer is a novel collaborative framework that enables decoding-time personalization through localized delta steering. We formulate token-level optimization as an online learning problem, where local delta vectors dynamically adjust the remote LLM's logits. This approach preserves privacy by transmitting only the final steered tokens rather than raw data or intermediate vectors.
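A minimal sketch of the decoding-time logit-steering idea follows: the device adds a locally computed offset (a personalized small model's logits minus its base model's logits) to the cloud LLM's logits before choosing a token. The scaling factor, the greedy decoding, and the assumed shared vocabulary are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: steer a remote LLM's next-token logits with a local delta vector.
import torch

def steered_next_token(remote_logits: torch.Tensor,
                       local_tuned_logits: torch.Tensor,
                       local_base_logits: torch.Tensor,
                       alpha: float = 1.0) -> int:
    # The delta vector encodes the preference learned on-device.
    delta = local_tuned_logits - local_base_logits
    steered = remote_logits + alpha * delta
    # Only the final steered token needs to leave the device,
    # not raw user data or intermediate vectors.
    return int(torch.argmax(steered, dim=-1))
```

In this view, the heavy remote model supplies general competence while the cheap local pair injects personalization at every decoding step.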
arXiv Detail & Related papers (2025-07-07T08:32:29Z)
- LSRP: A Leader-Subordinate Retrieval Framework for Privacy-Preserving Cloud-Device Collaboration [43.115594451678255]
Cloud-device collaboration leverages on-cloud Large Language Models (LLMs) for handling public user queries and on-device Small Language Models (SLMs) for processing private user data. Existing approaches often fail to fully leverage the scalable problem-solving capabilities of on-cloud LLMs. We propose a Leader-Subordinate Retrieval framework for Privacy-preserving cloud-device collaboration (LSRP).
arXiv Detail & Related papers (2025-05-08T08:06:34Z)
- A New Paradigm of User-Centric Wireless Communication Driven by Large Language Models [53.16213723669751]
The next generation of wireless communications seeks to deeply integrate artificial intelligence with user-centric communication networks. We propose a novel paradigm for wireless communication that innovatively incorporates natural-language-to-structured-query-language (NL2SQL) conversion. We present a prototype system in which a dynamic semantic representation network at the physical layer adapts its encoding depth to meet user requirements.
arXiv Detail & Related papers (2025-04-16T01:43:36Z)
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing [48.30406812516552]
We introduce PLM, a Peripheral Language Model, developed through a co-design process that jointly optimizes the model architecture and edge-system constraints. PLM employs a Multi-head Latent Attention mechanism and the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint. Evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data.
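For reference, the squared ReLU activation mentioned above is simply ReLU followed by squaring; a minimal PyTorch sketch is below (the drop-in FFN placement is an assumption about typical usage, not PLM's exact architecture).

```python
# Sketch: squared ReLU keeps ReLU's exact zeros (activation sparsity)
# while squaring the surviving activations, which helps shrink the
# FFN's intermediate state and peak memory.
import torch

def relu_squared(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x) ** 2
```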
arXiv Detail & Related papers (2025-03-15T15:11:17Z)
- Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices [5.926813659185372]
Only small-size large language models (LLMs) can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models.
arXiv Detail & Related papers (2025-03-10T16:27:17Z)
- ELMS: Elasticized Large Language Models On Mobile Devices [5.689405542579458]
On-device Large Language Models (LLMs) are revolutionizing mobile AI, enabling applications such as UI automation while addressing privacy concerns.
We introduce ELMS, an on-device LLM service designed to provide elasticity in both the model and prompt dimensions.
A one-time neuron reordering technique, which utilizes the inherent permutation consistency within transformer models to create high-quality, elastic sub-models (see the sketch after this entry).
A dual-head compact language model, which efficiently refines prompts and coordinates the elastic adaptation between the model and prompt dimensions.
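A minimal sketch of the permutation idea behind neuron reordering, assuming a standard two-matrix FFN block; the importance scoring and the prefix-slicing granularity are illustrative assumptions, not ELMS's exact procedure.

```python
# Sketch: reorder FFN neurons once by importance, so that any prefix of
# neurons forms a valid, high-quality sub-model at runtime.
import torch

def reorder_ffn(w_in: torch.Tensor, w_out: torch.Tensor,
                importance: torch.Tensor):
    # w_in: [d_ff, d_model] (rows are neurons); w_out: [d_model, d_ff]
    # (columns are neurons). Permuting both identically preserves the
    # layer's function -- the "permutation consistency" being exploited.
    order = torch.argsort(importance, descending=True)
    return w_in[order, :], w_out[:, order]

def slice_submodel(w_in: torch.Tensor, w_out: torch.Tensor, keep: int):
    # After reordering, keeping the top-`keep` neurons yields an elastic
    # sub-model with no extra switching cost beyond the slice itself.
    return w_in[:keep, :], w_out[:, :keep]
```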
arXiv Detail & Related papers (2024-09-08T06:32:08Z)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
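A minimal sketch of the seed-based zeroth-order update underlying this idea (in the SPSA/MeZO style): because the perturbation is replayable from its seed, a client only needs to communicate the seed and one scalar. The function name, hyperparameters, and seed-sampling scheme are assumptions, not FedKSeed's exact protocol.

```python
# Sketch: one zeroth-order step where the update is fully determined by
# (seed, g), so that pair is all that must cross the network.
import torch

def zo_step(params, loss_fn, seed, eps=1e-3, lr=1e-4):
    # Regenerate the random perturbation z from the shared seed.
    gen = torch.Generator().manual_seed(seed)
    z = [torch.randn(p.shape, generator=gen) for p in params]
    with torch.no_grad():
        for p, zi in zip(params, z): p.add_(eps * zi)
        loss_plus = loss_fn()                          # f(theta + eps*z)
        for p, zi in zip(params, z): p.sub_(2.0 * eps * zi)
        loss_minus = loss_fn()                         # f(theta - eps*z)
        for p, zi in zip(params, z): p.add_(eps * zi)  # restore theta
        g = (loss_plus - loss_minus) / (2.0 * eps)     # scalar projection
        for p, zi in zip(params, z): p.sub_(lr * g * zi)
    return seed, float(g)  # the server replays z from the seed
```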
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
- DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization [66.27399823422665]
Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications. We propose an efficient Device-cloUd collaborative parametErs generaTion framework, DUET.
arXiv Detail & Related papers (2022-09-12T13:26:26Z)