LSRP: A Leader-Subordinate Retrieval Framework for Privacy-Preserving Cloud-Device Collaboration
- URL: http://arxiv.org/abs/2505.05031v2
- Date: Tue, 03 Jun 2025 02:55:09 GMT
- Title: LSRP: A Leader-Subordinate Retrieval Framework for Privacy-Preserving Cloud-Device Collaboration
- Authors: Yingyi Zhang, Pengyue Jia, Xianneng Li, Derong Xu, Maolin Wang, Yichao Wang, Zhaocheng Du, Huifeng Guo, Yong Liu, Ruiming Tang, Xiangyu Zhao
- Abstract summary: Cloud-device collaboration leverages on-cloud Large Language Models (LLMs) for handling public user queries and on-device Small Language Models (SLMs) for processing private user data. Existing approaches often fail to fully leverage the scalable problem-solving capabilities of on-cloud LLMs. We propose a Leader-Subordinate Retrieval framework for Privacy-preserving cloud-device collaboration (LSRP).
- Score: 43.115594451678255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cloud-device collaboration leverages on-cloud Large Language Models (LLMs) for handling public user queries and on-device Small Language Models (SLMs) for processing private user data, collectively forming a powerful and privacy-preserving solution. However, existing approaches often fail to fully leverage the scalable problem-solving capabilities of on-cloud LLMs while underutilizing the advantage of on-device SLMs in accessing and processing personalized data. This leads to two interconnected issues: 1) Limited utilization of the problem-solving capabilities of on-cloud LLMs, which fail to align with personalized user-task needs, and 2) Inadequate integration of user data into on-device SLM responses, resulting in mismatches in contextual user information. In this paper, we propose a Leader-Subordinate Retrieval framework for Privacy-preserving cloud-device collaboration (LSRP), a novel solution that bridges these gaps by: 1) enhancing on-cloud LLM guidance to the on-device SLM through dynamic selection of task-specific leader strategies, termed user-to-user retrieval-augmented generation (U-U-RAG), and 2) integrating the data advantages of on-device SLMs through small model feedback Direct Preference Optimization (SMFB-DPO) for aligning the on-cloud LLM with the on-device SLM. Experiments on two datasets demonstrate that LSRP consistently outperforms state-of-the-art baselines, significantly improving question-answer relevance and personalization, while preserving user privacy through efficient on-device retrieval. Our code is available at: https://github.com/Applied-Machine-Learning-Lab/LSRP.
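To make the division of labor concrete, below is a minimal Python sketch of one leader-subordinate round trip in the spirit of the abstract: the cloud LLM sees only the public query plus a retrieved strategy, while the on-device SLM grounds the returned guidance in locally retrieved private records. All names here (`retrieve_strategy`, `call_cloud_llm`, `call_device_slm`, `strategy_store`, `private_records`) are hypothetical stand-ins, not the authors' API, and the token-overlap retrieval is only a placeholder for the paper's U-U-RAG mechanism.

```python
# Minimal sketch of a leader-subordinate round trip in the spirit of LSRP.
# All function and field names are hypothetical stand-ins, not the authors' API.

def retrieve_strategy(query: str, strategy_store: list) -> dict:
    """Placeholder for a U-U-RAG-style step: pick the stored leader strategy
    whose source query shares the most tokens with the incoming query."""
    q_tokens = set(query.lower().split())
    return max(strategy_store,
               key=lambda s: len(q_tokens & set(s["query"].lower().split())))

def answer(query: str, strategy_store, private_records,
           call_cloud_llm, call_device_slm) -> str:
    # 1) Leader (cloud LLM) sees only the public query plus a retrieved
    #    task strategy -- no raw private data leaves the device.
    strategy = retrieve_strategy(query, strategy_store)
    guidance = call_cloud_llm(
        f"Task: {query}\nStrategy hint: {strategy['plan']}\n"
        "Produce step-by-step guidance for a smaller on-device model.")

    # 2) Subordinate (on-device SLM) grounds the guidance in locally
    #    retrieved personal records and writes the final answer.
    local_ctx = "\n".join(r for r in private_records
                          if any(t in r.lower() for t in query.lower().split()))
    return call_device_slm(
        f"Guidance from leader:\n{guidance}\n\nPrivate context:\n{local_ctx}\n\n"
        f"Answer the user's question: {query}")
```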
Related papers
- CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering [68.91862701376155]
CoSteer is a novel collaborative framework that enables decoding-time personalization through localized delta steering. We formulate token-level optimization as an online learning problem, where local delta vectors dynamically adjust the remote LLM's logits. This approach preserves privacy by transmitting only the final steered tokens rather than raw data or intermediate vectors.
arXiv Detail & Related papers (2025-07-07T08:32:29Z)
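A minimal sketch of the delta-steering idea described in this entry, assuming the cloud model can return per-step logits and the device hosts a small model that produces a correction vector; the interfaces (`remote_logits_fn`, `local_delta_fn`, `tokenizer`) and the fixed blend weight `alpha` are illustrative assumptions, not CoSteer's actual formulation.

```python
import numpy as np

def steered_decode(remote_logits_fn, local_delta_fn, tokenizer, prompt,
                   max_new_tokens=32, alpha=1.0):
    """Greedy decoding where a locally computed delta vector adjusts the
    remote model's logits before each token is chosen on-device."""
    token_ids = list(tokenizer.encode(prompt))
    for _ in range(max_new_tokens):
        logits = remote_logits_fn(token_ids)   # vocab-sized scores from the cloud LLM
        delta = local_delta_fn(token_ids)      # personalization correction computed locally
        steered = logits + alpha * delta       # steering applied on-device
        next_id = int(np.argmax(steered))      # only the chosen tokens need to be shared
        token_ids.append(next_id)
    return tokenizer.decode(token_ids)
```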
- A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation [0.6699777383856287]
ChatGPT services leverage cloud-based large language models (LLMs). Privacy concerns arise as prompts are transmitted and processed by the model providers. We propose a general pseudonymization framework applicable to cloud-based LLMs.
arXiv Detail & Related papers (2025-02-21T06:15:53Z)
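A toy illustration of the general idea of pseudonymizing a prompt before the cloud call and restoring the original values locally afterwards; the regex patterns and placeholder scheme are invented for illustration and are not the framework proposed in the paper.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def pseudonymize(prompt: str):
    """Replace emails/phone numbers with placeholders; keep the mapping locally."""
    mapping, counter = {}, 0
    def repl(match):
        nonlocal counter
        counter += 1
        key = f"<PII_{counter}>"
        mapping[key] = match.group(0)
        return key
    masked = PHONE.sub(repl, EMAIL.sub(repl, prompt))
    return masked, mapping

def restore(text: str, mapping: dict) -> str:
    """Re-insert the original values into the cloud model's response on-device."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text

masked, mapping = pseudonymize("Email jane.doe@example.com about the 555-123-4567 callback.")
# `masked` is what would be sent to the cloud LLM; restore(response, mapping)
# rebuilds the personalized answer locally before it is shown to the user.
```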
- PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models [10.050972891318324]
We propose a privacy preservation pipeline for protecting sensitive user information during interactions between users and large language models. We construct SensitiveQA, the first open-ended question-answering dataset focused on privacy. Our proposed solution employs a multi-stage strategy aimed at preemptively securing user information while simultaneously preserving the response quality of cloud-based LLMs.
arXiv Detail & Related papers (2025-02-19T09:17:07Z)
- Model Inversion in Split Learning for Personalized LLMs: New Insights from Information Bottleneck Theory [11.83473842859642]
This work is the first to identify model inversion attacks in the split learning framework for personalized LLMs. We propose a two-stage attack system in which the first part projects representations into the embedding space, and the second part uses a generative model to recover text from these embeddings.
arXiv Detail & Related papers (2025-01-10T13:47:13Z)
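A rough PyTorch sketch of the two-stage attack structure described in this entry: one module projects intermediate split-learning activations into an embedding space, and a small generative decoder recovers token logits from them. Layer sizes, the GRU decoder, and the vocabulary size are assumptions for illustration only.

```python
import torch.nn as nn

class RepresentationProjector(nn.Module):
    """Stage 1: map intermediate split-learning activations into an embedding space."""
    def __init__(self, hidden_dim=768, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden_dim, embed_dim),
                                 nn.GELU(),
                                 nn.Linear(embed_dim, embed_dim))
    def forward(self, h):
        return self.net(h)

class TextRecoverer(nn.Module):
    """Stage 2: a small generative decoder emitting token logits from projections."""
    def __init__(self, embed_dim=512, vocab_size=32000):
        super().__init__()
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)
    def forward(self, projected):          # (batch, seq, embed_dim)
        out, _ = self.rnn(projected)
        return self.head(out)              # (batch, seq, vocab_size)

# Usage sketch: logits = TextRecoverer()(RepresentationProjector()(intermediate_acts))
```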
- Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation [47.28027985634746]
Large Language Models for Recommendation (LLM4Rec) is a promising research direction that has demonstrated exceptional performance in this field. However, LLMs are costly to train and infer frequently, and struggle to access real-time data. Small recommendation models (SRMs) can effectively compensate for these shortcomings by consuming minimal resources for frequent training and inference, and by conveniently accessing real-time data on devices.
arXiv Detail & Related papers (2025-01-10T01:27:12Z)
- Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding [61.45448947483328]
We introduce Lossless Acceleration via Speculative Decoding for LLM-based Recommender Systems (LASER). LASER features a Customized Retrieval Pool to enhance retrieval efficiency and Relaxed Verification to improve the acceptance rate of draft tokens. LASER achieves a 3-5x speedup on public datasets and saves about 67% of computational resources during the online A/B test.
arXiv Detail & Related papers (2024-08-11T02:31:13Z)
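A simplified sketch of retrieval-drafted speculative decoding with a relaxed acceptance test, in the spirit of the summary above; the n-gram retrieval pool, the top-k "relaxed" check, and the interfaces are assumptions rather than LASER's actual components.

```python
def speculative_step(prefix_ids, retrieval_pool, target_topk_fn, k=4, relaxed_k=5):
    """Draft up to k tokens by continuing a matching bigram from the pool, then
    keep each drafted token only while it stays inside the target model's
    top-`relaxed_k` candidates (the relaxed verification)."""
    # Draft: find a pool sequence that continues the current suffix.
    suffix = tuple(prefix_ids[-2:])
    draft = []
    for seq in retrieval_pool:
        for i in range(len(seq) - 2):
            if tuple(seq[i:i + 2]) == suffix:
                draft = list(seq[i + 2:i + 2 + k])
                break
        if draft:
            break

    accepted = list(prefix_ids)
    if not draft:                                     # no pool match: fall back to the target model
        return accepted + [target_topk_fn(accepted, 1)[0]]

    # Verify: accept drafted tokens while the target model loosely agrees.
    for tok in draft:
        topk = target_topk_fn(accepted, relaxed_k)    # target model's top candidates
        if tok in topk:
            accepted.append(tok)
        else:
            accepted.append(topk[0])                  # fall back to the target's own choice
            break
    return accepted
```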
- Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference [20.666893617591136]
We propose Crayon, a novel approach for on-device LLM customization. We develop a device-server hybrid inference strategy, which deftly allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server.
arXiv Detail & Related papers (2024-06-11T07:00:08Z)
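A toy sketch of the edge-server hybrid inference idea: a cheap heuristic decides whether a query stays on the customized on-device model or is escalated to the server LLM. The difficulty heuristic, keyword list, and threshold are invented for illustration and are not Crayon's routing policy.

```python
HARD_HINTS = ("prove", "derive", "multi-step", "compare", "explain why")

def route(query: str, call_device_slm, call_server_llm,
          max_easy_tokens: int = 40) -> str:
    """Send short, familiar queries to the customized on-device SLM and
    escalate longer or reasoning-heavy ones to the server LLM."""
    looks_hard = (len(query.split()) > max_easy_tokens
                  or any(hint in query.lower() for hint in HARD_HINTS))
    return call_server_llm(query) if looks_hard else call_device_slm(query)
```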
- A Federated Framework for LLM-based Recommendation [65.12855401912948]
Large Language Models (LLMs) have empowered generative recommendation systems through fine-tuning on user behavior data. However, utilizing user data may pose significant privacy risks, potentially leading to ethical dilemmas and violations of data protection regulations. To address these privacy concerns, Federated Learning for Recommendation (Fed4Rec) has been identified as a promising solution.
arXiv Detail & Related papers (2024-02-15T14:09:28Z)
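For context, a generic FedAvg-style aggregation sketch (not the specific Fed4Rec protocol from the paper): clients fine-tune locally on their own interaction data and share only parameter updates, which the server averages weighted by local dataset size.

```python
import numpy as np

def fed_avg(client_updates, client_sizes):
    """Weighted average of client parameter updates.

    client_updates: list of dicts {param_name: np.ndarray of deltas}
    client_sizes:   number of local training examples per client
    """
    total = float(sum(client_sizes))
    aggregated = {}
    for name in client_updates[0]:
        aggregated[name] = sum(
            (size / total) * update[name]
            for update, size in zip(client_updates, client_sizes))
    return aggregated  # applied to the shared model (e.g., an adapter) on the server
```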
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN). At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
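A schematic PyTorch sketch of one SPIN-style self-play round, following the high-level description above: the frozen previous-iteration model acts as the opponent, its generations serve as the "weak" responses, and a logistic loss pushes the current policy to prefer the human response. The `seq_logprob` and `generate` interfaces and the `beta` scale are assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def spin_loss(policy, opponent, prompt, human_resp, synthetic_resp,
              seq_logprob, beta=0.1):
    """Logistic loss over the log-ratio margin between the human response and
    the opponent's own generation (assumed seq_logprob returns a scalar tensor)."""
    margin = beta * (
        (seq_logprob(policy, prompt, human_resp)
         - seq_logprob(opponent, prompt, human_resp))
        - (seq_logprob(policy, prompt, synthetic_resp)
           - seq_logprob(opponent, prompt, synthetic_resp)))
    return -F.logsigmoid(margin)

def spin_round(policy, opponent, dataset, generate, seq_logprob, optimizer):
    """One self-play round: the frozen opponent (previous iteration) generates
    responses that the current policy learns to distinguish from human data."""
    for prompt, human_resp in dataset:
        synthetic = generate(opponent, prompt)
        loss = spin_loss(policy, opponent, prompt, human_resp, synthetic, seq_logprob)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```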