CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
- URL: http://arxiv.org/abs/2507.04756v2
- Date: Mon, 29 Sep 2025 09:12:58 GMT
- Title: CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
- Authors: Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen
- Abstract summary: CoSteer is a novel collaborative framework that enables decoding-time personalization through localized delta steering. We formulate token-level optimization as an online learning problem, where local delta vectors dynamically adjust the remote LLM's logits. This approach preserves privacy by transmitting only the final steered tokens rather than raw data or intermediate vectors.
- Score: 80.54309860395763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized text generation has become crucial for adapting language models to diverse and evolving users' personal context across cultural, temporal, and contextual dimensions. While existing methods often rely on centralized fine-tuning or static preference alignment, they struggle to achieve real-time adaptation under resource constraints inherent to personal devices. This limitation creates a dilemma: large cloud-based models lack access to localized user-specific information, while small on-device models cannot match the generation quality of their cloud counterparts. To address this dichotomy, we present CoSteer, a novel collaborative framework that enables decoding-time personalization through localized delta steering. Our key insight lies in leveraging the logits difference between personal context-aware and -agnostic outputs from local small models as steering signals for cloud-based LLMs. Specifically, we formulate token-level optimization as an online learning problem, where local delta vectors dynamically adjust the remote LLM's logits within the on-device environment. This approach preserves privacy by transmitting only the final steered tokens rather than raw data or intermediate vectors, while maintaining cloud-based LLMs' general capabilities without fine-tuning. Through comprehensive experiments on various personalized generation tasks, we demonstrate that CoSteer effectively assists LLMs in generating personalized content by leveraging locally stored user profiles and histories, ensuring privacy preservation through on-device data processing while maintaining acceptable computational overhead.
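The steering signal described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible rule: the delta between the local small model's context-aware and context-agnostic logits is added, scaled by a weight, to the cloud LLM's logits on the device. The paper itself formulates this step as an online learning problem, which the sketch does not reproduce. The function names, the `alpha` weight, and the toy vocabulary and logit values are all hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def local_delta(slm_with_context, slm_without_context):
    """Per-token logit difference from the local small model.

    Positive entries mark tokens that the user's personal context
    makes more likely; negative entries mark tokens it discourages.
    """
    return [a - b for a, b in zip(slm_with_context, slm_without_context)]


def steer(llm_logits, delta, alpha=1.0):
    """Add the scaled local delta to the cloud LLM's logits.

    Assumption: a fixed additive rule with weight `alpha`. This is an
    illustrative simplification, not the paper's online-learning update.
    """
    return [l + alpha * d for l, d in zip(llm_logits, delta)]


def pick_token(logits):
    """Greedy decoding: index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy three-token vocabulary and made-up logits for one decoding step.
vocab = ["hiking", "movies", "cooking"]
llm_logits = [2.0, 2.5, 1.0]    # cloud LLM, no personal context
slm_with = [3.0, 1.0, 0.5]      # local model, user profile in the prompt
slm_without = [1.0, 1.2, 0.5]   # local model, no user profile

delta = local_delta(slm_with, slm_without)
steered = steer(llm_logits, delta, alpha=1.0)

unsteered_token = vocab[pick_token(llm_logits)]
steered_token = vocab[pick_token(steered)]
steered_probs = softmax(steered)
```

In this toy step the unsteered choice is `movies` and the steered choice is `hiking`. Only the chosen token would leave the device, which mirrors the privacy argument in the abstract: the user profile and the delta vector stay local.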
Related papers
- Floe: Federated Specialization for Real-Time LLM-SLM Inference [32.782914689403746]
Floe is a hybrid federated learning framework designed for latency-sensitive, resource-constrained environments. Personal data and fine-tuning remain on-device, while the cloud LLM contributes general knowledge without exposing proprietary weights. Floe significantly improves model performance and reduces inference latency on edge devices under real-time constraints.
arXiv Detail & Related papers (2026-02-15T20:28:38Z)
- Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z)
- PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration [8.776463501718737]
We propose a context-aware framework that dynamically balances privacy and inference quality. PRISM executes in four stages: (1) the edge device profiles entity-level sensitivity; (2) a soft gating module on the edge selects an execution mode (cloud, edge, or collaboration); (3) for collaborative paths, the edge applies adaptive two-layer local differential privacy based on entity risks; and (4) the cloud LLM generates a semantic sketch from the perturbed prompt.
arXiv Detail & Related papers (2025-11-27T22:32:33Z)
- Personalized Vision via Visual In-Context Learning [62.85784251383279]
We present a visual in-context learning framework for personalized vision. PICO infers the underlying transformation and applies it to new inputs without retraining. We also propose an attention-guided seed scorer that improves reliability via efficient inference scaling.
arXiv Detail & Related papers (2025-09-29T17:58:45Z)
- Cloud-Device Collaborative Agents for Sequential Recommendation [36.05863003744828]
Large language models (LLMs) have enabled agent-based recommendation systems with strong semantic understanding and flexible reasoning capabilities. LLMs offer powerful personalization, but they often suffer from privacy concerns, limited access to real-time signals, and scalability bottlenecks. We propose a novel Cloud-Device collaborative framework for sequential Recommendation, powered by dual agents.
arXiv Detail & Related papers (2025-09-01T15:28:11Z)
- Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model [43.13807038270687]
CDCDA-PLM is a framework for deploying personalized on-device language models on user devices with support from a powerful cloud-based LLM. Using both real and synthetic data, a personalized on-device language model (LM) is fine-tuned via parameter-efficient fine-tuning (PEFT) modules.
arXiv Detail & Related papers (2025-08-29T02:33:13Z)
- P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices [12.821321451464081]
Split Learning (SL) enables resource-constrained edge devices to participate in model training by partitioning a model into client-side and server-side sub-models. SL encounters significant challenges in heterogeneous environments where devices vary in computing resources, communication capabilities, environmental conditions, and privacy requirements. We propose P3SL, a Personalized Privacy-Preserving Split Learning framework designed for heterogeneous, resource-constrained edge device systems.
arXiv Detail & Related papers (2025-07-23T05:50:33Z)
- DPO Learning with LLMs-Judge Signal for Computer Use Agents [9.454381108993832]
Computer use agents (CUA) are systems that automatically interact with graphical user interfaces (GUIs) to complete tasks. We develop a lightweight vision-language model that runs entirely on local machines.
arXiv Detail & Related papers (2025-06-03T17:27:04Z)
- LSRP: A Leader-Subordinate Retrieval Framework for Privacy-Preserving Cloud-Device Collaboration [43.115594451678255]
Cloud-device collaboration leverages on-cloud Large Language Models (LLMs) for handling public user queries and on-device Small Language Models (SLMs) for processing private user data. Existing approaches often fail to fully leverage the scalable problem-solving capabilities of on-cloud LLMs. We propose a Leader-Subordinate Retrieval framework for Privacy-preserving cloud-device collaboration (LSRP).
arXiv Detail & Related papers (2025-05-08T08:06:34Z)
- Personalized Language Models via Privacy-Preserving Evolutionary Model Merging [53.97323896430374]
Personalization in language models aims to tailor model behavior to individual users or user groups. We propose Privacy-Preserving Model Merging via Evolutionary Algorithms (PriME). PriME employs gradient-free methods to directly optimize utility while reducing privacy risks. Experiments on the LaMP benchmark show that PriME consistently outperforms a range of baselines, achieving up to a 45% improvement in task performance.
arXiv Detail & Related papers (2025-03-23T09:46:07Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- Modality Alignment Meets Federated Broadcasting [9.752555511824593]
Federated learning (FL) has emerged as a powerful approach to safeguard data privacy by training models across distributed edge devices without centralizing local data.
This paper introduces a novel FL framework leveraging modality alignment, where a text encoder resides on the server, and image encoders operate on local devices.
arXiv Detail & Related papers (2024-11-24T13:30:03Z)
- Personalized Federated Learning for Cross-view Geo-localization [49.40531019551957]
We propose a methodology combining Federated Learning (FL) with Cross-view Image Geo-localization (CVGL) techniques.
Our method implements a coarse-to-fine approach, where clients share only the coarse feature extractors while keeping fine-grained features specific to local environments.
Results demonstrate that our federated CVGL method achieves performance close to centralized training while maintaining data privacy.
arXiv Detail & Related papers (2024-11-07T13:25:52Z)
- PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
In response to increasing privacy concerns, we propose a Parameter-efficient Federated Anomaly Detection framework named PeFAD.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
- Collaborative Chinese Text Recognition with Personalized Federated Learning [61.34060587461462]
In Chinese text recognition, it is often necessary for one organization to collect a large amount of data from similar organizations.
Due to the natural presence of private information in text data, such as addresses and phone numbers, different organizations are unwilling to share private data.
We introduce personalized federated learning (pFL) into the Chinese text recognition task and propose the pFedCR algorithm.
arXiv Detail & Related papers (2023-05-09T16:51:00Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Unsupervised Model Personalization while Preserving Privacy and Scalability: An Open Problem [55.21502268698577]
This work investigates the task of unsupervised model personalization, adapted to continually evolving, unlabeled local user images.
We provide a novel Dual User-Adaptation framework (DUA) to explore the problem.
This framework flexibly disentangles user-adaptation into model personalization on the server and local data regularization on the user device.
arXiv Detail & Related papers (2020-03-30T09:35:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.