Cloud-Device Collaborative Agents for Sequential Recommendation
- URL: http://arxiv.org/abs/2509.01551v1
- Date: Mon, 01 Sep 2025 15:28:11 GMT
- Title: Cloud-Device Collaborative Agents for Sequential Recommendation
- Authors: Jing Long, Sirui Huang, Huan Huo, Tong Chen, Hongzhi Yin, Guandong Xu
- Abstract summary: Large language models (LLMs) have enabled agent-based recommendation systems with strong semantic understanding and flexible reasoning capabilities. LLMs offer powerful personalization, but they often suffer from privacy concerns, limited access to real-time signals, and scalability bottlenecks. We propose a novel Cloud-Device collaborative framework for sequential Recommendation, powered by dual agents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) have enabled agent-based recommendation systems with strong semantic understanding and flexible reasoning capabilities. While LLM-based agents deployed in the cloud offer powerful personalization, they often suffer from privacy concerns, limited access to real-time signals, and scalability bottlenecks. Conversely, on-device agents ensure privacy and responsiveness but lack the computational power for global modeling and large-scale retrieval. To bridge these complementary limitations, we propose CDA4Rec, a novel Cloud-Device collaborative framework for sequential Recommendation, powered by dual agents: a cloud-side LLM and a device-side small language model (SLM). CDA4Rec tackles the core challenge of cloud-device coordination by decomposing the recommendation task into modular sub-tasks including semantic modeling, candidate retrieval, structured user modeling, and final ranking, which are allocated to cloud or device based on computational demands and privacy sensitivity. A strategy planning mechanism leverages the cloud agent's reasoning ability to generate personalized execution plans, enabling context-aware task assignment and partial parallel execution across agents. This design ensures real-time responsiveness, improved efficiency, and fine-grained personalization, even under diverse user states and behavioral sparsity. Extensive experiments across multiple real-world datasets demonstrate that CDA4Rec consistently outperforms competitive baselines in both accuracy and efficiency, validating its effectiveness in heterogeneous and resource-constrained environments.
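The abstract's decomposition-and-allocation logic can be sketched as follows. This is an illustrative toy, not the paper's implementation: CDA4Rec's real planner is a cloud-side LLM generating personalized execution plans, and the sub-task table and allocation rule below are assumptions made for demonstration.

```python
# Hypothetical sketch of CDA4Rec-style sub-task allocation with partial
# parallel execution. The fixed rule below stands in for the LLM planner.
from concurrent.futures import ThreadPoolExecutor

SUBTASKS = {
    # name: (compute_demand, privacy_sensitive) -- assumed labels
    "semantic_modeling": ("high", False),
    "candidate_retrieval": ("high", False),
    "structured_user_modeling": ("low", True),
    "final_ranking": ("low", True),
}

def plan(subtasks):
    """Assign each sub-task to 'cloud' or 'device'.

    Rule of thumb from the abstract: privacy-sensitive or lightweight work
    stays on-device; compute-heavy, non-sensitive work goes to the cloud.
    """
    return {
        name: "device" if sensitive or demand == "low" else "cloud"
        for name, (demand, sensitive) in subtasks.items()
    }

def execute(assignment):
    """Run the cloud-side and device-side task groups in parallel (stubbed)."""
    groups = {"cloud": [], "device": []}
    for task, side in assignment.items():
        groups[side].append(task)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(lambda s, ts: {t: f"done@{s}" for t in ts}, side, tasks)
            for side, tasks in groups.items()
        ]
        results = {}
        for f in futures:
            results.update(f.result())
    return results
```

Running both agent-side groups concurrently mirrors the framework's partial parallel execution; in practice the device group would invoke the SLM and the cloud group the LLM.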
Related papers
- Self-Evolving Multi-Agent Network for Industrial IoT Predictive Maintenance [5.571627005866756]
Industrial IoT predictive maintenance requires systems capable of real-time anomaly detection without sacrificing interpretability or demanding excessive computational resources. Traditional approaches rely on static, offline-trained models that cannot adapt to evolving operational conditions. We introduce SEMAS, a self-evolving hierarchical multi-agent system that distributes specialized agents across Edge, Fog, and Cloud computational tiers.
arXiv Detail & Related papers (2026-02-17T22:45:43Z)
- ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks [62.031889234230725]
6G networks rely on complex cross-layer optimization. Manually translating high-level intents into mathematical formulations remains a bottleneck. We present ComAgent, a multi-LLM agentic AI framework.
arXiv Detail & Related papers (2026-01-27T13:43:59Z)
- Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration. We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress. Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
arXiv Detail & Related papers (2025-12-20T12:06:13Z)
- PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration [8.776463501718737]
We propose a context-aware framework that dynamically balances privacy and inference quality. PRISM executes in four stages: (1) the edge device profiles entity-level sensitivity; (2) a soft gating module on the edge selects an execution mode - cloud, edge, or collaboration; (3) for collaborative paths, the edge applies adaptive two-layer local differential privacy based on entity risks; and (4) the cloud LLM generates a semantic sketch from the perturbed prompt.
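The first two PRISM stages (sensitivity profiling and mode gating) admit a compact sketch. The risk scores, entity list, and gating thresholds below are illustrative assumptions, not values from the paper.

```python
# Toy sketch of PRISM-like routing (stages 1-2 of the abstract).
# Risk scores and thresholds are made-up placeholders.
def profile_sensitivity(entities):
    """Stage 1: assign each entity a risk score in [0, 1] (stubbed lookup)."""
    risky = {"name", "address", "ssn"}
    return {e: 1.0 if e in risky else 0.1 for e in entities}

def select_mode(risks, low=0.2, high=0.8):
    """Stage 2: a gate picks an execution mode from the peak entity risk."""
    peak = max(risks.values(), default=0.0)
    if peak < low:
        return "cloud"        # nothing sensitive: full cloud inference
    if peak > high:
        return "edge"         # too sensitive to leave the device
    return "collaboration"    # perturb locally (stage 3), cloud sketch (stage 4)
```

A prompt mentioning only low-risk entities routes fully to the cloud, one containing a highly sensitive entity stays on the edge, and intermediate cases take the collaborative path through local differential privacy.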
arXiv Detail & Related papers (2025-11-27T22:32:33Z)
- Collaborative Device-Cloud LLM Inference through Reinforcement Learning [17.71514700623717]
Device-cloud collaboration has emerged as a promising paradigm for deploying large language models (LLMs). We propose a framework where the on-device LLM makes routing decisions at the end of its solving process, with this capability instilled through post-training. In particular, we formulate a reinforcement learning problem with carefully designed rewards that encourage effective problem solving and judicious offloading to the cloud.
arXiv Detail & Related papers (2025-09-28T19:48:56Z)
- Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model [43.13807038270687]
CDCDA-PLM is a framework for deploying personalized on-device language models on user devices with support from a powerful cloud-based LLM. Using both real and synthetic data, a personalized on-device language model (LM) is fine-tuned via parameter-efficient fine-tuning (PEFT) modules.
arXiv Detail & Related papers (2025-08-29T02:33:13Z)
- CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering [68.91862701376155]
CoSteer is a novel collaborative framework that enables decoding-time personalization through localized delta steering. We formulate token-level optimization as an online learning problem, where local delta vectors dynamically adjust the remote LLM's logits. This approach preserves privacy by transmitting only the final steered tokens rather than raw data or intermediate vectors.
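The core decoding-time step described above, adding a device-side delta vector to the remote model's logits before selecting a token, can be sketched minimally. The fixed delta here stands in for CoSteer's dynamically learned one, and the greedy argmax is an assumption about the selection rule.

```python
# Minimal sketch of decoding-time logit steering in the spirit of CoSteer:
# a local delta vector adjusts cloud-side logits before token selection.
def steer_and_pick(remote_logits, local_delta):
    """Add the device-side delta to the cloud logits, return the argmax token id."""
    steered = [logit + delta for logit, delta in zip(remote_logits, local_delta)]
    return max(range(len(steered)), key=steered.__getitem__)
```

Only the chosen token id would leave the device, which is what lets the scheme keep raw data and intermediate vectors local.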
arXiv Detail & Related papers (2025-07-07T08:32:29Z)
- Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization [74.92515821144484]
Navigating autonomous vehicles in open scenarios is a challenge due to the difficulties in handling unseen objects. Existing solutions either rely on small models that struggle with generalization or large models that are resource-intensive. This paper proposes opportunistic collaborative planning (OCP), which seamlessly integrates efficient local models with powerful cloud models.
arXiv Detail & Related papers (2025-04-25T04:07:21Z)
- CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration [1.6021932740447968]
Large Language Models (LLMs) exhibit remarkable human-like predictive capabilities. However, it is challenging to deploy LLMs to provide efficient and adaptive inference services at the edge. This paper proposes a novel Cloud-Edge Collaboration framework for LLMs (CE-CoLLM) to tackle these challenges.
arXiv Detail & Related papers (2024-11-05T06:00:27Z)
- Cloud-Device Collaborative Learning for Multimodal Large Language Models [24.65882336700547]
We introduce a Cloud-Device Collaborative Continual Adaptation framework to enhance the performance of compressed, device-deployed MLLMs.
Our framework is structured into three key components: a device-to-cloud uplink for efficient data transmission, cloud-based knowledge adaptation, and an optimized cloud-to-device downlink for model deployment.
arXiv Detail & Related papers (2023-12-26T18:46:14Z)
- Intelligent Model Update Strategy for Sequential Recommendation [34.02565495747133]
We introduce IntellectReq, which is designed to operate on the edge, evaluating the cost-benefit landscape of parameter requests with minimal communication overhead. We employ statistical mapping techniques to convert real-time user behavior into a normal distribution, using multi-sample outputs to quantify the model's uncertainty and thus its generalization capability.
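The uncertainty-gated parameter request described above can be sketched as a simple variance check over multiple sampled outputs. The threshold and scores below are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of an IntellectReq-style request gate: uncertainty,
# estimated as the variance of multi-sample outputs, decides whether the
# device should request fresh parameters from the cloud.
import statistics

def should_request(sampled_scores, threshold=0.05):
    """Request a parameter update only when output variance is high."""
    uncertainty = statistics.pvariance(sampled_scores)
    return uncertainty > threshold
```

Low variance across samples suggests the on-device model generalizes well to the current behavior, so the costly parameter request is skipped.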
arXiv Detail & Related papers (2023-02-14T20:44:12Z)
- DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization [66.27399823422665]
Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications. We propose an efficient Device-cloUd collaborative parametErs generaTion framework DUET.
arXiv Detail & Related papers (2022-09-12T13:26:26Z)
- Device-Cloud Collaborative Recommendation via Meta Controller [65.97416287295152]
We propose a meta controller to dynamically manage the collaboration between the on-device recommender and the cloud-based recommender.
On the basis of the counterfactual samples and the extended training, extensive experiments in the industrial recommendation scenarios show the promise of meta controller.
arXiv Detail & Related papers (2022-07-07T03:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.