Related papers: SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation

URL: http://arxiv.org/abs/2601.22543v1
Date: Fri, 30 Jan 2026 04:28:50 GMT
Title: SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation
Authors: Ruiqi Zheng, Jinli Cao, Jiao Yin, Hongzhi Yin
Abstract summary: We introduce Semantic Calibration for LLM-enabled Cloud-Device Recommendation (SCaLRec). SCaLRec estimates the reliability of cached semantics under the user's latest interactions. Experiments on real-world datasets show that SCaLRec consistently improves recommendation performance over strong baselines under cloud semantic staleness.
Score: 28.131961539123754
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cloud-device collaborative recommendation partitions computation across the cloud and user devices: the cloud provides semantic user modeling, while the device leverages recent interactions and cloud semantic signals for privacy-preserving, responsive reranking. With large language models (LLMs) on the cloud, semantic user representations can improve sequential recommendation by capturing high-level intent. However, regenerating such representations via cloud LLM inference for every request is often infeasible at real-world scale. As a result, on-device reranking commonly reuses a cached cloud semantic user embedding across requests. We empirically identify a cloud semantic staleness effect: reused embeddings become less aligned with the user's latest interactions, leading to measurable ranking degradation. Most existing LLM-enabled cloud-device recommenders are designed around on-demand cloud semantics, either by assuming low-latency cloud LLM access or by regenerating semantic embeddings per request. When per-request regeneration is infeasible and cached semantics must be reused, two technical challenges arise: (1) deciding when cached cloud semantics remain useful for on-device reranking, and (2) maintaining ranking quality when the cloud LLM cannot be invoked and only cached semantics are available. To address this gap, we introduce Semantic Calibration for LLM-enabled Cloud-Device Recommendation (SCaLRec). First, it estimates the reliability of cached semantics under the user's latest interactions. Second, an on-device semantic calibration module adjusts the cached semantic embedding using up-to-date interaction evidence, without per-request cloud LLM involvement. Experiments on real-world datasets show that SCaLRec consistently improves recommendation performance over strong baselines under cloud semantic staleness.
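
The two steps named in the abstract (estimating how reliable a cached embedding still is, then calibrating it on-device) can be illustrated with a toy sketch. This is not the SCaLRec method: the abstract does not specify the estimator or the calibration module, so the cosine-based reliability score, the linear blend, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def reliability(cached, recent_items):
    """Hypothetical reliability score in [0, 1]: rescaled cosine similarity
    between the cached cloud embedding and the mean recent-item embedding."""
    return 0.5 * (1.0 + cosine(cached, mean_vector(recent_items)))


def calibrate(cached, recent_items):
    """Hypothetical calibration: blend the cached embedding with recent
    evidence, trusting the cache in proportion to its reliability score."""
    r = reliability(cached, recent_items)
    recent = mean_vector(recent_items)
    return [r * c + (1.0 - r) * m for c, m in zip(cached, recent)]


def rerank(candidates, user_vec):
    """Order candidate item names by cosine similarity to the user vector."""
    return sorted(
        candidates,
        key=lambda name: cosine(candidates[name], user_vec),
        reverse=True,
    )


# Made-up 2-d embeddings: the cache points along axis 0,
# while the latest interactions lean towards axis 1.
cached = [1.0, 0.0]
recent_items = [[0.0, 1.0], [0.2, 0.8]]
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.8]}

stale_ranking = rerank(candidates, cached)
calibrated_ranking = rerank(candidates, calibrate(cached, recent_items))
```

In this toy setup the stale cache ranks item "a" first, whereas the calibrated vector moves the mixed item "c" to the top, with no cloud call involved. A learned estimator and calibration module would replace the fixed cosine and blend rules in any real system.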

Related papers

Cloud-Device Collaborative Agents for Sequential Recommendation [36.05863003744828]
Large language models (LLMs) have enabled agent-based recommendation systems with strong semantic understanding and flexible reasoning capabilities. LLMs offer powerful personalization, but they often suffer from privacy concerns, limited access to real-time signals, and scalability bottlenecks. We propose a novel Cloud-Device collaborative framework for sequential Recommendation, powered by dual agents.
arXiv Detail & Related papers (2025-09-01T15:28:11Z)
EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models [57.059991285047296]
A hybrid edge-cloud collaborative framework was recently proposed to realize fast inference and high-quality generation. Excessive cloud denoising prolongs inference time, while insufficient steps cause semantic ambiguity, leading to inconsistency in edge model output. We propose EC-Diff, which accelerates cloud inference through gradient-based noise estimation. Our method significantly enhances generation quality compared to edge inference, while achieving up to an average $2\times$ speedup in inference compared to cloud inference.
arXiv Detail & Related papers (2025-07-16T07:23:14Z)
LSRP: A Leader-Subordinate Retrieval Framework for Privacy-Preserving Cloud-Device Collaboration [43.115594451678255]
Cloud-device collaboration leverages on-cloud Large Language Models (LLMs) for handling public user queries and on-device Small Language Models (SLMs) for processing private user data. Existing approaches often fail to fully leverage the scalable problem-solving capabilities of on-cloud LLMs. We propose a Leader-Subordinate Retrieval framework for Privacy-preserving cloud-device collaboration (LSRP).
arXiv Detail & Related papers (2025-05-08T08:06:34Z)
A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models [12.644230479753476]
Traditional cloud-based large language models (LLMs) meet high-accuracy requirements, but fall short of critical demands for low delay and enhanced privacy. We propose HAT, a novel device-cloud collaborative inference framework that leverages the complementary strengths of U-shaped inference and speculative decoding. We show that HAT achieves promising performance improvements, reducing TTFT by 41% to 54% and TBT by 41% to 77% compared to the baselines.
arXiv Detail & Related papers (2025-03-23T10:54:58Z)
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration [1.6021932740447968]
Large Language Models (LLMs) exhibit remarkable human-like predictive capabilities. It is challenging to deploy LLMs to provide efficient and adaptive inference services at the edge. This paper proposes a novel Cloud-Edge Collaboration framework for LLMs (CE-CoLLM) to tackle these challenges.
arXiv Detail & Related papers (2024-11-05T06:00:27Z)
IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images [55.40601468843028]
We present an iterative diffusion process for cloud removal (IDF-CR). IDF-CR is divided into two-stage models that address pixel space and latent space. In the latent space stage, the diffusion model transforms low-quality cloud removal into high-quality clean output.
arXiv Detail & Related papers (2024-03-18T15:23:48Z)
Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation. In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation [9.320732264679238]
We present CloudEval-YAML, a practical benchmark for cloud configuration generation. The dataset consists of hand-written problems with unit tests targeting practical scenarios. It contains 1011 problems that take more than 1200 human hours to complete.
arXiv Detail & Related papers (2023-11-10T01:49:57Z)
ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases. We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets. Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
Intelligent Model Update Strategy for Sequential Recommendation [34.02565495747133]
We introduce IntellectReq, which is designed to operate on the edge, evaluating the cost-benefit landscape of parameter requests with minimal communication overhead. We employ statistical mapping techniques to convert real-time user behavior into a normal distribution, then use multi-sample outputs to quantify the model's uncertainty and thus its generalization capabilities.
arXiv Detail & Related papers (2023-02-14T20:44:12Z)
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization [66.27399823422665]
Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications. We propose an efficient Device-cloUd collaborative parametErs generaTion framework DUET.
arXiv Detail & Related papers (2022-09-12T13:26:26Z)
Device-Cloud Collaborative Recommendation via Meta Controller [65.97416287295152]
We propose a meta controller to dynamically manage the collaboration between the on-device recommender and the cloud-based recommender. On the basis of the counterfactual samples and the extended training, extensive experiments in industrial recommendation scenarios show the promise of the meta controller.
arXiv Detail & Related papers (2022-07-07T03:23:04Z)