Related papers: R$^2$ec: Towards Large Recommender Models with Reasoning

R$^2$ec: Towards Large Recommender Models with Reasoning

URL: http://arxiv.org/abs/2505.16994v2
Date: Wed, 15 Oct 2025 05:22:45 GMT
Title: R$^2$ec: Towards Large Recommender Models with Reasoning
Authors: Runyang You, Yongqi Li, Xinyu Lin, Xin Zhang, Wenjie Wang, Wenjie Li, Liqiang Nie,
Abstract summary: We propose R$2$ec, a unified large recommender model with intrinsic reasoning capability.<n>R$2$ec introduces a dual-head architecture that supports both reasoning chain generation and efficient item prediction in a single model.<n>To overcome the lack of annotated reasoning data, we design RecPO, a reinforcement learning framework.
Score: 59.32598867813266
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large recommender models have extended LLMs as powerful recommenders via encoding or item generation, and recent breakthroughs in LLM reasoning synchronously motivate the exploration of reasoning in recommendation. In this work, we propose R$^2$ec, a unified large recommender model with intrinsic reasoning capability. R$^2$ec introduces a dual-head architecture that supports both reasoning chain generation and efficient item prediction in a single model, significantly reducing inference latency. To overcome the lack of annotated reasoning data, we design RecPO, a reinforcement learning framework that optimizes reasoning and recommendation jointly with a novel fused reward mechanism. Extensive experiments on three datasets demonstrate that R$^2$ec outperforms traditional, LLM-based, and reasoning-augmented recommender baselines, while further analyses validate its competitive efficiency among conventional LLM-based recommender baselines and strong adaptability to diverse recommendation scenarios. Code and checkpoints available at https://github.com/YRYangang/RRec.

Related papers

Towards Sample-Efficient and Stable Reinforcement Learning for LLM-based Recommendation [56.92367609590823]
Long Chain-of-Thought (Long CoT) reasoning has shown promise in Large Language Models (LLMs)<n>We argue that Long CoT is inherently ill-suited for the sequential recommendation domain.<n>We propose RISER, a novel Reinforced Item Space Exploration framework for Recommendation.
arXiv Detail & Related papers (2026-01-31T10:02:43Z)
Think before Recommendation: Autonomous Reasoning-enhanced Recommender [25.883091131835172]
RecZero is a reinforcement learning-based recommendation paradigm that abandons the traditional multi-model and multi-stage distillation approach.<n>The paper explores a hybrid paradigm, RecOne, which combines supervised fine-tuning with RL, initializing the model with cold-start reasoning samples and further optimizing it with RL.
arXiv Detail & Related papers (2025-10-27T07:26:32Z)
OneRec-Think: In-Text Reasoning for Generative Recommendation [55.53292983432484]
OneRec-Think is a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation.<n>Our proposed "Think-Ahead" architecture enables effective industrial deployment on Kuaishou, achieving a 0.159% gain in APP Stay Time.
arXiv Detail & Related papers (2025-10-13T17:20:13Z)
Towards Comprehensible Recommendation with Large Language Model Fine-tuning [41.218487308635126]
We propose a novel Content Understanding from a Collaborative Perspective framework (CURec) for recommendation systems.<n>Curec generates collaborative-aligned content features for more comprehensive recommendations.<n>Experiments on public benchmarks demonstrate the superiority of CURec over existing methods.
arXiv Detail & Related papers (2025-08-11T03:55:31Z)
Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation [9.282278040339138]
$textbfR2Rec$ is a reasoning-enhanced recommendation framework.<n>It samples interaction chains from the user-item graph and converts them into structured interaction-of-thoughts.
arXiv Detail & Related papers (2025-06-05T14:16:44Z)
Reinforced Latent Reasoning for LLM-based Recommendation [83.18146814163308]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks.<n>Existing methods typically rely on fine-tuning with explicit chain-of-thought (CoT) data.<n>In this work, we explore an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning.
arXiv Detail & Related papers (2025-05-25T11:03:45Z)
LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation.<n>Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity.<n>We show that LARES exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z)
DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation [83.21140655248624]
Large language models (LLMs) have been introduced into recommender systems (RSs)<n>We propose DeepRec, a novel LLM-based RS that enables autonomous multi-turn interactions between LLMs and TRMs for deep exploration of the item space.<n> Experiments on public datasets demonstrate that DeepRec significantly outperforms both traditional and LLM-based baselines.
arXiv Detail & Related papers (2025-05-22T15:49:38Z)
Decoding Recommendation Behaviors of In-Context Learning LLMs Through Gradient Descent [15.425423867768163]
We propose a theoretical model, the LLM-ICL Recommendation Equivalent Gradient Descent model (LRGD) in this paper.<n>We demonstrate that the ICL inference process in LLM aligns with the training procedure of its dual model, producing token predictions equivalent to the dual model's testing outputs.<n>To further improve demonstration effectiveness, prevent performance collapse, and ensure long-term adaptability, we also propose a two-stage optimization process in practice.
arXiv Detail & Related papers (2025-04-06T06:36:45Z)
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation [20.965068290049057]
We propose textbfReaRec, the first inference-time computing framework for recommender systems.<n>ReaRec autoregressively feeds the sequence's last hidden state into the sequential recommender.<n>We introduce two lightweight reasoning-based learning methods, Ensemble Reasoning Learning (ERL) and Progressive Reasoning Learning (PRL)
arXiv Detail & Related papers (2025-03-28T17:59:03Z)
Towards Scalable Semantic Representation for Recommendation [65.06144407288127]
Mixture-of-Codes is proposed to construct semantic IDs based on large language models (LLMs) Our method achieves superior discriminability and dimension robustness scalability, leading to the best scale-up performance in recommendations.
arXiv Detail & Related papers (2024-10-12T15:10:56Z)
LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation [15.972926854420619]
Leveraging large language models (LLMs) offers new opportunities for comprehensive recommendation logic generation. Fine-tuning LLM models for recommendation tasks incurs high computational costs and alignment issues with existing systems. In this work, our proposed effective strategy LANE aligns LLMs with online recommendation systems without additional LLMs tuning.
arXiv Detail & Related papers (2024-07-03T06:20:31Z)
Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models. Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z)
Can Small Language Models be Good Reasoners for Sequential Recommendation? [34.098264212413305]
Step-by-step knowLedge dIstillation fraMework for recommendation (SLIM) We introduce CoT prompting based on user behavior sequences for the larger teacher model. The rationales generated by the teacher model are then utilized as labels to distill the downstream smaller student model.
arXiv Detail & Related papers (2024-03-07T06:49:37Z)
Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
reinforcement learning algorithms can be used to optimize user engagement in recommender systems. However, RL approaches are intractable in the slate recommendation scenario. In that setting, an action corresponds to a slate that may contain any combination of items. In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder. We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. Based on such an approach, we propose two frameworks namely Self-Supervised Q-learning(SQN) and Self-Supervised Actor-Critic(SAC)
arXiv Detail & Related papers (2020-06-10T11:18:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.