Think before Recommendation: Autonomous Reasoning-enhanced Recommender
- URL: http://arxiv.org/abs/2510.23077v1
- Date: Mon, 27 Oct 2025 07:26:32 GMT
- Title: Think before Recommendation: Autonomous Reasoning-enhanced Recommender
- Authors: Xiaoyu Kong, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Jiancan Wu, Xiang Wang,
- Abstract summary: RecZero is a reinforcement learning-based recommendation paradigm that abandons the traditional multi-model and multi-stage distillation approach.<n>The paper explores a hybrid paradigm, RecOne, which combines supervised fine-tuning with RL, initializing the model with cold-start reasoning samples and further optimizing it with RL.
- Score: 25.883091131835172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The core task of recommender systems is to learn user preferences from historical user-item interactions. With the rapid development of large language models (LLMs), recent research has explored leveraging the reasoning capabilities of LLMs to enhance rating prediction tasks. However, existing distillation-based methods suffer from limitations such as the teacher model's insufficient recommendation capability, costly and static supervision, and superficial transfer of reasoning ability. To address these issues, this paper proposes RecZero, a reinforcement learning (RL)-based recommendation paradigm that abandons the traditional multi-model and multi-stage distillation approach. Instead, RecZero trains a single LLM through pure RL to autonomously develop reasoning capabilities for rating prediction. RecZero consists of two key components: (1) "Think-before-Recommendation" prompt construction, which employs a structured reasoning template to guide the model in step-wise analysis of user interests, item features, and user-item compatibility; and (2) rule-based reward modeling, which adopts group relative policy optimization (GRPO) to compute rewards for reasoning trajectories and optimize the LLM. Additionally, the paper explores a hybrid paradigm, RecOne, which combines supervised fine-tuning with RL, initializing the model with cold-start reasoning samples and further optimizing it with RL. Experimental results demonstrate that RecZero and RecOne significantly outperform existing baseline methods on multiple benchmark datasets, validating the superiority of the RL paradigm in achieving autonomous reasoning-enhanced recommender systems.
Related papers
- Reasoning to Rank: An End-to-End Solution for Exploiting Large Language Models for Recommendation [44.51582748617213]
Reasoning to Rank is an end-to-end training framework that internalizes recommendation utility optimization into the learning of step-by-step reasoning in language models.<n>Our framework performs reasoning at the user-item level and employs reinforcement learning for end-to-end training of the language model.
arXiv Detail & Related papers (2026-02-13T02:22:48Z) - OneRec-Think: In-Text Reasoning for Generative Recommendation [55.53292983432484]
OneRec-Think is a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation.<n>Our proposed "Think-Ahead" architecture enables effective industrial deployment on Kuaishou, achieving a 0.159% gain in APP Stay Time.
arXiv Detail & Related papers (2025-10-13T17:20:13Z) - STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities.<n>We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping.<n>Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z) - Towards Comprehensible Recommendation with Large Language Model Fine-tuning [41.218487308635126]
We propose a novel Content Understanding from a Collaborative Perspective framework (CURec) for recommendation systems.<n>Curec generates collaborative-aligned content features for more comprehensive recommendations.<n>Experiments on public benchmarks demonstrate the superiority of CURec over existing methods.
arXiv Detail & Related papers (2025-08-11T03:55:31Z) - RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1 [20.92548890511589]
This paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs)<n> RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty.
arXiv Detail & Related papers (2025-06-24T01:39:34Z) - R$^2$ec: Towards Large Recommender Models with Reasoning [59.32598867813266]
We propose R$2$ec, a unified large recommender model with intrinsic reasoning capability.<n>R$2$ec introduces a dual-head architecture that supports both reasoning chain generation and efficient item prediction in a single model.<n>To overcome the lack of annotated reasoning data, we design RecPO, a reinforcement learning framework.
arXiv Detail & Related papers (2025-05-22T17:55:43Z) - LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation.<n>Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity.<n>Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z) - Think Only When You Need with Large Hybrid-Reasoning Models [121.55211364358662]
Large Hybrid-Reasoning Models (LHRMs)<n>First kind of model capable of adaptively determining whether to perform thinking based on contextual information of user queries.<n>Extensive experimental results show that LHRMs can adaptively perform hybrid thinking on queries of varying difficulty and type.
arXiv Detail & Related papers (2025-05-20T17:23:25Z) - LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation [15.972926854420619]
Leveraging large language models (LLMs) offers new opportunities for comprehensive recommendation logic generation.
Fine-tuning LLM models for recommendation tasks incurs high computational costs and alignment issues with existing systems.
In this work, our proposed effective strategy LANE aligns LLMs with online recommendation systems without additional LLMs tuning.
arXiv Detail & Related papers (2024-07-03T06:20:31Z) - Step-level Value Preference Optimization for Mathematical Reasoning [6.318873143509028]
We introduce a novel algorithm called Step-level Value Preference Optimization (SVPO)
Our method achieves state-of-the-art performance on both in-domain and out-of-domain mathematical reasoning benchmarks.
arXiv Detail & Related papers (2024-06-16T09:06:17Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.