STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning
- URL: http://arxiv.org/abs/2508.18812v1
- Date: Tue, 26 Aug 2025 08:47:58 GMT
- Title: STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning
- Authors: Chenghao Wu, Ruiyang Ren, Junjie Zhang, Ruirui Wang, Zhongrui Ma, Qi Ye, Wayne Xin Zhao,
- Abstract summary: We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on the MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains over state-of-the-art baselines.
- Score: 54.28691219536054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While modern recommender systems are instrumental in navigating information abundance, they remain fundamentally limited by static user modeling and reactive decision-making paradigms. Current large language model (LLM)-based agents inherit these shortcomings through their overreliance on heuristic pattern matching, yielding recommendations prone to shallow correlation bias, limited causal inference, and brittleness in sparse-data scenarios. We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. Each user is modeled as an agent with parallel cognitions: fast response for immediate interactions and slow reasoning that performs chain-of-thought rationales. To cultivate intrinsic slow thinking, we develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. This hybrid approach scaffolds agents in acquiring foundational capabilities (preference summarization, rationale generation) while enabling dynamic policy adaptation through simulated feedback loops. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines, despite using only 0.4% of the full training data.
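The abstract describes each user agent as having two parallel cognitions: a fast path for immediate interactions and a slow path that produces a chain-of-thought rationale, with routing toward slow reasoning in sparse-data scenarios. The paper does not publish this logic, so the following is a minimal hypothetical sketch of that dual-process routing; all class names, scoring heuristics, and the `sparse_threshold` parameter are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a dual-process user agent in the spirit of STARec:
# a fast heuristic path for immediate interactions and a slow path that
# builds an explicit rationale before recommending. All names and
# thresholds here are illustrative, not from the paper.
from dataclasses import dataclass, field

@dataclass
class UserAgent:
    history: list = field(default_factory=list)  # past items, each {"id", "tags"}

    def fast_response(self, candidates):
        # Fast cognition: pick the candidate with the most tag overlap
        # against the user's recent history.
        recent = {t for item in self.history[-5:] for t in item["tags"]}
        return max(candidates, key=lambda c: len(set(c["tags"]) & recent))

    def slow_reasoning(self, candidates):
        # Slow cognition: summarize preferences, justify each candidate,
        # then commit to the best-supported choice.
        prefs = {t for item in self.history for t in item.get("tags", [])}
        rationale = [f"user prefers tags {sorted(prefs)}"]
        scored = []
        for c in candidates:
            overlap = prefs & set(c["tags"])
            rationale.append(f"{c['id']}: matches {sorted(overlap)}")
            scored.append((len(overlap), c))
        best = max(scored, key=lambda s: s[0])[1]
        return best, rationale

    def act(self, candidates, sparse_threshold=3):
        # Route to deliberate reasoning when interaction data is sparse,
        # mirroring the abstract's claim of robustness in sparse-data settings.
        if len(self.history) < sparse_threshold:
            return self.slow_reasoning(candidates)[0]
        return self.fast_response(candidates)
```

In the actual framework the slow path is an LLM producing chain-of-thought text and the routing is learned via anchored reinforcement training; this sketch only illustrates the control flow.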
Related papers
- ATLAS : Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters [6.13905106667213]
ATLAS is a task-distributed framework that iteratively develops a lightweight research agent. Our core algorithm, Evolving Direct Preference Optimization (EvoDPO), adaptively updates the phase-indexed reference policy. Results show that ATLAS improves stability and performance over a static single-agent baseline.
arXiv Detail & Related papers (2026-02-02T19:23:33Z) - RecNet: Self-Evolving Preference Propagation for Agentic Recommender Systems [109.9061591263748]
RecNet is a self-evolving preference propagation framework for recommender systems. It proactively propagates real-time preference updates across related users and items. In the backward phase, the feedback-driven propagation optimization mechanism simulates a multi-agent reinforcement learning framework.
arXiv Detail & Related papers (2026-01-29T12:14:31Z) - Think before Recommendation: Autonomous Reasoning-enhanced Recommender [25.883091131835172]
RecZero is a reinforcement learning-based recommendation paradigm that abandons the traditional multi-model and multi-stage distillation approach. The paper also explores a hybrid paradigm, RecOne, which combines supervised fine-tuning with RL, initializing the model with cold-start reasoning samples and further optimizing it with RL.
arXiv Detail & Related papers (2025-10-27T07:26:32Z) - Next Interest Flow: A Generative Pre-training Paradigm for Recommender Systems by Modeling All-domain Movelines [8.895768051554162]
We propose a novel generative pre-training paradigm for e-commerce recommender systems. Our model learns to predict the Next Interest Flow, a dense vector sequence representing a user's future intent. We present the All-domain Moveline Evolution Network (AMEN), a unified framework implementing our entire pipeline.
arXiv Detail & Related papers (2025-10-13T12:13:17Z) - From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System [11.373145953200137]
We introduce a multi-stage framework designed for progressive alignment between the generation policy and user intent. Our framework significantly outperforms baselines on both automatic and human evaluations.
arXiv Detail & Related papers (2025-08-15T10:17:01Z) - RAAG: Ratio Aware Adaptive Guidance [7.2455669888408085]
We show that the earliest reverse steps are acutely sensitive to the guidance scale, owing to a pronounced spike in the relative strength (RATIO) of conditional to unconditional predictions. We propose a simple, theoretically grounded, RATIO-aware adaptive guidance schedule that automatically dampens the guidance scale at early steps based on the evolving RATIO. Our approach enables up to 3x faster sampling while maintaining or improving generation quality, robustness, and semantic alignment.
arXiv Detail & Related papers (2025-08-05T13:41:05Z) - What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context [56.590259941275434]
RecPO is a preference optimization framework for sequential recommendation. It exploits adaptive reward margins based on inferred preference hierarchies and temporal signals. It mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.
arXiv Detail & Related papers (2025-06-02T21:09:29Z) - LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation. Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity. Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z) - Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. We increase the consistency and informativeness of the pairwise preference signals through targeted modifications. We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z) - Data-Scarce Identification of Game Dynamics via Sum-of-Squares Optimization [29.568222003322344]
We introduce the Side-Information Assisted Regression (SIAR) framework, designed to identify game dynamics in multiplayer normal-form games.
SIAR is solved using sum-of-squares (SOS) optimization, resulting in a hierarchy of approximations that provably converge to the true dynamics of the system.
We show that the SIAR framework accurately predicts player behavior across a spectrum of normal-form games, widely known families of game dynamics, and strong benchmarks, even when the unknown system is chaotic.
arXiv Detail & Related papers (2023-07-13T09:14:48Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
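The SQN entry above describes augmenting a recommendation backbone with two output heads: a self-supervised head trained with cross-entropy on the next item and an RL head trained with Q-learning. A minimal hedged sketch of how such a combined objective might look follows; the function name, signature, and the simple sum of the two terms are assumptions for illustration, not the paper's exact loss.

```python
# Hypothetical sketch of an SQN-style combined objective: self-supervised
# cross-entropy on the next item plus a one-step Q-learning TD error on a
# second head of the same backbone. Details are illustrative assumptions.
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def sqn_loss(ss_logits, q_values, target, reward, next_max_q, gamma=0.9):
    """Combine the two heads' losses for one training example.

    ss_logits: logits from the self-supervised (next-item) head
    q_values:  per-item Q estimates from the RL head
    target:    index of the observed next item
    """
    ce = -math.log(softmax(ss_logits)[target])   # self-supervised head
    td_target = reward + gamma * next_max_q      # one-step Q-learning target
    td = (td_target - q_values[target]) ** 2     # RL head (squared TD error)
    return ce + td
```

In practice the two heads share the sequence encoder and the two terms would typically be weighted; the unweighted sum here is only a sketch.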
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.