Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts
- URL: http://arxiv.org/abs/2509.26093v2
- Date: Wed, 01 Oct 2025 03:38:59 GMT
- Title: Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts
- Authors: Xiaoyan Zhao, Ming Yan, Yang Zhang, Yang Deng, Jian Wang, Fengbin Zhu, Yilun Qiu, Hong Cheng, Tat-Seng Chua
- Abstract summary: We propose a novel Reinforced Strategy Optimization (RSO) method for Conversational Recommender Systems (CRSs). RSO decomposes the process of generating strategy-driven response decisions into the macro-level strategy planning and micro-level strategy adaptation. Experiments show that RSO significantly improves interaction performance compared to state-of-the-art baselines.
- Score: 63.412646471177645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversational Recommender Systems (CRSs) aim to provide personalized recommendations through multi-turn natural language interactions with users. Given the strong interaction and reasoning skills of Large Language Models (LLMs), leveraging LLMs for CRSs has recently emerged as a promising direction. However, existing LLM-based methods often lack explicit optimization of interaction strategies, instead relying on unified prompts and the LLM's internal knowledge to decide how to interact, which can lead to suboptimal outcomes. In this paper, we propose a novel Reinforced Strategy Optimization (RSO) method for CRS, which decomposes the process of generating strategy-driven response decisions into the macro-level strategy planning and micro-level strategy adaptation through a network-of-experts architecture. At the macro level, a Planner expert selects macro-level interaction strategies (e.g., recommend, explain, encourage). At the micro level, an Actor expert generates detailed responses conditioned on the selected macro-level strategy, guided by auxiliary experts that provide complementary information such as user preferences and factual grounding. This hierarchical decomposition disentangles the optimization of different sub-tasks involved in CRS response generation, enabling more tractable learning at each level. To address the scarcity of high-quality multi-turn training data, we formulate strategy learning as a reinforcement learning problem, guided by an LLM-based reward model to achieve automatic strategy exploration. Extensive experiments show that RSO significantly improves interaction performance compared to state-of-the-art baselines, demonstrating the effectiveness of explicit hierarchical strategy optimization for CRS.
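The macro/micro decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the Planner here is a plain softmax policy rather than an LLM, the Actor is a string template, the reward model is a hard-coded rule standing in for the LLM-based reward model, and the update is a basic REINFORCE step with a running-mean baseline. All class, function, and parameter names (`Planner`, `actor`, `reward_model`, `train`, `lr`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Strategy examples named in the abstract.
STRATEGIES = ["recommend", "explain", "encourage"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Planner:
    """Macro level: a softmax policy over strategies (stand-in for an LLM planner)."""

    def __init__(self, n_strategies, lr=0.5):
        self.logits = [0.0] * n_strategies
        self.lr = lr

    def probs(self):
        return softmax(self.logits)

    def sample(self, rng):
        return rng.choices(range(len(self.logits)), weights=self.probs())[0]

    def update(self, action, advantage):
        # REINFORCE: d log pi(a) / d logit_k = 1[k == a] - pi(k)
        p = self.probs()
        for k in range(len(self.logits)):
            grad = (1.0 if k == action else 0.0) - p[k]
            self.logits[k] += self.lr * advantage * grad


def actor(strategy, preference_hint, fact_hint):
    """Micro level: a response conditioned on the strategy and on auxiliary hints."""
    return f"[{strategy}] pref={preference_hint}; fact={fact_hint}"


def reward_model(response):
    """Toy reward (assumption): this simulated user only values explanations."""
    return 1.0 if response.startswith("[explain]") else 0.0


def train(steps=500, seed=0):
    rng = random.Random(seed)
    planner = Planner(len(STRATEGIES))
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        a = planner.sample(rng)
        response = actor(STRATEGIES[a], "placeholder preference", "placeholder fact")
        r = reward_model(response)
        planner.update(a, r - baseline)
        baseline += 0.1 * (r - baseline)
    return planner


if __name__ == "__main__":
    trained = train()
    for name, p in zip(STRATEGIES, trained.probs()):
        print(f"{name}: {p:.3f}")
```

In this sketch only the Planner is trained; the abstract states that RSO optimizes strategy learning with reinforcement learning under an LLM-based reward model, but how the Actor and the auxiliary experts are trained is not specified here, so they are left as fixed stubs.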
Related papers
- Expanding LLM Agent Boundaries with Strategy-Guided Exploration [51.98616048282804]
Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding. We propose Strategy-Guided Exploration (SGE) to shift exploration from low-level actions to higher-level language strategies.
arXiv Detail & Related papers (2026-03-02T16:28:39Z)
- From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory [48.22750809620306]
Large Language Model (LLM)-based agents have demonstrated remarkable potential in autonomous task-solving. In this paper, we introduce a novel agent-centric, trainable, multi-layered graph memory framework. We show how context memory enhances the ability of LLMs to utilize information.
arXiv Detail & Related papers (2025-11-11T03:36:33Z)
- Enhance Large Language Models as Recommendation Systems with Collaborative Filtering [9.697791766151958]
This study proposes critique-based Large Language Models (LLMs) as recommendation systems (Critic-LLM-RS). Critic-LLM-RS implements collaborative filtering for recommendations by learning from the interactions between many users and items. Experiments have verified the effectiveness of Critic-LLM-RS on real datasets.
arXiv Detail & Related papers (2025-10-17T13:35:14Z)
- Feedback-Induced Performance Decline in LLM-Based Decision-Making [6.5990946334144756]
Large Language Models (LLMs) can extract context from natural language problem descriptions. This paper studies the behaviour of these models within Markov Decision Processes (MDPs).
arXiv Detail & Related papers (2025-07-20T10:38:56Z)
- SAGE: Strategy-Adaptive Generation Engine for Query Rewriting [8.941793732446856]
We introduce the Strategy-Adaptive Generation Engine (SAGE), which operationalizes expert-crafted strategies in a reinforcement learning framework. SAGE not only achieves new state-of-the-art NDCG@10 results but also uncovers a compelling emergent behavior. Our findings demonstrate that strategy-guided RL, enhanced with nuanced reward shaping, offers a scalable, efficient, and more interpretable paradigm for developing the next generation of robust information retrieval systems.
arXiv Detail & Related papers (2025-06-24T16:50:51Z)
- Strategy-Augmented Planning for Large Language Models via Opponent Exploitation [11.840105106884543]
We introduce a two-stage Strategy-Augmented Planning (SAP) framework that significantly enhances the opponent exploitation capabilities of LLM-based agents. In the offline stage, we construct an explicit strategy space and subsequently collect strategy-outcome pair data for training the Strategy Evaluation Network (SEN). During the online phase, SAP dynamically recognizes the opponent's strategies and greedily exploits them by searching for the best response strategy on the well-trained SEN.
arXiv Detail & Related papers (2025-05-13T11:41:10Z)
- A Survey on the Optimization of Large Language Model-based Agents [16.733092886211097]
Large Language Models (LLMs) have been widely adopted in various fields, becoming essential for autonomous decision-making and interactive tasks. However, current work typically relies on prompt design or fine-tuning strategies applied to vanilla LLMs. We provide a comprehensive review of LLM-based agent optimization approaches, categorizing them into parameter-driven and parameter-free methods.
arXiv Detail & Related papers (2025-03-16T10:09:10Z)
- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. We train the strategic reasoning model via multi-turn reinforcement learning (RL), utilizing process rewards and iterative self-play. Our findings reveal various collaborative reasoning mechanisms emergent in EPO and its effectiveness in generating novel strategies.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
- Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment [45.45508377432791]
This paper introduces Reward-Aware Preference Optimization (RPO), a mathematical framework that unifies popular preference optimization techniques. RPO provides a structured approach to disentangle and systematically study the impact of various design choices. We propose a new experimental setup that enables the clean and direct ablation of such design choices.
arXiv Detail & Related papers (2025-01-31T22:39:04Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly. We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with Large Language Models. It underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.