SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2410.09580v1
- Date: Sat, 12 Oct 2024 16:21:33 GMT
- Title: SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search
- Authors: Hanwen Du, Bo Peng, Xia Ning
- Abstract summary: Existing methods train Reinforcement Learning-based agents with greedy action selection or sampling strategies.
We present a novel Monte Carlo Tree Search (MCTS)-based CRS framework SAPIENT.
SAPIENT consists of a conversational agent (S-agent) and a conversational planner (S-planner).
- Score: 5.079888940901933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conversational Recommender Systems (CRS) proactively engage users in interactive dialogues to elicit user preferences and provide personalized recommendations. Existing methods train Reinforcement Learning (RL)-based agents with greedy action selection or sampling strategies, and may suffer from suboptimal conversational planning. To address this, we present a novel Monte Carlo Tree Search (MCTS)-based CRS framework SAPIENT. SAPIENT consists of a conversational agent (S-agent) and a conversational planner (S-planner). S-planner builds a conversational search tree with MCTS based on the initial actions proposed by S-agent to find conversation plans. The best conversation plans from S-planner are used to guide the training of S-agent, creating a self-training loop where S-agent can iteratively improve its capability for conversational planning. Furthermore, we propose an efficient variant SAPIENT-e for a trade-off between training efficiency and performance. Extensive experiments on four benchmark datasets validate the effectiveness of our approach, showing that SAPIENT outperforms the state-of-the-art baselines.
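To make the planning idea in the abstract concrete, here is a minimal sketch of generic UCT-style Monte Carlo Tree Search over a toy conversation. It is not the SAPIENT implementation: the item catalogue, the hidden `TARGET`, the `ask`/`rec` action set, the reward scheme, the turn limit, and every function name below are assumptions made for illustration. Random rollouts stand in for whatever policy and value estimates the paper uses.

```python
import math
import random

# Hypothetical toy user simulator: 8 items with 3 binary attributes each,
# and one hidden target item that the simulated user wants.
ITEMS = {i: ((i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(8)}
TARGET = 5
MAX_TURNS = 4

def legal_actions(state):
    cands, asked, _turn = state
    asks = [("ask", a) for a in range(3) if a not in asked]
    return asks + [("rec", min(cands))]  # recommend the lowest-id candidate

def step(state, action):
    """Apply one conversation turn; return (next_state, reward, done)."""
    cands, asked, turn = state
    kind, arg = action
    if kind == "ask":  # the user answers truthfully, pruning the candidates
        cands = frozenset(i for i in cands if ITEMS[i][arg] == ITEMS[TARGET][arg])
        asked = asked | {arg}
    elif arg == TARGET:  # successful recommendation ends the conversation
        return (cands, asked, turn + 1), 1.0, True
    else:  # rejected recommendation: drop the item and continue
        cands = cands - {arg}
    turn += 1
    return (cands, asked, turn), 0.0, turn >= MAX_TURNS

class Node:
    def __init__(self, state, done=False):
        self.state, self.done = state, done
        self.children = {}  # action -> (child Node, immediate reward)
        self.visits, self.value = 0, 0.0  # value = sum of returns from here

def rollout(state, rng):
    """Play random actions until the conversation ends; return total reward."""
    total, done = 0.0, False
    while not done:
        state, reward, done = step(state, rng.choice(legal_actions(state)))
        total += reward
    return total

def uct(node, action, c):
    child, reward = node.children[action]
    explore = c * math.sqrt(math.log(node.visits) / child.visits)
    return reward + child.value / child.visits + explore

def simulate(node, rng, c=1.4):
    """One MCTS iteration from `node`: select, expand, roll out, back up."""
    ret = 0.0
    if not node.done:
        untried = [a for a in legal_actions(node.state) if a not in node.children]
        if untried:  # expansion followed by a random rollout
            action = rng.choice(untried)
            nxt, reward, done = step(node.state, action)
            child = Node(nxt, done)
            rest = 0.0 if done else rollout(nxt, rng)
            child.visits, child.value = 1, rest
            node.children[action] = (child, reward)
            ret = reward + rest
        else:  # selection by the UCT score, then recurse
            action = max(node.children, key=lambda a: uct(node, a, c))
            child, reward = node.children[action]
            ret = reward + simulate(child, rng, c)
    node.visits += 1
    node.value += ret
    return ret

def plan(iterations=500, seed=0):
    """Build a search tree and read off the most-visited path as the plan."""
    rng = random.Random(seed)
    root = Node((frozenset(ITEMS), frozenset(), 0))
    for _ in range(iterations):
        simulate(root, rng)
    path, node = [], root
    while node.children:
        action = max(node.children, key=lambda a: node.children[a][0].visits)
        path.append(action)
        node = node.children[action][0]
    return root, path
```

Calling `plan()` returns the root of the search tree and a list of `("ask", attribute)` and `("rec", item)` actions. In the loop the abstract describes, plans like this would then serve as training signal for the agent; that training step is outside the scope of this sketch.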
Related papers
- Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models [52.7201882529976]
Planning-based Conversational Agents (PCA) is a dialogue framework aimed at enhancing controllability of LLM-driven agents.
We propose a dataset comprising SOP-annotated multi-scenario dialogues, generated using a semi-automated role-playing system with GPT-4o.
We also propose a novel method that integrates Chain of Thought reasoning with supervised fine-tuning for SOP prediction and utilizes Monte Carlo Tree Search for optimal action planning during dialogues.
arXiv Detail & Related papers (2024-07-04T12:23:02Z)
- Identifying Breakdowns in Conversational Recommender Systems using User Simulation [15.54070473873364]
We present a methodology to test conversational recommender systems with regard to conversational breakdowns.
It involves examining conversations generated between the system and simulated users for a set of pre-defined breakdown types.
We apply our methodology in a case study with an existing conversational recommender system and user simulator, demonstrating that with just a few iterations, we can make the system more robust to conversational breakdowns.
arXiv Detail & Related papers (2024-05-23T07:28:26Z)
- SSP: Self-Supervised Post-training for Conversational Search [63.28684982954115]
We propose Self-Supervised Post-training (SSP), a new post-training paradigm with three self-supervised tasks to efficiently initialize the conversational search model.
To verify the effectiveness of our proposed method, we apply the conversational encoder post-trained by SSP to the conversational search task using two benchmark datasets: CAsT-19 and CAsT-20.
arXiv Detail & Related papers (2023-07-02T13:36:36Z)
- Improving Conversational Recommendation Systems via Counterfactual Data Simulation [73.4526400381668]
Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations.
Existing CRS approaches often suffer from the issue of insufficient training due to the scarcity of training data.
We propose a CounterFactual data simulation approach for CRS, named CFCRS, to alleviate the issue of data scarcity in CRSs.
arXiv Detail & Related papers (2023-06-05T12:48:56Z)
- Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning [22.753613264491918]
GDP-Zero is an approach using Open-Loop MCTS to perform goal-oriented dialogue policy planning without any model training.
We evaluate GDP-Zero on the goal-oriented task PersuasionForGood, and find that its responses are preferred over ChatGPT up to 59.32% of the time.
arXiv Detail & Related papers (2023-05-23T04:07:03Z)
- Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs).
In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose an interactive Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
- KILDST: Effective Knowledge-Integrated Learning for Dialogue State Tracking using Gazetteer and Speaker Information [3.342637296393915]
Dialogue State Tracking (DST) is a core research topic in dialogue systems and has received much attention.
As a step toward conversational AI that extracts and recommends information from dialogue between users, it is necessary to define a new problem that handles such dialogue.
We introduce a new task, DST from dialogue between users about scheduling an event (DST-S).
The DST-S task is much more challenging, since it requires the model to track the dialogue between users and to understand who suggested a schedule and who agreed to it.
arXiv Detail & Related papers (2023-01-18T07:11:56Z)
- Follow Me: Conversation Planning for Target-driven Recommendation Dialogue Systems [9.99763097964222]
Recommendation dialogue systems aim to build social bonds with users and provide high-quality recommendations.
This paper pushes forward towards a promising paradigm called target-driven recommendation dialogue systems.
We focus on how to naturally lead users to accept the designated targets gradually through conversations.
arXiv Detail & Related papers (2022-08-06T13:23:42Z)
- CR-Walker: Tree-Structured Graph Reasoning and Dialog Acts for Conversational Recommendation [62.13413129518165]
CR-Walker is a model that performs tree-structured reasoning on a knowledge graph.
It generates informative dialog acts to guide language generation.
Automatic and human evaluations show that CR-Walker can arrive at more accurate recommendations.
arXiv Detail & Related papers (2020-10-20T14:53:22Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.