Personality-Aware Reinforcement Learning for Persuasive Dialogue with LLM-Driven Simulation
- URL: http://arxiv.org/abs/2601.06877v1
- Date: Sun, 11 Jan 2026 11:53:07 GMT
- Title: Personality-Aware Reinforcement Learning for Persuasive Dialogue with LLM-Driven Simulation
- Authors: Donghuo Zeng, Roberto Legaspi, Kazushi Ikeda
- Abstract summary: We present a personality-aware reinforcement learning approach comprising three main modules. We use an agenda-based simulation pipeline to generate diverse interactions. Experiments on the PersuasionForGood dataset augmented with simulated dialogues reveal three main findings.
- Score: 5.97941583499908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective persuasive dialogue agents adapt their strategies to individual users, accounting for the evolution of their psychological states and intentions throughout conversations. We present a personality-aware reinforcement learning approach comprising three main modules: (1) a Strategy-Oriented Interaction Framework, which serves as an agenda-based strategy controller that selects strategy-level actions and generates responses via Maximal Marginal Relevance (MMR) retrieval to ensure contextual relevance, diversity, and scalable data generation; (2) Personality-Aware User Representation Learning, which produces an 81-dimensional mixed-type embedding predicted at each turn from recent exchanges and appended to the reinforcement learning state; and (3) a Dueling Double DQN (D3QN) model and Reward Prediction, in which the policy is conditioned on dialogue history and turn-level personality estimates and trained using a composite reward incorporating agreement intent, donation amount, and change-of-mind penalties. We use an agenda-based LLM simulation pipeline to generate diverse interactions, and personality estimates are inferred from the generated utterances. Experiments on the PersuasionForGood (P4G) dataset augmented with simulated dialogues reveal three main findings: (i) turn-level personality conditioning improves policy adaptability and cumulative persuasion rewards; (ii) LLM-driven simulation enhances generalization to unseen user behaviors; and (iii) incorporating a change-of-mind penalty reduces post-agreement retractions while slightly improving donation outcomes. These results demonstrate that structured interaction, dynamic personality estimation, and behaviorally informed rewards together yield more effective persuasive policies.
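- Illustrative sketch (not from the paper): The abstract names three mechanisms that are easy to state in code: appending the 81-dimensional personality estimate to the RL state, a composite reward with a change-of-mind penalty, and the Dueling Double DQN target. The sketch below is a minimal, hedged illustration only. All function names, the reward weights, and the weighted-sum form of the reward are assumptions made here for illustration; the abstract does not give the authors' actual formulas or hyperparameters.

```python
from typing import List, Sequence

PERSONALITY_DIM = 81  # embedding size stated in the abstract


def build_state(history_features: Sequence[float],
                personality: Sequence[float]) -> List[float]:
    """Append the turn-level personality estimate to the dialogue features."""
    if len(personality) != PERSONALITY_DIM:
        raise ValueError("expected an 81-dimensional personality embedding")
    return list(history_features) + list(personality)


def composite_reward(agreed: bool, donation: float, retracted: bool,
                     w_agree: float = 1.0, w_donate: float = 0.5,
                     w_retract: float = 1.0) -> float:
    """Hypothetical weighted sum; the weights are placeholders, not the paper's."""
    reward = w_agree * float(agreed) + w_donate * donation
    if retracted:
        reward -= w_retract  # change-of-mind penalty
    return reward


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

In this sketch, separating action selection (online network) from action evaluation (target network) is what distinguishes the Double DQN target from the plain DQN target, and subtracting the mean advantage keeps the value and advantage streams identifiable. How the paper combines its three reward terms may differ from the simple weighted sum assumed here.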
Related papers
- Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning [66.52010873968383]
We introduce a conversational agent that interleaves search and reasoning across turns, enabling exploratory and adaptive behaviors learned through reinforcement learning (RL) training. The experimental results across four widely used conversational benchmarks demonstrate the effectiveness of our methods.
arXiv Detail & Related papers (2026-01-19T14:55:54Z)
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning [52.07170679746533]
Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. We introduce a unified framework for evaluating and improving persona consistency in LLM-generated dialogue. We define three automatic metrics (prompt-to-line consistency, line-to-line consistency, and Q&A consistency) that capture different types of persona drift, and we validate each against human annotations.
arXiv Detail & Related papers (2025-10-31T19:40:41Z)
- MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion [73.99171322670772]
Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news. MMPersuade provides a unified framework for systematically studying multimodal persuasion dynamics in LVLMs.
arXiv Detail & Related papers (2025-10-26T17:39:21Z)
- AgentRec: Next-Generation LLM-Powered Multi-Agent Collaborative Recommendation with Adaptive Intelligence [4.638507244153875]
This paper introduces AgentRec, a next-generation multi-agent collaborative recommendation framework. Our approach employs specialized LLM-powered agents for conversation understanding, preference modeling, context awareness, and dynamic ranking. Experiments on three real-world datasets demonstrate that AgentRec achieves consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2025-10-02T02:47:11Z)
- UserRL: Training Interactive User-Centric Agent via Reinforcement Learning [104.63494870852894]
Reinforcement learning (RL) has shown promise in training agentic models that engage in dynamic, multi-turn interactions. We propose UserRL, a unified framework for training and evaluating user-centric abilities through standardized gym environments.
arXiv Detail & Related papers (2025-09-24T03:33:20Z)
- Exploring the Impact of Personality Traits on Conversational Recommender Systems: A Simulation with Large Language Models [70.180385882195]
This paper introduces a personality-aware user simulation for Conversational Recommender Systems (CRSs). The user agent induces customizable personality traits and preferences, while the system agent possesses the persuasion capability to simulate realistic interaction in CRSs. Experimental results demonstrate that state-of-the-art LLMs can effectively generate diverse user responses aligned with specified personality traits.
arXiv Detail & Related papers (2025-04-09T13:21:17Z)
- PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
- Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome [13.731895847081953]
We present a novel approach that tracks a user's latent personality dimensions (LPDs) during an ongoing persuasion conversation.
We generate tailored counterfactual utterances based on these LPDs to optimize the overall persuasion outcome.
arXiv Detail & Related papers (2024-04-21T23:03:47Z)
- WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue [17.663449579168297]
We simulate a dialogue between an agent and a user (modelled similarly to an agent with a supervised learning objective).
The agent uses dynamic blocking to generate ranked diverse responses and exploration-exploitation to select among the Top-K responses.
Empirical studies with two benchmarks indicate that our model can significantly improve response quality and lead to successful conversations.
arXiv Detail & Related papers (2021-08-01T08:00:45Z)
- Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.