Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions
- URL: http://arxiv.org/abs/2512.12775v1
- Date: Sun, 14 Dec 2025 17:27:02 GMT
- Title: Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions
- Authors: Pedro Henrique Luz de Araujo, Michael A. Hedderich, Ali Modarressi, Hinrich Schuetze, Benjamin Roth
- Abstract summary: Persona-assigned large language models (LLMs) are used in domains such as education, healthcare, and sociodemographic simulation.
We introduce an evaluation protocol that combines long persona dialogues and evaluation datasets to create dialogue-conditioned benchmarks.
We find that persona fidelity degrades over the course of dialogues, especially in goal-oriented conversations.
- Score: 11.415343473837583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Persona-assigned large language models (LLMs) are used in domains such as education, healthcare, and sociodemographic simulation. Yet, they are typically evaluated only in short, single-round settings that do not reflect real-world usage. We introduce an evaluation protocol that combines long persona dialogues (over 100 rounds) and evaluation datasets to create dialogue-conditioned benchmarks that can robustly measure long-context effects. We then investigate the effects of dialogue length on persona fidelity, instruction-following, and safety of seven state-of-the-art open- and closed-weight LLMs. We find that persona fidelity degrades over the course of dialogues, especially in goal-oriented conversations, where models must sustain both persona fidelity and instruction following. We identify a trade-off between fidelity and instruction following, with non-persona baselines initially outperforming persona-assigned models; as dialogues progress and fidelity fades, persona responses become increasingly similar to baseline responses. Our findings highlight the fragility of persona applications in extended interactions and our work provides a protocol to systematically measure such failures.
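The protocol's core idea, conditioning a benchmark item on a long persona dialogue, can be sketched as follows. This is a minimal illustration in a generic chat-message format, not the authors' implementation; the function name, persona string, and history are all hypothetical.

```python
def build_dialogue_conditioned_prompt(persona, dialogue_history, benchmark_item):
    """Prepend a persona system prompt and a long dialogue history to a
    benchmark question, so the probe lands deep into the conversation."""
    messages = [{"role": "system", "content": f"You are {persona}."}]
    # Alternate user/assistant roles over the recorded dialogue turns.
    for i, turn in enumerate(dialogue_history):
        role = "user" if i % 2 == 0 else "assistant"
        messages.append({"role": role, "content": turn})
    # The benchmark item becomes the final user turn.
    messages.append({"role": "user", "content": benchmark_item})
    return messages

# Illustrative usage: condition a fidelity probe on a 4-turn history.
history = ["Hi!", "Hello, lovely to meet you.", "How was your day?", "Busy but good."]
msgs = build_dialogue_conditioned_prompt(
    "a cheerful retired teacher", history, "What do you think about homework?"
)
print(len(msgs))  # system + 4 history turns + 1 probe = 6
```

In the paper's setting the history would span over 100 rounds, so the same benchmark item can be evaluated at increasing dialogue depths to measure long-context degradation.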
Related papers
- Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction [55.24448139349266]
We present PAL-Bench, a new benchmark designed to evaluate the personalization capabilities of service-oriented assistants in long-term user-agent interactions.
To improve personalized service-oriented interactions, we propose H$2$Memory, a hierarchical and heterogeneous memory framework.
arXiv Detail & Related papers (2025-11-17T14:22:32Z) - Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning [52.07170679746533]
Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play.
We introduce a unified framework for evaluating and improving persona consistency in LLM-generated dialogue.
We define three automatic metrics (prompt-to-line, line-to-line, and Q&A consistency) that capture different types of persona drift, and validate each against human annotations.
arXiv Detail & Related papers (2025-10-31T19:40:41Z) - Score Before You Speak: Improving Persona Consistency in Dialogue Generation using Response Quality Scores [2.150144047598779]
Persona-based dialogue generation is an important milestone towards building conversational artificial intelligence.
We propose a novel framework SBS (Score-Before-Speaking), which outperforms previous methods.
We show that score-conditioned training allows existing models to better capture a spectrum of persona-consistent dialogues.
arXiv Detail & Related papers (2025-08-09T08:30:06Z) - Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation [16.76995815742803]
We propose an atomic-level evaluation framework that quantifies persona fidelity at a finer granularity.
Our three key metrics measure the degree of persona alignment and consistency within and across generations.
By analyzing persona fidelity across diverse tasks and personality types, we reveal how task structure and persona desirability influence model adaptability.
arXiv Detail & Related papers (2025-06-24T06:33:10Z) - A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs).
We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z) - REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation [51.97224538045096]
We introduce REALTALK, a 21-day corpus of authentic messaging app dialogues.
We compare EI attributes and persona consistency to understand the challenges posed by real-world dialogues.
Our findings reveal that models struggle to simulate a user solely from dialogue history, while fine-tuning on specific user chats improves persona emulation.
arXiv Detail & Related papers (2025-02-18T20:29:01Z) - Dialogue Language Model with Large-Scale Persona Data Engineering [10.160626284195434]
PPDS is an open-domain persona dialogue system that employs extensive generative pre-training on a persona dialogue dataset to enhance persona consistency.
We present a persona extraction model designed to autonomously and precisely generate vast persona dialogue datasets.
We also unveil a pioneering persona augmentation technique to address the invalid persona bias inherent in the constructed dataset.
arXiv Detail & Related papers (2024-12-12T07:49:06Z) - Dialogue Evaluation with Offline Reinforcement Learning [2.580163308334609]
Task-oriented dialogue systems aim to fulfill user goals through natural language interactions.
They are ideally evaluated with human users, which is unattainable to do at every iteration of the development phase.
We propose the use of offline reinforcement learning for dialogue evaluation based on a static corpus.
arXiv Detail & Related papers (2022-09-02T08:32:52Z) - Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z) - You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.