TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
- URL: http://arxiv.org/abs/2405.18027v1
- Date: Tue, 28 May 2024 10:19:18 GMT
- Title: TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
- Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim
- Abstract summary: We introduce TimeChara, a new benchmark designed to evaluate point-in-time character hallucination in role-playing LLMs.
We propose Narrative-Experts, a method that decomposes the reasoning steps and utilizes narrative experts to reduce point-in-time character hallucinations effectively.
- Score: 55.51648393234699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurately represent characters at specific time points, agents must avoid character hallucination, where they display knowledge that contradicts their characters' identities and historical timelines. We introduce TimeChara, a new benchmark designed to evaluate point-in-time character hallucination in role-playing LLMs. Comprising 10,895 instances generated through an automated pipeline, this benchmark reveals significant hallucination issues in current state-of-the-art LLMs (e.g., GPT-4o). To counter this challenge, we propose Narrative-Experts, a method that decomposes the reasoning steps and utilizes narrative experts to reduce point-in-time character hallucinations effectively. Still, our findings with TimeChara highlight the ongoing challenges of point-in-time character hallucination, calling for further study.
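To make the task concrete, below is a minimal, illustrative sketch (not the paper's actual pipeline) of how a point-in-time character hallucination check could be structured: the agent is situated at a specific narrative moment and probed with a question about a later event. The character, narrative point, probe question, spoiler keywords, and the `ask_role_playing_llm` stub are all hypothetical; the benchmark itself is built with an automated generation pipeline, and a real evaluator would likely use an LLM judge rather than keyword matching.

```python
# Illustrative sketch only: a minimal point-in-time character hallucination check.
# Everything here (character, narrative point, probe, keywords, and the
# ask_role_playing_llm stub) is a hypothetical example, not TimeChara's pipeline.
from dataclasses import dataclass


@dataclass
class PointInTimeProbe:
    character: str              # who the agent should role-play
    narrative_point: str        # where in the story the character is situated
    question: str               # a probe about an event *after* that point
    future_keywords: list[str]  # terms the character could not yet know


def ask_role_playing_llm(probe: PointInTimeProbe) -> str:
    """Placeholder for a real chat-LLM call that prompts the model to answer
    in character as of the given narrative point."""
    return "I have no idea what you are talking about."  # stubbed response


def reveals_future_knowledge(response: str, probe: PointInTimeProbe) -> bool:
    """Flag the response if it mentions anything the character should not yet know.
    A real evaluator would likely use an LLM judge instead of keyword matching."""
    lowered = response.lower()
    return any(term.lower() in lowered for term in probe.future_keywords)


if __name__ == "__main__":
    probe = PointInTimeProbe(
        character="Hermione Granger",
        narrative_point="end of her first year at Hogwarts",
        question="What do you think about Sirius Black escaping from Azkaban?",
        future_keywords=["Sirius escaped", "escape from Azkaban", "Prisoner of Azkaban"],
    )
    answer = ask_role_playing_llm(probe)
    print(f"Response: {answer}")
    print(f"Point-in-time hallucination detected: {reveals_future_knowledge(answer, probe)}")
```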
Related papers
- Collaborative Storytelling and LLM: A Linguistic Analysis of Automatically-Generated Role-Playing Game Sessions [55.2480439325792]
Role-playing games (RPGs) are games in which players interact with one another to create narratives.
This emerging form of shared narrative, primarily oral, is receiving increasing attention.
In this paper, we aim to discover to what extent the language of Large Language Models (LLMs) exhibits oral or written features when asked to generate an RPG session.
arXiv Detail & Related papers (2025-03-26T15:10:47Z) - Towards Enhanced Immersion and Agency for LLM-based Interactive Drama [55.770617779283064]
This paper begins by examining interactive drama from two aspects: Immersion, the player's feeling of being present in the story, and Agency, the player's ability to influence the story world.
To enhance these two aspects, we first propose Playwriting-guided Generation, a novel method that helps LLMs craft dramatic stories with substantially improved structures and narrative quality.
arXiv Detail & Related papers (2025-02-25T06:06:16Z) - CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds [74.02480671181685]
Role-playing is a crucial capability of Large Language Models (LLMs).
Current evaluation methods fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing.
We propose CharacterBox, a simulation sandbox designed to generate situational fine-grained character behavior trajectories.
arXiv Detail & Related papers (2024-12-07T12:09:35Z) - Mitigating Object Hallucination via Concentric Causal Attention [71.27325347912823]
We show that object hallucination is closely tied with Rotary Position Embedding (RoPE), a widely adopted positional dependency modeling design.
We propose Concentric Causal Attention (CCA), a simple yet effective positional alignment strategy.
arXiv Detail & Related papers (2024-10-21T11:54:53Z) - RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems [20.786294377706717]
Role-playing systems powered by large language models (LLMs) have become increasingly influential in emotional communication applications.
These systems are susceptible to character hallucinations, where the model deviates from predefined character roles and generates responses that are inconsistent with the intended persona.
This paper presents the first systematic analysis of character hallucination from an attack perspective, introducing the RoleBreak framework.
arXiv Detail & Related papers (2024-09-25T08:23:46Z) - Generating Visual Stories with Grounded and Coreferent Characters [63.07511918366848]
We present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions.
Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark.
We also propose new evaluation metrics to measure the richness of characters and coreference in stories.
arXiv Detail & Related papers (2024-09-20T14:56:33Z) - Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data [58.92110996840019]
We propose to enhance role-playing language models (RPLMs) via personality-indicative data.
Specifically, we leverage questions from psychological scales and distill advanced role-playing agents (RPAs) to generate dialogues that grasp the minds of characters.
Experimental results validate that RPLMs trained with our dataset exhibit advanced role-playing capabilities for both general and personality-related evaluations.
arXiv Detail & Related papers (2024-06-27T06:24:00Z) - Mitigating Hallucination in Fictional Character Role-Play [19.705708068900076]
We focus on the evaluation and mitigation of hallucination in fictional character role-play.
We introduce a dataset with over 2,000 characters and 72,000 interviews, including 18,000 adversarial questions.
We propose RoleFact, a role-playing method that mitigates hallucination by modulating the influence of parametric knowledge.
arXiv Detail & Related papers (2024-06-25T03:56:33Z) - CHIRON: Rich Character Representations in Long-Form Narratives [98.273323001781]
We propose CHIRON, a new 'character sheet'-based representation that organizes and filters textual information about characters.
We validate CHIRON via the downstream task of masked-character prediction, where our experiments show CHIRON is better and more flexible than comparable summary-based baselines.
We also show that metrics derived from CHIRON can be used to automatically infer character-centricity in stories, and that these metrics align with human judgments.
arXiv Detail & Related papers (2024-06-14T17:23:57Z) - Enhancing Consistency and Role-Specific Knowledge Capturing by Rebuilding Fictional Character's Persona [6.220415006158471]
The Assistants API's retrieval often fails to return consistent results because the information extracted differs on each call.
It is hard to maintain a consistent persona simply by using the persona document as input to the Assistants API.
CharacterGPT is a novel persona reconstruction framework designed to alleviate these shortcomings of the Assistants API.
arXiv Detail & Related papers (2024-05-30T07:44:16Z) - Character-LLM: A Trainable Agent for Role-Playing [67.35139167985008]
Large language models (LLMs) can be used to serve as agents to simulate human behaviors.
We introduce Character-LLM, which teaches LLMs to act as specific people such as Beethoven, Queen Cleopatra, and Julius Caesar.
arXiv Detail & Related papers (2023-10-16T07:58:56Z) - NarrativePlay: Interactive Narrative Understanding [27.440721435864194]
We introduce NarrativePlay, a novel system that allows users to role-play a fictional character and interact with other characters in narratives in an immersive environment.
We leverage Large Language Models (LLMs) to generate human-like responses, guided by personality traits extracted from narratives.
NarrativePlay has been evaluated on two types of narratives, detective and adventure stories, where users can either explore the world or improve their favorability with the narrative characters through conversations.
arXiv Detail & Related papers (2023-10-02T13:24:00Z) - "Let Your Characters Tell Their Story": A Dataset for Character-Centric Narrative Understanding [31.803481510886378]
We present LiSCU -- a new dataset of literary pieces and their summaries paired with descriptions of characters that appear in them.
We also introduce two new tasks on LiSCU: Character Identification and Character Description Generation.
Our experiments with several pre-trained language models adapted for these tasks demonstrate that there is a need for better models of narrative comprehension.
arXiv Detail & Related papers (2021-09-12T06:12:55Z)