CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
- URL: http://arxiv.org/abs/2412.05631v1
- Date: Sat, 07 Dec 2024 12:09:35 GMT
- Title: CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
- Authors: Lei Wang, Jianxun Lian, Yi Huang, Yanqi Dai, Haoxuan Li, Xu Chen, Xing Xie, Ji-Rong Wen
- Abstract summary: Role-playing is a crucial capability of Large Language Models (LLMs). Current evaluation methods fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing. We propose CharacterBox, a simulation sandbox designed to generate situational, fine-grained character behavior trajectories.
- Score: 74.02480671181685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Role-playing is a crucial capability of Large Language Models (LLMs), enabling a wide range of practical applications, including intelligent non-player characters, digital twins, and emotional companions. Evaluating this capability in LLMs is challenging due to the complex dynamics involved in role-playing, such as maintaining character fidelity throughout a storyline and navigating open-ended narratives without a definitive ground truth. Current evaluation methods, which primarily focus on question-answering or conversational snapshots, fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing. In this paper, we propose CharacterBox, a simulation sandbox designed to generate situational, fine-grained character behavior trajectories. These behavior trajectories enable a more comprehensive and in-depth evaluation of role-playing capabilities. CharacterBox consists of two main components: the character agent and the narrator agent. The character agent, grounded in psychological and behavioral science, exhibits human-like behaviors, while the narrator agent coordinates interactions between character agents and environmental changes. Additionally, we introduce two trajectory-based methods that leverage CharacterBox to enhance LLM performance. To reduce costs and facilitate the adoption of CharacterBox by public communities, we fine-tune two smaller models, CharacterNR and CharacterRM, as substitutes for GPT API calls, and demonstrate their competitive performance compared to advanced GPT APIs.
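As a rough illustration of the two-agent design described in the abstract, the sketch below shows how a character agent and a narrator agent might alternate to roll out a behavior trajectory. All names here (CharacterAgent, NarratorAgent, run_episode, llm_call) are hypothetical placeholders rather than the paper's actual interfaces, and the prompts and the use of CharacterNR/CharacterRM for scoring are assumptions.

```python
# Minimal sketch of a CharacterBox-style simulation loop (hypothetical API).
# CharacterAgent, NarratorAgent, run_episode, and llm_call are illustrative
# names, not the paper's actual interfaces.
from dataclasses import dataclass, field
from typing import Dict, List


def llm_call(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation (or a fine-tuned
    substitute such as CharacterNR/CharacterRM when scoring trajectories)."""
    raise NotImplementedError("plug in your model API here")


@dataclass
class CharacterAgent:
    """Plays one character: keeps a persona and a memory of past actions."""
    name: str
    persona: str
    memory: List[str] = field(default_factory=list)

    def act(self, scene: str) -> str:
        prompt = (
            f"You are {self.name}. Persona: {self.persona}\n"
            f"Recent actions: {self.memory[-3:]}\n"
            f"Current scene: {scene}\n"
            "Describe, in character, what you do next:"
        )
        action = llm_call(prompt)
        self.memory.append(action)
        return action


@dataclass
class NarratorAgent:
    """Coordinates the characters' actions and updates the environment."""
    scene: str

    def advance(self, actions: Dict[str, str]) -> str:
        prompt = (
            f"Scene so far: {self.scene}\n"
            f"Character actions this turn: {actions}\n"
            "Narrate the resulting change to the scene:"
        )
        self.scene = llm_call(prompt)
        return self.scene


def run_episode(characters: List[CharacterAgent],
                narrator: NarratorAgent,
                n_turns: int = 5) -> List[dict]:
    """Rolls out a fine-grained behavior trajectory for later evaluation."""
    trajectory = []
    for _ in range(n_turns):
        actions = {c.name: c.act(narrator.scene) for c in characters}
        new_scene = narrator.advance(actions)
        trajectory.append({"actions": actions, "scene": new_scene})
    return trajectory
```

In this reading, the resulting trajectory (a list of per-turn actions and scene states) is what a trajectory-based evaluator or reward model would score for character fidelity.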
Related papers
- Towards Enhanced Immersion and Agency for LLM-based Interactive Drama [55.770617779283064]
This paper begins with understanding interactive drama from two aspects: Immersion, the player's feeling of being present in the story, and Agency, the player's ability to influence the story.
To enhance these two aspects, we first propose Playwriting-guided Generation, a novel method that helps LLMs craft dramatic stories with substantially improved structures and narrative quality.
arXiv Detail & Related papers (2025-02-25T06:06:16Z) - CharacterBench: Benchmarking Character Customization of Large Language Models [80.29164862682063]
We propose CharacterBench, the largest bilingual generative benchmark, with 22,859 human-annotated samples covering 3,956 characters.
We define 11 dimensions across 6 aspects, classified as sparse or dense depending on whether the character features evaluated by a given dimension manifest in every response.
We also develop CharacterJudge model for cost-effective and stable evaluations.
arXiv Detail & Related papers (2024-12-16T15:55:34Z) - What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models [0.0]
We introduce a dialogue filler framework that utilizes large language models (LLMs) to generate dynamic and contextually appropriate character interactions.
We test this framework within the environments of Final Fantasy VII Remake and Pokemon.
This study aims to assist developers in crafting more nuanced filler dialogues, thereby enriching player immersion and enhancing the overall RPG experience.
arXiv Detail & Related papers (2024-07-29T19:12:18Z) - Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data [58.92110996840019]
We propose to enhance role-playing language models (RPLMs) via personality-indicative data.
Specifically, we leverage questions from psychological scales and distill advanced role-playing agents (RPAs) to generate dialogues that grasp the minds of characters.
Experimental results validate that RPLMs trained with our dataset exhibit advanced role-playing capabilities for both general and personality-related evaluations.
arXiv Detail & Related papers (2024-06-27T06:24:00Z) - Crafting Customisable Characters with LLMs: Introducing SimsChat, a Persona-Driven Role-Playing Agent Framework [29.166067413153353]
Large Language Models (LLMs) can comprehend human instructions and generate high-quality text.
We introduce the Customisable Conversation Agent Framework, which leverages LLMs to simulate real-world characters.
We present SimsChat, a freely customisable role-playing agent.
arXiv Detail & Related papers (2024-06-25T22:44:17Z) - CharacterGPT: A Persona Reconstruction Framework for Role-Playing Agents [6.220415006158471]
We introduce CharacterGPT, a framework designed to dynamically reconstruct character personas through Character Persona Training (CPT).
This approach incrementally updates personas by extracting traits from chapter-wise novel summaries, reflecting the progression of the narrative.
Our framework is evaluated through Big Five personality evaluations and creative tasks, in which characters generate original narratives.
arXiv Detail & Related papers (2024-05-30T07:44:16Z) - RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models [6.753588449962107]
RoleCraft-GLM is an innovative framework aimed at enhancing personalized role-playing with Large Language Models (LLMs).
We contribute a unique conversational dataset that shifts from conventional celebrity-centric characters to diverse, non-celebrity personas.
Our approach includes meticulous character development, ensuring dialogues are both realistic and emotionally resonant.
arXiv Detail & Related papers (2023-12-17T17:57:50Z) - CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models [66.4382820107453]
We present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters.
Our CharacterGLM is designed for generating Character-based Dialogues (CharacterDial), which aims to equip a conversational AI system with character customization for satisfying people's inherent social desires and emotional needs.
arXiv Detail & Related papers (2023-11-28T14:49:23Z) - Character-LLM: A Trainable Agent for Role-Playing [67.35139167985008]
Large language models (LLMs) can serve as agents that simulate human behaviors.
We introduce Character-LLM, which teaches LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc.
arXiv Detail & Related papers (2023-10-16T07:58:56Z) - Better Zero-Shot Reasoning with Role-Play Prompting [10.90357246745529]
Role-play prompting consistently surpasses the standard zero-shot approach across most datasets.
This highlights its potential to augment the reasoning capabilities of large language models.
arXiv Detail & Related papers (2023-08-15T11:08:30Z)