CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
- URL: http://arxiv.org/abs/2412.05631v1
- Date: Sat, 07 Dec 2024 12:09:35 GMT
- Title: CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
- Authors: Lei Wang, Jianxun Lian, Yi Huang, Yanqi Dai, Haoxuan Li, Xu Chen, Xing Xie, Ji-Rong Wen
- Abstract summary: Role-playing is a crucial capability of Large Language Models (LLMs).
Current evaluation methods fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing.
We propose CharacterBox, a simulation sandbox designed to generate situational fine-grained character behavior trajectories.
- Score: 74.02480671181685
- Abstract: Role-playing is a crucial capability of Large Language Models (LLMs), enabling a wide range of practical applications, including intelligent non-player characters, digital twins, and emotional companions. Evaluating this capability in LLMs is challenging due to the complex dynamics involved in role-playing, such as maintaining character fidelity throughout a storyline and navigating open-ended narratives without a definitive ground truth. Current evaluation methods, which primarily focus on question-answering or conversational snapshots, fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing. In this paper, we propose CharacterBox, which is a simulation sandbox designed to generate situational fine-grained character behavior trajectories. These behavior trajectories enable a more comprehensive and in-depth evaluation of role-playing capabilities. CharacterBox consists of two main components: the character agent and the narrator agent. The character agent, grounded in psychological and behavioral science, exhibits human-like behaviors, while the narrator agent coordinates interactions between character agents and environmental changes. Additionally, we introduce two trajectory-based methods that leverage CharacterBox to enhance LLM performance. To reduce costs and facilitate the adoption of CharacterBox by public communities, we fine-tune two smaller models, CharacterNR and CharacterRM, as substitutes for GPT API calls, and demonstrate their competitive performance compared to advanced GPT APIs.
Related papers
- CharacterBench: Benchmarking Character Customization of Large Language Models [80.29164862682063]
We propose CharacterBench, the largest bilingual generative benchmark, with 22,859 human-annotated samples covering 3,956 characters.
We define 11 dimensions of 6 aspects, classified as sparse and dense dimensions based on whether character features evaluated by specific dimensions manifest in each response.
We also develop CharacterJudge model for cost-effective and stable evaluations.
arXiv Detail & Related papers (2024-12-16T15:55:34Z)
- What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models [0.0]
We introduce a dialogue filler framework that utilizes large language models (LLMs) to generate dynamic and contextually appropriate character interactions.
We test this framework within the environments of Final Fantasy VII Remake and Pokemon.
This study aims to assist developers in crafting more nuanced filler dialogues, thereby enriching player immersion and enhancing the overall RPG experience.
arXiv Detail & Related papers (2024-07-29T19:12:18Z)
- Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data [58.92110996840019]
We propose to enhance role-playing language models (RPLMs) via personality-indicative data.
Specifically, we leverage questions from psychological scales and distill advanced RPAs to generate dialogues that grasp the minds of characters.
Experimental results validate that RPLMs trained with our dataset exhibit advanced role-playing capabilities for both general and personality-related evaluations.
arXiv Detail & Related papers (2024-06-27T06:24:00Z)
- Crafting Customisable Characters with LLMs: Introducing SimsChat, a Persona-Driven Role-Playing Agent Framework [29.166067413153353]
Large Language Models (LLMs) demonstrate remarkable ability to comprehend instructions and generate human-like text.
We introduce the Customisable Conversation Agent Framework, which employs LLMs to simulate real-world characters.
We present SimsChat, a freely customisable role-playing agent incorporating various realistic settings.
arXiv Detail & Related papers (2024-06-25T22:44:17Z)
- CharacterGPT: A Persona Reconstruction Framework for Role-Playing Agents [6.220415006158471]
The Assistants API's search often fails because the information it extracts differs each time.
It is hard to maintain a consistent persona simply by using the persona document as input to the Assistants API.
CharacterGPT is a novel persona reconstruction framework to alleviate the shortcomings of the Assistants API.
arXiv Detail & Related papers (2024-05-30T07:44:16Z)
- Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment [62.898963074989766]
We introduce Ditto, a self-alignment method for role-play.
This method creates a role-play training set comprising 4,000 characters, surpassing the scale of currently available datasets by tenfold.
We present the first comprehensive cross-supervision alignment experiment in the role-play domain.
arXiv Detail & Related papers (2024-01-23T03:56:22Z)
- RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models [6.753588449962107]
RoleCraft-GLM is an innovative framework aimed at enhancing personalized role-playing with Large Language Models (LLMs).
We contribute a unique conversational dataset that shifts from conventional celebrity-centric characters to diverse, non-celebrity personas.
Our approach includes meticulous character development, ensuring dialogues are both realistic and emotionally resonant.
arXiv Detail & Related papers (2023-12-17T17:57:50Z)
- CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models [66.4382820107453]
We present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters.
Our CharacterGLM is designed for generating Character-based Dialogues (CharacterDial), which aims to equip a conversational AI system with character customization for satisfying people's inherent social desires and emotional needs.
arXiv Detail & Related papers (2023-11-28T14:49:23Z)
- Character-LLM: A Trainable Agent for Role-Playing [67.35139167985008]
Large language models (LLMs) can be used to serve as agents to simulate human behaviors.
We introduce Character-LLM, which teaches LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc.
arXiv Detail & Related papers (2023-10-16T07:58:56Z)
- NarrativePlay: Interactive Narrative Understanding [27.440721435864194]
We introduce NarrativePlay, a novel system that allows users to role-play a fictional character and interact with other characters in narratives in an immersive environment.
We leverage Large Language Models (LLMs) to generate human-like responses, guided by personality traits extracted from narratives.
NarrativePlay has been evaluated on two types of narratives, detective and adventure stories, where users can either explore the world or improve their favorability with the narrative characters through conversations.
arXiv Detail & Related papers (2023-10-02T13:24:00Z)