Guess What I am Thinking: A Benchmark for Inner Thought Reasoning of Role-Playing Language Agents
- URL: http://arxiv.org/abs/2503.08193v1
- Date: Tue, 11 Mar 2025 08:57:07 GMT
- Title: Guess What I am Thinking: A Benchmark for Inner Thought Reasoning of Role-Playing Language Agents
- Authors: Rui Xu, MingYu Wang, XinTao Wang, Dakuan Lu, Xiaoyu Tan, Wei Chu, Yinghui Xu
- Abstract summary: Internal thinking processes of role-playing language agents (RPLAs) remain unexplored. We introduce ROLETHINK, a novel benchmark constructed from literature for evaluating character thought generation. We propose MIRROR, a chain-of-thought approach that generates character thoughts by retrieving memories, predicting character reactions, and synthesizing motivations.
- Score: 48.52216655094884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in LLM-based role-playing language agents (RPLAs) have attracted broad attention in various applications. While chain-of-thought reasoning has shown importance in many tasks for LLMs, the internal thinking processes of RPLAs remain unexplored. Understanding characters' inner thoughts is crucial for developing advanced RPLAs. In this paper, we introduce ROLETHINK, a novel benchmark constructed from literature for evaluating character thought generation. We propose the task of inner thought reasoning, which includes two sets: the gold set that compares generated thoughts with original character monologues, and the silver set that uses expert synthesized character analyses as references. To address this challenge, we propose MIRROR, a chain-of-thought approach that generates character thoughts by retrieving memories, predicting character reactions, and synthesizing motivations. Through extensive experiments, we demonstrate the importance of inner thought reasoning for RPLAs, and MIRROR consistently outperforms existing methods. Resources are available at https://github.com/airaer1998/RPA_Thought.
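The abstract describes MIRROR as a three-stage chain-of-thought: retrieve memories, predict the character's reaction, then synthesize motivations. A minimal sketch of how such a pipeline might be wired together is shown below; the stage names follow the abstract, but every function, prompt, and the keyword-overlap retriever are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a MIRROR-style three-stage pipeline. Stage names
# mirror the abstract; all prompts and helpers are illustrative assumptions.

def retrieve_memories(scene: str, memory_store: list[str], top_k: int = 3) -> list[str]:
    # Naive keyword-overlap scoring standing in for a real retriever.
    scene_words = set(scene.lower().split())
    scored = sorted(memory_store,
                    key=lambda m: len(scene_words & set(m.lower().split())),
                    reverse=True)
    return scored[:top_k]

def mirror_thought(llm, character: str, scene: str, memory_store: list[str]) -> str:
    # Stage 1: retrieve relevant memories for the current scene.
    memories = retrieve_memories(scene, memory_store)
    # Stage 2: predict the character's immediate reaction.
    reaction = llm(f"As {character}, given the scene '{scene}' and memories "
                   f"{memories}, predict the character's immediate reaction.")
    # Stage 3: synthesize the underlying motivation as the inner thought.
    return llm(f"Synthesize {character}'s inner motivation from this reaction: {reaction}")

# Toy LLM stub so the sketch runs end to end without a real model.
def toy_llm(prompt: str) -> str:
    return f"[thought derived from: {prompt[:40]}...]"

print(mirror_thought(toy_llm, "Elizabeth", "the ball at Netherfield begins",
                     ["the ball at Netherfield was announced", "a slight at the assembly"]))
```

In a real system the stub would be replaced by an LLM call and the retriever by an embedding-based memory store; the point of the sketch is only the staged prompting structure.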
Related papers
- Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning [46.47940531288568]
This paper introduces a novel Role-Aware Reasoning (RAR) method, which consists of two important stages: Role Identity Activation (RIA) and Reasoning Style Optimization (RSO). RIA explicitly guides the model with character profiles during reasoning to counteract attention diversion, and then RSO aligns reasoning style with the character and scene via LRM distillation to mitigate style drift.
arXiv Detail & Related papers (2025-06-02T14:55:04Z) - Assessing LLMs in Art Contexts: Critique Generation and Theory of Mind Evaluation [0.9428222284377783]
This study explores how large language models (LLMs) perform in two areas related to art.
For the critique generation part, we built a system that combines Noel Carroll's evaluative framework with a broad selection of art criticism theories.
These critiques were compared with those written by human experts in a Turing test-style evaluation.
In the second part, we introduced new simple ToM tasks based on situations involving interpretation, emotion, and moral tension.
arXiv Detail & Related papers (2025-04-17T10:10:25Z) - Collaborative Storytelling and LLM: A Linguistic Analysis of Automatically-Generated Role-Playing Game Sessions [55.2480439325792]
Role-playing games (RPGs) are games in which players interact with one another to create narratives.
This emerging form of shared narrative, primarily oral, is receiving increasing attention.
In this paper, we aim to discover to what extent the language of Large Language Models (LLMs) exhibits oral or written features when they are asked to generate an RPG session.
arXiv Detail & Related papers (2025-03-26T15:10:47Z) - Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs [50.0874045899661]
We introduce CharacterBot, a model designed to replicate both the linguistic patterns and distinctive thought processes of a character. Using Lu Xun as a case study, we propose four training tasks derived from his 17 essay collections. These include a pre-training task focused on mastering external linguistic structures and knowledge, as well as three fine-tuning tasks. We evaluate CharacterBot on three tasks for linguistic accuracy and opinion comprehension, demonstrating that it significantly outperforms the baselines on our adapted metrics.
arXiv Detail & Related papers (2025-02-18T16:11:54Z) - ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind [25.524355451378593]
ToMATO is a new ToM benchmark formulated as multiple-choice QA over conversations.
We capture both first- and second-order mental states across five categories: belief, intention, desire, emotion, and knowledge.
ToMATO consists of 5.4k questions, 753 conversations, and 15 personality trait patterns.
arXiv Detail & Related papers (2025-01-15T14:47:02Z) - The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters [67.61587661660852]
Theory-of-Mind (ToM) allows humans to understand and interpret the mental states of others.
In this paper, we verify the importance of comprehensive contextual understanding about personal backgrounds in ToM.
We introduce the CharToM benchmark, comprising 1,035 ToM questions based on characters from classic novels.
arXiv Detail & Related papers (2025-01-03T09:04:45Z) - Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning [0.0]
Iterative human engagement is a common and effective means of leveraging the advanced language processing power of large language models (LLMs).
We propose the Iteration of Thought (IoT) framework for enhancing LLM responses by generating "thought"-provoking prompts.
Unlike static or semi-static approaches, IoT adapts its reasoning path dynamically, based on evolving context.
arXiv Detail & Related papers (2024-09-19T09:44:17Z) - Thinking Before Speaking: A Role-playing Model with Mindset [0.6428333375712125]
Large Language Models (LLMs) are skilled at simulating human behaviors.
However, these models tend to perform poorly when confronted with knowledge that the assumed role does not possess.
We propose a Thinking Before Speaking (TBS) model in this paper.
arXiv Detail & Related papers (2024-09-14T02:41:48Z) - The Drama Machine: Simulating Character Development with LLM Agents [1.999925939110439]
This paper explores the use of multiple large language model (LLM) agents to simulate complex, dynamic characters in dramatic scenarios.
We introduce a drama machine framework that coordinates interactions between LLM agents playing different 'Ego' and 'Superego' psychological roles.
Results suggest this multi-agent approach can produce more nuanced, adaptive narratives that evolve over a sequence of dialogical turns.
arXiv Detail & Related papers (2024-08-03T09:40:26Z) - Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.
The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales.
We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z) - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in a "tit for tat" fashion and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
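The MAD entry above describes a concrete loop: debater agents exchange rebuttals and a judge settles on a final answer. The structure might be sketched as follows; all prompts, role names, and the fixed round count are assumptions made for illustration, not the paper's actual framework.

```python
# Illustrative sketch of a Multi-Agent Debate (MAD)-style loop: two debater
# agents exchange "tit for tat" arguments for a fixed number of rounds, then
# a judge agent reads the transcript and issues a final answer. Prompts and
# names are hypothetical, not taken from the paper.

def debate(llm, question: str, rounds: int = 2) -> str:
    transcript = []
    for r in range(rounds):
        for side in ("affirmative", "negative"):
            # Each debater sees the full transcript so far and rebuts it.
            arg = llm(f"[{side}] Round {r + 1} on '{question}'. "
                      f"Rebut the debate so far: {transcript}")
            transcript.append(f"{side}: {arg}")
    # The judge reads the whole debate and settles on one final answer.
    return llm(f"[judge] Given the debate {transcript}, give the final "
               f"answer to '{question}'.")

# Toy stub so the loop is runnable without a real model.
def toy_llm(prompt: str) -> str:
    return f"argument({prompt[:30]}...)"

print(debate(toy_llm, "Does order of operations matter here?"))
```

A real deployment would use distinct system prompts (or distinct models) per debater so the two sides genuinely diverge; with a single shared model the "tit for tat" pressure comes entirely from the accumulated transcript.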
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.