Related papers: RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following

URL: http://arxiv.org/abs/2502.11387v1
Date: Mon, 17 Feb 2025 03:08:37 GMT
Title: RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following
Authors: Junru Lu, Jiazheng Li, Guodong Shen, Lin Gui, Siyu An, Yulan He, Di Yin, Xing Sun
Abstract summary: Role-playing is important for Large Language Models to follow diverse instructions. Existing role-playing datasets mostly contribute to controlling role style and knowledge boundaries. We introduce a fine-grained role-playing and instruction-following benchmark, named RoleMRC.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Role-playing is important for Large Language Models (LLMs) to follow diverse instructions while maintaining role identity and the role's pre-defined ability limits. Existing role-playing datasets mostly contribute to controlling role style and knowledge boundaries, but overlook role-playing in instruction-following scenarios. We introduce a fine-grained role-playing and instruction-following composite benchmark, named RoleMRC, including: (1) Multi-turn dialogues between ideal roles and humans, including free chats or discussions upon given passages; (2) Role-playing machine reading comprehension, involving response, refusal, and attempts according to passage answerability and role ability; (3) More complex scenarios with nested, multi-turn and prioritized instructions. The final RoleMRC features a 10.2k role profile meta-pool, 37.9k well-synthesized role-playing instructions, and 1.4k testing samples. We develop a pipeline to quantitatively evaluate the fine-grained role-playing and instruction-following capabilities of several mainstream LLMs, as well as models that are fine-tuned on our data. Moreover, cross-evaluation on external role-playing datasets confirms that models fine-tuned on RoleMRC enhance instruction-following without compromising general role-playing and reasoning capabilities. We also probe the neural-level activation maps of different capabilities over post-tuned LLMs. RoleMRC, RoleMRC-mix, and the code are available at: https://github.com/LuJunru/RoleMRC.

Related papers

AdaMARP: An Adaptive Multi-Agent Interaction Framework for General Immersive Role-Playing
LLM role-playing aims to portray arbitrary characters in interactive narratives, yet existing systems often suffer from limited immersion and adaptability. We propose an adaptive multi-agent role-playing framework, AdaMARP, featuring an immersive message format that interleaves [Thought], (Action), <Environment>, and Speech.
arXiv (2026-01-16T05:41:45Z)
How role-play shapes relevance judgment in zero-shot LLM rankers
Large Language Models (LLMs) have emerged as promising zero-shot rankers. Their performance is highly sensitive to prompt formulation. In particular, role-play prompts, where the model is assigned a functional role or identity, often give more robust and accurate relevance rankings.
arXiv (2025-10-20T13:39:48Z)
SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents
Role-playing agents have emerged as a promising paradigm for achieving personalized interaction and emotional resonance. Existing research primarily focuses on the textual modality, neglecting the critical dimension of speech in realistic interactive scenarios. We construct SpeechRole-Data, a large-scale, high-quality dataset that comprises 98 diverse roles and 112k speech-based single-turn and multi-turn conversations.
arXiv (2025-08-04T03:18:36Z)
RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing
RMTBench is a comprehensive user-centric bilingual role-playing benchmark featuring 80 diverse characters and over 8,000 dialogue rounds. Our benchmark constructs dialogues based on explicit user motivations rather than character descriptions, ensuring alignment with practical user applications. By shifting focus from character background to user intention fulfillment, RMTBench bridges the gap between academic evaluation and practical deployment requirements.
arXiv (2025-07-27T16:49:47Z)
Reasoning Does Not Necessarily Improve Role-Playing Ability
The application of role-playing large language models (LLMs) is rapidly expanding in both academic and commercial domains. We compare the effectiveness of direct zero-shot role-playing, role-playing with Chain-of-Thought (CoT), and role-playing using reasoning-optimized LLMs. Our findings reveal that CoT may reduce role-playing performance, reasoning-optimized LLMs are unsuitable for role-playing, and Chinese role-playing performance surpasses English role-playing performance.
arXiv (2025-02-24T08:08:41Z)
Thinking Before Speaking: A Role-playing Model with Mindset
Large Language Models (LLMs) are skilled at simulating human behaviors. These models tend to perform poorly when confronted with knowledge that the assumed role does not possess. We propose a Thinking Before Speaking (TBS) model in this paper.
arXiv (2024-09-14T02:41:48Z)
RNR: Teaching Large Language Models to Follow Roles and Rules
We propose RNR, an automated data generation pipeline that generates diverse roles and rules from existing IFT instructions. This data can then be used to train models that follow complex system prompts. Our framework significantly improves role and rule following capability in large language models.
arXiv (2024-09-10T06:07:32Z)
Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data
We propose to enhance role-playing language models (RPLMs) via personality-indicative data. Specifically, we leverage questions from psychological scales and distill advanced RPAs to generate dialogues that grasp the minds of characters. Experimental results validate that RPLMs trained with our dataset exhibit advanced role-playing capabilities for both general and personality-related evaluations.
arXiv (2024-06-27T06:24:00Z)
Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement
Large Language Models (LLMs) have propelled dialogue generation into new realms, particularly in the field of role-playing systems (RPSs). Existing LLM-based RPSs still struggle to align with roles when handling intricate and trapped queries in boundary scenarios. We design the Modular ORchestrated Trap-setting Interaction SystEm (MORTISE) to benchmark and improve the role-playing LLMs' performance.
arXiv (2024-02-16T12:12:05Z)
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
We introduce Ditto, a self-alignment method for role-play. This method creates a role-play training set comprising 4,000 characters, surpassing the scale of currently available datasets by tenfold. We present the first comprehensive cross-supervision alignment experiment in the role-play domain.
arXiv (2024-01-23T03:56:22Z)
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
We introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in Large Language Models (LLMs). By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples.
arXiv (2023-10-01T17:52:59Z)
About latent roles in forecasting players in team sports
Team sports contain a significant social component that influences interactions between teammates and opponents. We create RolFor, a novel end-to-end model for Role-based Forecasting.
arXiv (2023-04-17T13:33:23Z)
RODE: Learning Roles to Decompose Multi-Agent Tasks
Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles. We propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents. By virtue of these advances, our method outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark.
arXiv (2020-10-04T09:20:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.