RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
- URL: http://arxiv.org/abs/2310.00746v3
- Date: Tue, 18 Jun 2024 13:08:24 GMT
- Title: RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
- Authors: Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Jian Yang, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Stephen W. Huang, Jie Fu, Junran Peng
- Abstract summary: We introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in Large Language Models (LLMs).
By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples.
- Score: 107.00832724504752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving comparable results with RoleGPT (using GPT-4).
Related papers
- RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following [31.80357046048002]
Role-playing is important for Large Language Models to follow diverse instructions.
Existing role-playing datasets mostly contribute to controlling role style and knowledge boundaries.
We introduce a fine-grained role-playing and instruction-following benchmark, named RoleMRC.
arXiv Detail & Related papers (2025-02-17T03:08:37Z)
- CoSER: Coordinating LLM-Based Persona Simulation of Established Roles [62.886267684392635]
CoSER dataset covers 17,966 characters from 771 renowned books.
We develop CoSER 8B and CoSER 70B, i.e., advanced open role-playing LLMs built on LLaMA-3.1 models.
arXiv Detail & Related papers (2025-02-13T08:55:24Z)
- OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas [65.83634577897564]
This study explores a large-scale data synthesis approach to equip large language models with character generalization capabilities.
We begin by synthesizing large-scale character profiles using personas from Persona Hub.
We then explore two strategies: response rewriting and response generation, to create character-aligned instructional responses.
arXiv Detail & Related papers (2025-01-26T07:07:01Z)
- RNR: Teaching Large Language Models to Follow Roles and Rules [153.6596303205894]
We propose RNR, an automated data generation pipeline that generates diverse roles and rules from existing IFT instructions.
This data can then be used to train models that follow complex system prompts.
Our framework significantly improves role and rule following capability in large language models.
arXiv Detail & Related papers (2024-09-10T06:07:32Z)
- Role-playing Prompt Framework: Generation and Evaluation [3.2845546753303867]
Large language models (LLMs) exhibit impressive proficiency in natural language generation, understanding user instructions, and emulating human-like language use.
This paper introduces a prompt-based framework designed to leverage GPT's capabilities for the generation of role-playing dialogue datasets.
arXiv Detail & Related papers (2024-06-02T06:09:56Z)
- On the Decision-Making Abilities in Role-Playing using Large Language Models [6.550638804145713]
Large language models (LLMs) are increasingly utilized for role-playing tasks.
This paper focuses on evaluating the decision-making abilities of LLMs post role-playing.
arXiv Detail & Related papers (2024-02-29T02:22:23Z)
- Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement [17.5855800570993]
Large Language Models (LLMs) have propelled dialogue generation into new realms, particularly in the field of role-playing systems (RPSs).
Existing LLM-based RPSs still struggle to align with roles when handling intricate and trapped queries in boundary scenarios.
We design the Modular ORchestrated Trap-setting Interaction SystEm (MORTISE) to benchmark and improve the role-playing LLMs' performance.
arXiv Detail & Related papers (2024-02-16T12:12:05Z)
- Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment [62.898963074989766]
We introduce Ditto, a self-alignment method for role-play.
This method creates a role-play training set comprising 4,000 characters, surpassing the scale of currently available datasets by tenfold.
We present the first comprehensive cross-supervision alignment experiment in the role-play domain.
arXiv Detail & Related papers (2024-01-23T03:56:22Z)
- RODE: Learning Roles to Decompose Multi-Agent Tasks [69.56458960841165]
Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles.
We propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents.
By virtue of these advances, our method outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark.
arXiv Detail & Related papers (2020-10-04T09:20:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.