Related papers: Better Zero-Shot Reasoning with Role-Play Prompting

Better Zero-Shot Reasoning with Role-Play Prompting

URL: http://arxiv.org/abs/2308.07702v2
Date: Thu, 14 Mar 2024 02:07:11 GMT
Title: Better Zero-Shot Reasoning with Role-Play Prompting
Authors: Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Enzhi Wang, Xiaohang Dong,
Abstract summary: Role-play prompting consistently surpasses the standard zero-shot approach across most datasets. This highlights its potential to augment the reasoning capabilities of large language models.
Score: 10.90357246745529
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern large language models (LLMs) exhibit a remarkable capacity for role-playing, enabling them to embody not only human characters but also non-human entities. This versatility allows them to simulate complex human-like interactions and behaviors within various contexts, as well as to emulate specific objects or systems. While these capabilities have enhanced user engagement and introduced novel modes of interaction, the influence of role-playing on LLMs' reasoning abilities remains underexplored. In this study, we introduce a strategically designed role-play prompting methodology and assess its performance under the zero-shot setting across twelve diverse reasoning benchmarks. Our empirical results illustrate that role-play prompting consistently surpasses the standard zero-shot approach across most datasets. Notably, in experiments conducted using ChatGPT, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from 23.8% to 84.2%.Upon further comparison with the Zero-Shot-CoT technique, which prompts the model to "think step by step", our study demonstrates that role-play prompting acts as a more effective trigger for the CoT process. This highlights its potential to augment the reasoning capabilities of LLMs. We release our code at https://github.com/NKU-HLT/Role-Play-Prompting.

Related papers

RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing [111.06936588273868]
RMTBench is a comprehensive textbfuser-centric bilingual role-playing benchmark featuring 80 diverse characters and over 8,000 dialogue rounds.<n>Our benchmark constructs dialogues based on explicit user motivations rather than character descriptions, ensuring alignment with practical user applications.<n>By shifting focus from character background to user intention fulfillment, RMTBench bridges the gap between academic evaluation and practical deployment requirements.
arXiv Detail & Related papers (2025-07-27T16:49:47Z)
Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent [18.67432557362308]
Test-Time-Matching (TTM) is a training-free role-playing framework through test-time scaling and context engineering.<n>Our framework involves a structured, three-stage generation pipeline that utilizes these features for controlled role-playing.<n>It achieves high-fidelity role-playing performance, also enables seamless combinations across diverse linguistic styles and even variations in personality and memory.
arXiv Detail & Related papers (2025-07-22T17:47:44Z)
Improving LLM Reasoning through Interpretable Role-Playing Steering [23.75554062102392]
Role-playing has emerged as an effective technique for enhancing the reasoning capabilities of large language models (LLMs)<n>We introduce Sparse Autoencoder Role-Playing Steering (SRPS), a novel framework that identifies and manipulates internal model features associated with role-playing behavior.<n>Our approach extracts latent representations from role-play prompts, selects the most relevant features based on activation patterns, and constructs a steering vector that can be injected into the model's residual stream with controllable intensity.
arXiv Detail & Related papers (2025-06-09T00:31:17Z)
If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs [55.8331366739144]
We introduce LIFESTATE-BENCH, a benchmark designed to assess lifelong learning in large language models (LLMs) Our fact checking evaluation probes models' self-awareness, episodic memory retrieval, and relationship tracking, across both parametric and non-parametric approaches.
arXiv Detail & Related papers (2025-03-30T16:50:57Z)
Enhancing Persona Consistency for LLMs' Role-Playing using Persona-Aware Contrastive Learning [7.836439251883518]
We propose a novel framework named textbfunderlinePersona-Aware textbfunderlineContrastive textbfunderlineLearning (PCL) to align model role-playing behavior. We show that PCL significantly outperform vanilla LLMs under automatic evaluation methods and human expert evaluation.
arXiv Detail & Related papers (2025-03-22T06:12:34Z)
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles [62.886267684392635]
CoSER dataset covers 17,966 characters from 771 renowned books. We develop CoSER 8B and CoSER 70B, i.e., advanced open role-playing LLMs built on LLaMA-3.1 models.
arXiv Detail & Related papers (2025-02-13T08:55:24Z)
Law of the Weakest Link: Cross Capabilities of Large Language Models [102.91861246827797]
We show that Large Language Models (LLMs) exhibit the "Law of the Weakest Link," where cross-capability performance is significantly constrained by the weakest component. These results highlight the under-performance of LLMs in cross-capability tasks.
arXiv Detail & Related papers (2024-09-30T05:12:01Z)
How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test. This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses. We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
arXiv Detail & Related papers (2024-04-22T08:00:51Z)
Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions? [59.0123596591807]
We benchmark the ability of Large Language Models (LLMs) in persona-driven decision-making. We investigate whether LLMs can predict characters' decisions provided by the preceding stories in high-quality novels. The results demonstrate that state-of-the-art LLMs exhibit promising capabilities in this task, yet substantial room for improvement remains.
arXiv Detail & Related papers (2024-04-18T12:40:59Z)
uTeBC-NLP at SemEval-2024 Task 9: Can LLMs be Lateral Thinkers? [7.0546788281657875]
We investigate how different prompting methods enhance LLMs' performance on a task to reveal their inherent power for outside-the-box thinking ability. We generate a dataset of thinking paths between riddles and options using GPT-4, validated by humans for quality. Findings indicate that compressed informative prompts enhance performance significantly.
arXiv Detail & Related papers (2024-04-03T05:31:59Z)
Characteristic AI Agents via Large Language Models [40.10858767752735]
This research focuses on investigating the performance of Large Language Models in constructing characteristic AI agents. A dataset called Character100'' is built for this benchmark, comprising the most-visited people on Wikipedia for language models to role-play. The experimental results underscore the potential directions for further improvement in the capabilities of LLMs in constructing characteristic AI agents.
arXiv Detail & Related papers (2024-03-19T02:25:29Z)
Large Language Models are Contrastive Reasoners [8.427805316635318]
We show how contrastive prompting significantly improves the ability of large language models to perform complex reasoning. Experiments on various large language models show that zero-shot contrastive prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. Our method not only surpasses zero-shot CoT and few-shot CoT in most arithmetic and commonsense reasoning tasks but also can seamlessly integrate with existing prompting methods.
arXiv Detail & Related papers (2024-03-13T03:15:05Z)
CogBench: a large language model walks into a psychology lab [12.981407327149679]
This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. We apply CogBench to 35 large language models (LLMs) and analyze this data using statistical multilevel modeling techniques. We find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior.
arXiv Detail & Related papers (2024-02-28T10:43:54Z)
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment [62.898963074989766]
We introduce Ditto, a self-alignment method for role-play. This method creates a role-play training set comprising 4,000 characters, surpassing the scale of currently available datasets by tenfold. We present the first comprehensive cross-supervision alignment experiment in the role-play domain.
arXiv Detail & Related papers (2024-01-23T03:56:22Z)
Tool-Augmented Reward Modeling [58.381678612409]
We propose a tool-augmented preference modeling approach, named Themis, to address limitations by empowering RMs with access to external environments. Our study delves into the integration of external tools into RMs, enabling them to interact with diverse external sources. In human evaluations, RLHF trained with Themis attains an average win rate of 32% when compared to baselines.
arXiv Detail & Related papers (2023-10-02T09:47:40Z)
Text Classification via Large Language Models [63.1874290788797]
We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification. Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks. More importantly, we find that CARP delivers impressive abilities on low-resource and domain-adaptation setups.
arXiv Detail & Related papers (2023-05-15T06:24:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.