RoleInteract: Evaluating the Social Interaction of Role-Playing Agents
- URL: http://arxiv.org/abs/2403.13679v3
- Date: Fri, 22 Mar 2024 00:49:59 GMT
- Title: RoleInteract: Evaluating the Social Interaction of Role-Playing Agents
- Authors: Hongzhan Chen, Hehong Chen, Ming Yan, Wenshen Xu, Xing Gao, Weizhou Shen, Xiaojun Quan, Chenliang Li, Ji Zhang, Fei Huang, Jingren Zhou,
- Abstract summary: We introduce the first benchmark designed to evaluate the sociality of role-playing conversational agents at both individual and group levels of social interactions.
The benchmark is constructed from a variety of sources and covers a wide range of 500 characters and over 6,000 question prompts.
We find that agents excelling in individual level does not imply their proficiency in group level.
- Score: 85.6641890712617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing conversational agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing the conversational capability, role-specific knowledge, and stylistic attributes of these agents, there has been a noticeable gap in assessing their social intelligence. In this paper, we introduce RoleInteract, the first benchmark designed to systematically evaluate the sociality of role-playing conversational agents at both individual and group levels of social interactions. The benchmark is constructed from a variety of sources and covers a wide range of 500 characters and over 6,000 question prompts and 30,800 multi-turn role-playing utterances. We conduct comprehensive evaluations on this benchmark using mainstream open-source and closed-source LLMs. We find that agents excelling in individual level does not imply their proficiency in group level. Moreover, the behavior of individuals may drift as a result of the influence exerted by other agents within the group. Experimental results on RoleInteract confirm its significance as a testbed for assessing the social interaction of role-playing conversational agents. The benchmark is publicly accessible at https://github.com/X-PLUG/RoleInteract.
Related papers
- PersLLM: A Personified Training Approach for Large Language Models [63.75008885222351]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - CloChat: Understanding How People Customize, Interact, and Experience
Personas in Large Language Models [15.915071948354466]
CloChat is an interface supporting easy and accurate customization of agent personas in large language models.
Results indicate that participants formed emotional bonds with the customized agents, engaged in more dynamic dialogues, and showed interest in sustaining interactions.
arXiv Detail & Related papers (2024-02-23T11:25:17Z) - Unveiling the Secrets of Engaging Conversations: Factors that Keep Users
Hooked on Role-Playing Dialog Agents [17.791787477586574]
The degree to which the bot embodies the roles it plays has limited influence on retention rates, while the length of each turn it speaks significantly affects retention rates.
This study sheds light on the critical aspects of user engagement with role-playing models and provides valuable insights for future improvements in the development of large language models for role-playing purposes.
arXiv Detail & Related papers (2024-02-18T09:42:41Z) - LLM Agents in Interaction: Measuring Personality Consistency and
Linguistic Alignment in Interacting Populations of Large Language Models [4.706971067968811]
We create a two-group population of large language models (LLMs) agents using a simple variability-inducing sampling algorithm.
We administer personality tests and submit the agents to a collaborative writing task, finding that different profiles exhibit different degrees of personality consistency and linguistic alignment to their conversational partners.
arXiv Detail & Related papers (2024-02-05T11:05:20Z) - Large Language Models are Superpositions of All Characters: Attaining
Arbitrary Role-play via Self-Alignment [62.898963074989766]
We introduce Ditto, a self-alignment method for role-play.
This method creates a role-play training set comprising 4,000 characters, surpassing the scale of currently available datasets by tenfold.
We present the first comprehensive cross-supervision alignment experiment in the role-play domain.
arXiv Detail & Related papers (2024-01-23T03:56:22Z) - AntEval: Evaluation of Social Interaction Competencies in LLM-Driven
Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z) - SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z) - Better Zero-Shot Reasoning with Role-Play Prompting [10.90357246745529]
Role-play prompting consistently surpasses the standard zero-shot approach across most datasets.
This highlights its potential to augment the reasoning capabilities of large language models.
arXiv Detail & Related papers (2023-08-15T11:08:30Z) - Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via
Role Interactions [50.84439853121438]
We propose a novel role interaction enhanced method for role-oriented dialogue summarization.
It adopts cross attention and decoder self-attention interactions to interactively acquire other roles' critical information.
Our proposed method significantly outperforms strong baselines on two public role-oriented dialogue summarization datasets.
arXiv Detail & Related papers (2022-05-26T06:58:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.