Related papers: AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents

AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents

URL: http://arxiv.org/abs/2401.06509v3
Date: Tue, 5 Mar 2024 12:07:04 GMT
Title: AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents
Authors: Yuanzhi Liang, Linchao Zhu, Yi Yang
Abstract summary: Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios. However, their capability in handling complex, multi-character social interactions has yet to be fully explored. We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
Score: 65.16893197330589
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios. However, their capability in handling complex, multi-character social interactions has yet to be fully explored, primarily due to the absence of robust, quantitative evaluation methods. This gap has slowed the development of agents proficient in more nuanced interactions beyond simple exchanges, for example, small talk. To address this challenge, we introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods. The interaction framework aims to foster an complex interaction environment that bolsters information exchange and intention expression within social interactions. Furthermore, we introduce evaluation methods, including two metrics: Information Exchanging Precision (IEP) and Interaction Expressiveness Gap (IEG), designed for the quantitative and objective assessment of agents' interaction competencies. Our findings highlight the utility of these evaluative methods and show significant potential for improving LLMs' ability to construct agents that interact in a more natural manner with human-like intricacy.

Related papers

Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues [16.07828032939124]
This paper presents an evaluation framework for agentic AI systems in mission-critical negotiation contexts.<n>Using Sotopia as a simulation testbed, we present two experiments that systematically evaluated how personality traits and AI agent characteristics influence social negotiation outcomes.
arXiv Detail & Related papers (2025-06-19T00:14:56Z)
Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
Learning responsibility allocations for multi-agent interactions: A differentiable optimization approach with control barrier functions [12.074590482085831]
We seek to codify factors governing safe multi-agent interactions via the lens of responsibility. We propose a data-driven modeling approach based on control barrier functions and differentiable optimization.
arXiv Detail & Related papers (2024-10-09T20:20:41Z)
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion. In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation. We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a multimodal transcript''
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development. We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
Towards interactive evaluations for interaction harms in human-AI systems [8.989911701384788]
We propose a shift towards evaluation based on textitinteractional ethics, which focuses on textitinteraction harms<n>First, we discuss the limitations of current evaluation methods, which (1) are static, (2) assume a universal user experience, and (3) have limited construct validity.<n>We present practical principles for designing interactive evaluations. These include ecologically valid interaction scenarios, human impact metrics, and diverse human participation approaches.
arXiv Detail & Related papers (2024-05-17T08:49:34Z)
Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task. We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP. This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z)
Automatic Context-Driven Inference of Engagement in HMI: A Survey [6.479224589451863]
This paper presents a survey on engagement inference for human-machine interaction. It entails interdisciplinary definition, engagement components and factors, publicly available datasets, ground truth assessment, and most commonly used features and methods. It serves as a guide for the development of future human-machine interaction interfaces with reliable context-aware engagement inference capability.
arXiv Detail & Related papers (2022-09-30T10:46:13Z)
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning. ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named as interactive actor-critic(IAC) IAC models the interaction of agents from perspectives of policy and value function. We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.