Related papers: The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning

URL: http://arxiv.org/abs/2504.13898v1
Date: Mon, 07 Apr 2025 06:27:02 GMT
Title: The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning
Authors: Dong Won Lee, Yubin Kim, Denison Guvenoz, Sooyeon Jeong, Parker Malachowsky, Louis-Philippe Morency, Cynthia Breazeal, Hae Won Park
Abstract summary: Our work aims to advance the social reasoning of embodied artificial intelligence (AI) agents in real-world social interactions. We introduce a large-scale real-world Human Robot Social Interaction (HSRI) dataset to benchmark the capabilities of language models (LMs) and foundational models (FMs). Our dataset consists of 400 real-world human social robot interaction videos and over 10K annotations, detailing the robot's social errors, competencies, rationale, and corrective actions.
Score: 49.32390524168273
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Our work aims to advance the social reasoning of embodied artificial intelligence (AI) agents in real-world social interactions. Recently, language models (LMs) and foundational models (FMs) have been used as automatic evaluators of human-AI interactions, with the goal of eventually using them to improve the policy of the AI agent. To enable further research in this direction, we introduce a large-scale real-world Human Robot Social Interaction (HSRI) Dataset to benchmark the capabilities of LMs and FMs to identify and reason about social interactions, specifically with regard to robot social errors and competencies. Our dataset consists of 400 real-world human social robot interaction videos and over 10K annotations, detailing the robot's social errors, competencies, rationale, and corrective actions, capturing unique aspects of human-AI interaction only present in real-world interactions. To further assess AI models' ability to reason about social interactions, we propose eight new benchmark tasks centered around whether AI models can (1) evaluate social interactions via detecting social errors and competencies, (2) identify the explanatory factors associated with errors and competencies, (3) understand the flow of real-world social interactions, and (4) provide reasons and corrective actions for social errors. Human studies and experiments with modern LMs and FMs reveal that current models struggle with these tasks, demonstrating that our dataset and benchmark provide a step forward towards socially intelligent AI.

Related papers

Towards interactive evaluations for interaction harms in human-AI systems [8.989911701384788]
We argue for a paradigm shift toward evaluation centered on interactional ethics. We propose principles for evaluating generative models through interaction scenarios and human impact metrics.
arXiv Detail & Related papers (2024-05-17T08:49:34Z)
Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions [67.60397632819202]
Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal. We identify a set of underlying technical challenges and open questions for researchers across computing communities to advance Social-AI.
arXiv Detail & Related papers (2024-04-17T02:57:42Z)
Socially Cognizant Robotics for a Technology Enhanced Society [13.094097428580564]
We advocate an interdisciplinary approach, socially cognizant robotics, which synthesizes technical and social science methods. We argue that this approach follows from the need to empower stakeholder participation in shaping AI-driven robot behavior. We develop best practices for socially cognizant robot design that balance traditional technology-based metrics with critically important, albeit challenging, metrics.
arXiv Detail & Related papers (2023-10-27T17:53:02Z)
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans. In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals. We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
Towards socially-competent and culturally-adaptive artificial agents: Expressive order, interactional disruptions and recovery strategies [0.0]
The overarching aim of this work is to set a framework to make the artificial agent socially-competent beyond dyadic interaction. The paper highlights how this level of competence is achieved by focusing on just three dimensions: (i) social capability, (ii) relational role, and (iii) proximity.
arXiv Detail & Related papers (2023-08-06T15:47:56Z)
Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values. Current language models (LMs) are trained to rigidly replicate their training corpus in isolation. This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
CASPER: Cognitive Architecture for Social Perception and Engagement in Robots [0.5918643136095765]
We present CASPER: a symbolic cognitive architecture that uses qualitative spatial reasoning to anticipate the pursued goal of another agent and to calculate the best collaborative behavior. We have tested this architecture in a simulated kitchen environment and the results we have collected show that the robot is able to both recognize an ongoing goal and to properly collaborate towards its achievement.
arXiv Detail & Related papers (2022-09-01T10:15:03Z)
PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception [50.551003004553806]
We create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE, which outperforms state-of-the-art feed-forward neural networks.
arXiv Detail & Related papers (2021-03-02T18:44:57Z)
Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration [116.28433607265573]
We introduce Watch-And-Help (WAH), a challenge for testing social intelligence in AI agents. In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently. We build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning-based and learning-based baselines.
arXiv Detail & Related papers (2020-10-19T21:48:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
