Whom to Respond To? A Transformer-Based Model for Multi-Party Social Robot Interaction
- URL: http://arxiv.org/abs/2507.10960v1
- Date: Tue, 15 Jul 2025 03:42:14 GMT
- Title: Whom to Respond To? A Transformer-Based Model for Multi-Party Social Robot Interaction
- Authors: He Zhu, Ryo Miyoshi, Yuki Okafuji
- Abstract summary: We propose a Transformer-based multi-task learning framework to improve the decision-making process of social robots. We construct a novel multi-party HRI dataset that captures real-world complexities, such as gaze misalignment. Our findings contribute to the development of socially intelligent social robots capable of engaging in natural and context-aware multi-party interactions.
- Score: 4.276453870301421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior human-robot interaction (HRI) research has primarily focused on single-user interactions, where robots do not need to consider the timing or recipient of their responses. However, in multi-party interactions, such as at malls and hospitals, social robots must understand the context and decide both when and to whom they should respond. In this paper, we propose a Transformer-based multi-task learning framework to improve the decision-making process of social robots, particularly in multi-user environments. Considering the characteristics of HRI, we propose two novel loss functions: one that enforces constraints on active speakers to improve scene modeling, and another that guides response selection towards utterances specifically directed at the robot. Additionally, we construct a novel multi-party HRI dataset that captures real-world complexities, such as gaze misalignment. Experimental results demonstrate that our model achieves state-of-the-art performance in response decisions, outperforming existing heuristic-based and single-task approaches. Our findings contribute to the development of socially intelligent social robots capable of engaging in natural and context-aware multi-party interactions.
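To make the multi-task setup concrete, the following is a minimal PyTorch sketch (not the authors' code) of a Transformer encoder with two heads: one deciding whether the robot should respond now, and one selecting whom to respond to. The feature dimensions, pooling strategy, loss weights, and head sizes are illustrative assumptions; the paper's two specialized loss terms (active-speaker constraints and robot-directed-utterance guidance) are only indicated by placeholder weights here.

```python
# Hypothetical sketch of a Transformer-based multi-task model for multi-party
# response decisions. Given per-timestep scene/utterance features, it jointly
# predicts (a) whether the robot should respond and (b) which participant to
# address. All layer sizes and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RespondDecisionModel(nn.Module):
    def __init__(self, feat_dim=128, d_model=256, n_heads=4, n_layers=2, max_speakers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)  # multimodal features -> model dim
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.respond_head = nn.Linear(d_model, 2)                # respond now? (yes/no)
        self.addressee_head = nn.Linear(d_model, max_speakers)   # whom to respond to

    def forward(self, x):
        # x: (batch, seq_len, feat_dim) sequence of scene/utterance features
        h = self.encoder(self.proj(x))
        pooled = h.mean(dim=1)  # simple pooling over the dialogue context
        return self.respond_head(pooled), self.addressee_head(pooled)


def multitask_loss(respond_logits, addressee_logits, respond_y, addressee_y, w1=1.0, w2=1.0):
    # Two task losses combined with illustrative weights; the paper's extra
    # constraints on active speakers and robot-directed utterances would enter
    # here as additional terms.
    return (w1 * F.cross_entropy(respond_logits, respond_y)
            + w2 * F.cross_entropy(addressee_logits, addressee_y))


if __name__ == "__main__":
    model = RespondDecisionModel()
    x = torch.randn(8, 10, 128)  # 8 scenes, 10 timesteps of features each
    respond_logits, addressee_logits = model(x)
    loss = multitask_loss(respond_logits, addressee_logits,
                          torch.randint(0, 2, (8,)), torch.randint(0, 4, (8,)))
    loss.backward()
    print(loss.item())
```

The point of the sketch is the shared encoder feeding two task-specific heads; in the paper, the two proposed loss functions shape how those heads are trained, which is abstracted into the placeholder weights above.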
Related papers
- Recognizing Actions from Robotic View for Natural Human-Robot Interaction [52.00935005918032]
Natural Human-Robot Interaction (N-HRI) requires robots to recognize human actions at varying distances and states, regardless of whether the robot itself is in motion or stationary. Existing benchmarks fail to address the unique complexities of N-HRI due to limited data, modalities, task categories, and diversity of subjects and environments. We introduce a large-scale dataset (Action from Robotic View) for the perception-centric robotic views prevalent in mobile service robots.
arXiv Detail & Related papers (2025-07-30T09:48:34Z) - The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning [49.32390524168273]
Our work aims to advance the social reasoning of embodied artificial intelligence (AI) agents in real-world social interactions. We introduce a large-scale real-world Human Robot Social Interaction (HSRI) dataset to benchmark the capabilities of language models (LMs) and foundational models (FMs). Our dataset consists of 400 real-world human social robot interaction videos and over 10K annotations, detailing the robot's social errors, competencies, rationale, and corrective actions.
arXiv Detail & Related papers (2025-04-07T06:27:02Z) - HARMONIC: Cognitive and Control Collaboration in Human-Robotic Teams [2.6627293764668902]
This paper describes a cognitive-robotic architecture that integrates the OntoAgent cognitive framework with general-purpose robot control systems, applied to human-robot teaming (HRT). HARMONIC incorporates metacognition, meaningful natural language communication, and the explainability capabilities required for developing mutual trust in HRT.
arXiv Detail & Related papers (2024-09-26T16:48:21Z) - Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction [9.806227900768926]
We propose to model social motion forecasting in a shared human-robot representation space.
ECHO operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios.
We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2024-02-07T11:37:14Z) - Real-time Addressee Estimation: Deployment of a Deep-Learning Model on
the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z) - Proceeding of the 1st Workshop on Social Robots Personalisation At the
crossroads between engineering and humanities (CONCATENATE) [37.838596863193565]
This workshop aims to raise an interdisciplinary discussion on personalisation in robotics.
It aims at bringing researchers from different fields together to propose guidelines for personalisation.
arXiv Detail & Related papers (2023-07-10T11:11:24Z) - Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z) - Human-Robot Collaboration and Machine Learning: A Systematic Review of
Recent Research [69.48907856390834]
Human-robot collaboration (HRC) explores the interaction between a human and a robot.
This paper proposes a thorough literature review of the use of machine learning techniques in the context of HRC.
arXiv Detail & Related papers (2021-10-14T15:14:33Z) - A MultiModal Social Robot Toward Personalized Emotion Interaction [1.2183405753834562]
This study demonstrates a multimodal human-robot interaction (HRI) framework with reinforcement learning to enhance the robotic interaction policy.
The goal is to apply this framework in social scenarios so that robots can generate more natural and engaging interactions.
arXiv Detail & Related papers (2021-10-08T00:35:44Z) - Cognitive architecture aided by working-memory for self-supervised
multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for addressing such a task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z) - Controlling the Sense of Agency in Dyadic Robot Interaction: An Active
Inference Approach [6.421670116083633]
We examine dyadic imitative interactions of robots using a variational recurrent neural network model.
We examined how regulating the complexity term to minimize free energy during training determines the dynamic characteristics of networks.
arXiv Detail & Related papers (2021-03-03T02:38:09Z) - Affect-Driven Modelling of Robot Personality for Collaborative
Human-Robot Interactions [16.40684407420441]
Collaborative interactions require social robots to adapt to the dynamics of human affective behaviour.
We propose a novel framework for personality-driven behaviour generation in social robots.
arXiv Detail & Related papers (2020-10-14T16:34:14Z)