To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills
- URL: http://arxiv.org/abs/2308.10757v2
- Date: Thu, 28 Mar 2024 08:26:50 GMT
- Title: To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills
- Authors: Carlo Mazzola, Marta Romeo, Francesco Rea, Alessandra Sciutti, Angelo Cangelosi,
- Abstract summary: We tackle the problem of Addressee Estimation, the ability to understand an utterance's addressee, by interpreting and exploiting non-verbal bodily cues from the speaker.
We do so by implementing an hybrid deep learning model composed of convolutional layers and LSTM cells taking as input images portraying the face of the speaker and 2D vectors of the speaker's body posture.
We demonstrate that our model is able to solve the Addressee Estimation problem in terms of addressee localisation in space, from a robot ego-centric point of view.
- Score: 44.497086629717074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Communicating shapes our social word. For a robot to be considered social and being consequently integrated in our social environment it is fundamental to understand some of the dynamics that rule human-human communication. In this work, we tackle the problem of Addressee Estimation, the ability to understand an utterance's addressee, by interpreting and exploiting non-verbal bodily cues from the speaker. We do so by implementing an hybrid deep learning model composed of convolutional layers and LSTM cells taking as input images portraying the face of the speaker and 2D vectors of the speaker's body posture. Our implementation choices were guided by the aim to develop a model that could be deployed on social robots and be efficient in ecological scenarios. We demonstrate that our model is able to solve the Addressee Estimation problem in terms of addressee localisation in space, from a robot ego-centric point of view.
Related papers
- Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task [17.190635800969456]
In this paper, we examine using Large Language Models to infer human intention in a collaborative object categorization task with a physical robot.
We propose a novel multimodal approach that integrates user non-verbal cues, like hand gestures, body poses, and facial expressions, with environment states and user verbal cues to predict user intentions.
arXiv Detail & Related papers (2024-04-12T12:15:14Z) - Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction [9.806227900768926]
We propose to model social motion forecasting in a shared human-robot representation space.
ECHO operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios.
We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2024-02-07T11:37:14Z) - Real-time Addressee Estimation: Deployment of a Deep-Learning Model on
the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z) - A Human-Robot Mutual Learning System with Affect-Grounded Language
Acquisition and Differential Outcomes Training [0.1812164955222814]
The paper presents a novel human-robot interaction setup for identifying robot homeostatic needs.
We adopted a differential outcomes training protocol whereby the robot provides feedback specific to its internal needs.
We found evidence that DOT can enhance the human's learning efficiency, which in turn enables more efficient robot language acquisition.
arXiv Detail & Related papers (2023-10-20T09:41:31Z) - HandMeThat: Human-Robot Communication in Physical and Social
Environments [73.91355172754717]
HandMeThat is a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
HandMeThat contains 10,000 episodes of human-robot interactions.
We show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat.
arXiv Detail & Related papers (2023-10-05T16:14:46Z) - Developing Social Robots with Empathetic Non-Verbal Cues Using Large
Language Models [2.5489046505746704]
We design and label four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot.
Preliminary results show distinct patterns in the robot's responses, such as a preference for calm and positive social emotions like 'joy' and 'lively', and frequent nodding gestures.
Our work lays the groundwork for future studies on human-robot interactions, emphasizing the essential role of both verbal and non-verbal cues in creating social and empathetic robots.
arXiv Detail & Related papers (2023-08-31T08:20:04Z) - SACSoN: Scalable Autonomous Control for Social Navigation [62.59274275261392]
We develop methods for training policies for socially unobtrusive navigation.
By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.
We collect a large dataset where an indoor mobile robot interacts with human bystanders.
arXiv Detail & Related papers (2023-06-02T19:07:52Z) - Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z) - Self-supervised reinforcement learning for speaker localisation with the
iCub humanoid robot [58.2026611111328]
Looking at a person's face is one of the mechanisms that humans rely on when it comes to filtering speech in noisy environments.
Having a robot that can look toward a speaker could benefit ASR performance in challenging environments.
We propose a self-supervised reinforcement learning-based framework inspired by the early development of humans.
arXiv Detail & Related papers (2020-11-12T18:02:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.