Related papers: Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots

URL: http://arxiv.org/abs/2412.03300v2
Date: Tue, 12 Aug 2025 17:48:58 GMT
Title: Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots
Authors: Qiaoqiao Ren, Remko Proesmans, Yuanbo Hou, Francis wyffels, Tony Belpaeme
Abstract summary: Human emotions are complex and can be conveyed through nuanced touch gestures. Previous research has primarily focused on how humans recognize emotions through touch or on identifying key features of emotional expression for robots. This study investigates the consistency and distinguishability of emotional and gestural expressions through touch and sound.
Score: 4.072544789256895
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Human emotions are complex and can be conveyed through nuanced touch gestures. Previous research has primarily focused on how humans recognize emotions through touch or on identifying key features of emotional expression for robots. However, there is a gap in understanding how reliably these emotions and gestures can be communicated to robots via touch and interpreted using data-driven methods. This study investigates the consistency and distinguishability of emotional and gestural expressions through touch and sound. To this end, we integrated a custom piezoresistive pressure sensor as well as a microphone on a social robot. Twenty-eight participants first conveyed ten different emotions to the robot using spontaneous touch gestures, then they performed six predefined social touch gestures. Our findings reveal statistically significant consistency in both emotion and gesture expression among participants. However, some emotions exhibited low intraclass correlation values, and certain emotions with similar levels of arousal or valence did not show significant differences in their conveyance. To investigate emotion and social gesture decoding within affective human-robot tactile interaction, we developed single-modality models and multimodal models integrating tactile and auditory features. A support vector machine (SVM) model trained on multimodal features achieved the highest accuracy for classifying ten emotions, reaching 40%. For gesture classification, a Convolutional Neural Network-Long Short-Term Memory network (CNN-LSTM) achieved 90.74% accuracy. Our results demonstrate that even though the unimodal models have the potential to decode emotions and touch gestures, the multimodal integration of touch and sound significantly outperforms unimodal approaches, enhancing the decoding of both emotions and gestures.
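The abstract does not say how the tactile and auditory features were combined. As an illustration only, the sketch below shows one common option, feature-level (early) fusion: each modality is standardized and the per-sample feature vectors are concatenated. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # Fall back to 1.0 so a constant column does not cause division by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse_features(tactile_rows, audio_rows):
    """Early fusion (assumed, not confirmed by the paper):
    standardize each modality, then concatenate per sample."""
    if len(tactile_rows) != len(audio_rows):
        raise ValueError("both modalities need one feature row per sample")
    tactile_z = zscore_columns(tactile_rows)
    audio_z = zscore_columns(audio_rows)
    return [t + a for t, a in zip(tactile_z, audio_z)]


# Made-up toy values: 3 samples, 2 tactile features, 1 audio feature.
tactile = [[0.1, 5.0], [0.3, 7.0], [0.5, 9.0]]
audio = [[100.0], [200.0], [300.0]]
fused = fuse_features(tactile, audio)
```

Each row of `fused` could then be passed to a classifier such as an SVM. Standardizing per modality first keeps features with large raw ranges (here the audio values) from dominating the concatenated vector.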

Related papers

Modelling the Interplay of Eye-Tracking Temporal Dynamics and Personality for Emotion Detection in Face-to-Face Settings [1.2600839346487007]
This work presents a personality-aware multimodal framework that integrates eye-tracking sequences, Big Five personality traits, and contextual stimulus cues to predict both perceived and felt emotions. Results show that stimulus cues strongly enhance perceived-emotion predictions, while personality traits provide the largest improvements for felt emotion recognition.
arXiv Detail & Related papers (2025-09-19T16:05:23Z)
Empaths at SemEval-2025 Task 11: Retrieval-Augmented Approach to Perceived Emotions Prediction [83.88591755871734]
EmoRAG is a system designed to detect perceived emotions in text for SemEval-2025 Task 11, Subtask A: Multi-label Emotion Detection. We focus on predicting the perceived emotions of the speaker from a given text snippet, labeling it with emotions such as joy, sadness, fear, anger, surprise, and disgust.
arXiv Detail & Related papers (2025-06-04T19:41:24Z)
UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech [34.89118596727314]
We propose UDDETTS, a neural language model unifying discrete and dimensional emotions for controllable emotional TTS. This model introduces the interpretable Arousal-Dominance-Valence (ADV) space for dimensional emotion description and supports emotion control driven by either discrete emotion labels or nonlinearly quantified ADV values. Experiments show that UDDETTS unifies linear emotion control along the three dimensions of ADV space, and exhibits superior end-to-end emotional speech synthesis capabilities.
arXiv Detail & Related papers (2025-05-15T12:57:19Z)
Digitizing Touch with an Artificial Multimodal Fingertip [51.7029315337739]
Humans and robots both benefit from using touch to perceive and interact with the surrounding environment. Here, we describe several conceptual and technological innovations to improve the digitization of touch. These advances are embodied in an artificial finger-shaped sensor with advanced sensing capabilities.
arXiv Detail & Related papers (2024-11-04T18:38:50Z)
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech [34.03787613163788]
EmoSphere-TTS synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech. We propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics.
arXiv Detail & Related papers (2024-06-12T01:40:29Z)
Exploring Emotions in Multi-componential Space using Interactive VR Games [1.1510009152620668]
We operationalised a data-driven approach using interactive Virtual Reality (VR) games. We used Machine Learning (ML) methods to identify the unique contributions of each component to emotion differentiation. These findings also have implications for using VR environments in emotion research.
arXiv Detail & Related papers (2024-04-04T06:54:44Z)
Self context-aware emotion perception on human-robot interaction [3.775456992482295]
Humans consider that contextual information and different contexts can lead to completely different emotional expressions. We introduce a self context-aware model (SCAM) that employs a two-dimensional emotion coordinate system for anchoring and re-labeling distinct emotions. This approach has yielded significant improvements across audio, video, and multimodal environments.
arXiv Detail & Related papers (2024-01-18T10:58:27Z)
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation [42.29118614670941]
We propose emotion2vec, a universal speech emotion representation model. emotion2vec is pre-trained on unlabeled emotion data through self-supervised online distillation. It outperforms state-of-the-art pre-trained universal models and emotion specialist models.
arXiv Detail & Related papers (2023-12-23T07:46:55Z)
Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion [87.18073195745914]
We investigate how well human-annotated emotion triggers correlate with features deemed salient in their prediction of emotions. Using EmoTrigger, we evaluate the ability of large language models to identify emotion triggers. Our analysis reveals that emotion triggers are largely not considered salient features for emotion prediction models; instead, there is an intricate interplay between various features and the task of emotion detection.
arXiv Detail & Related papers (2023-11-16T06:20:13Z)
WEARS: Wearable Emotion AI with Real-time Sensor data [0.8740570557632509]
We propose a system to predict user emotion using smartwatch sensors. We design a framework to collect ground truth in real-time utilizing a mix of English and regional language-based videos. We also did an ablation study to understand the impact of features including Heart Rate, Accelerometer, and Gyroscope sensor data on mood.
arXiv Detail & Related papers (2023-08-22T11:03:00Z)
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation [49.925499720323806]
We study how visual, auditory, and tactile perception can jointly help robots to solve complex manipulation tasks. We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor.
arXiv Detail & Related papers (2022-12-07T18:55:53Z)
Speech Synthesis with Mixed Emotions [77.05097999561298]
We propose a novel formulation that measures the relative difference between the speech samples of different emotions. We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework. At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector.
arXiv Detail & Related papers (2022-08-11T15:45:58Z)
Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration. We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions. The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z)
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities. We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity. In this paper, we aim to explicitly characterize and control the intensity of emotion. We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z)
Multi-Cue Adaptive Emotion Recognition Network [4.570705738465714]
We propose a new deep learning approach for emotion recognition based on adaptive multi-cues. We compare the proposed approach with state-of-the-art approaches on the CAER-S dataset.
arXiv Detail & Related papers (2021-11-03T15:08:55Z)
Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER). We begin with a brief introduction on widely used emotion representation models and affective modalities. We then summarize existing emotion annotation strategies and corresponding computational tasks. Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning. To be specific, we first construct an Emotion Circle to unify any emotional state within it. On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions. Our framework integrates a contextualized embedding encoder with a multi-head probing model. Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
Infusing Multi-Source Knowledge with Heterogeneous Graph Neural Network for Emotional Conversation Generation [25.808037796936766]
In a real-world conversation, we instinctively perceive emotions from multi-source information. We propose a heterogeneous graph-based model for emotional conversation generation. Experimental results show that our model can effectively perceive emotions from multi-source knowledge.
arXiv Detail & Related papers (2020-12-09T06:09:31Z)
Emotion Recognition From Gait Analyses: Current Research and Future Directions [48.93172413752614]
Gait conveys information about the walker's emotion. The mapping between various emotions and gait patterns provides a new source for automated emotion recognition. Gait is remotely observable, more difficult to imitate, and requires less cooperation from the subject.
arXiv Detail & Related papers (2020-03-13T08:22:33Z)
ProxEmo: Gait-based Emotion Learning and Multi-view Proxemic Fusion for Socially-Aware Robot Navigation [65.11858854040543]
We present ProxEmo, a novel end-to-end emotion prediction algorithm for robot navigation among pedestrians. Our approach predicts the perceived emotions of a pedestrian from walking gaits, which is then used for emotion-guided navigation.
arXiv Detail & Related papers (2020-03-02T17:47:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.