User Experience Estimation in Human-Robot Interaction Via Multi-Instance Learning of Multimodal Social Signals
- URL: http://arxiv.org/abs/2507.23544v1
- Date: Thu, 31 Jul 2025 13:34:15 GMT
- Title: User Experience Estimation in Human-Robot Interaction Via Multi-Instance Learning of Multimodal Social Signals
- Authors: Ryo Miyoshi, Yuki Okafuji, Takuya Iwamoto, Junya Nakanishi, Jun Baba,
- Abstract summary: This study proposes a UX estimation method for human-robot interaction (HRI) by leveraging multimodal social signals. Unlike conventional models that rely on momentary observations, our approach captures both short- and long-term interaction patterns. Experimental results demonstrate that our method outperforms third-party human evaluators in UX estimation.
- Score: 2.7138092972120766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the demand for social robots has grown, requiring them to adapt their behaviors based on users' states. Accurately assessing user experience (UX) in human-robot interaction (HRI) is crucial for achieving this adaptability. UX is a multi-faceted measure encompassing aspects such as sentiment and engagement, yet existing methods often focus on these individually. This study proposes a UX estimation method for HRI by leveraging multimodal social signals. We construct a UX dataset and develop a Transformer-based model that utilizes facial expressions and voice for estimation. Unlike conventional models that rely on momentary observations, our approach captures both short- and long-term interaction patterns using a multi-instance learning framework. This enables the model to capture temporal dynamics in UX, providing a more holistic representation. Experimental results demonstrate that our method outperforms third-party human evaluators in UX estimation.
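The core aggregation idea in the abstract, treating a session as a "bag" of short interaction instances and pooling them into one session-level UX estimate, can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the feature dimensions, the concatenation-based fusion, the attention parameters, and the linear UX head are all hypothetical stand-ins for the paper's Transformer-based model.

```python
import numpy as np

rng = np.random.default_rng(0)

def instance_features(face, voice):
    # Hypothetical fusion: concatenate per-instance facial and voice features.
    return np.concatenate([face, voice])

def attention_pool(instances, w):
    # Multi-instance attention pooling: score each instance, softmax the
    # scores into weights, and form a weighted sum as the bag representation.
    scores = instances @ w
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ instances, weights

# A session ("bag") of 5 short instances, each with 4 face + 4 voice dims.
session = np.stack([
    instance_features(rng.normal(size=4), rng.normal(size=4))
    for _ in range(5)
])

w = rng.normal(size=8)          # attention parameters (untrained, illustrative)
bag_repr, weights = attention_pool(session, w)

v = rng.normal(size=8)          # linear UX head (untrained, illustrative)
ux_score = float(bag_repr @ v)  # session-level UX estimate
```

The attention weights indicate which short segments drive the session-level estimate, which is how a multi-instance framework can capture temporal dynamics without a single momentary observation dominating.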
Related papers
- MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations [19.184155232662995]
We propose a novel approach for learning a shared latent space representation for Human-Robot Interaction (HRI).
We train a Variational Autoencoder (VAE) to learn robot motions regularized using an informative latent space prior.
We find that our approach of using an informative MDN prior from human observations for a VAE generates more accurate robot motions.
arXiv Detail & Related papers (2024-07-10T13:16:12Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - MATRIX: Multi-Agent Trajectory Generation with Diverse Contexts [47.12378253630105]
We study trajectory-level data generation for multi-human or human-robot interaction scenarios.
We propose a learning-based automatic trajectory generation model, which we call Multi-Agent TRajectory generation with dIverse conteXts (MATRIX).
arXiv Detail & Related papers (2024-03-09T23:28:54Z) - RealDex: Towards Human-like Grasping for Robotic Dexterous Hand [64.33746404551343]
We introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns. RealDex holds immense promise in advancing humanoid robots toward automated perception, cognition, and manipulation in real-world scenarios.
arXiv Detail & Related papers (2024-02-21T14:59:46Z) - Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z) - Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z) - Versatile User Identification in Extended Reality using Pretrained Similarity-Learning [16.356961801884562]
We develop a similarity-learning model and pretrain it on the "Who Is Alyx?" dataset.
In comparison with a traditional classification-learning baseline, our model shows superior performance.
Our approach paves the way for easy integration of pretrained motion-based identification models in production XR systems.
arXiv Detail & Related papers (2023-02-15T08:26:24Z) - Predicting the long-term collective behaviour of fish pairs with deep learning [52.83927369492564]
This study introduces a deep learning model to assess social interactions in the fish species Hemigrammus rhodostomus.
We compare the results of our deep learning approach to experiments and to the results of a state-of-the-art analytical model.
We demonstrate that machine learning models of social interactions can directly compete with their analytical counterparts on subtle experimental observables.
arXiv Detail & Related papers (2023-02-14T05:25:03Z) - Multi-Timescale Modeling of Human Behavior [0.18199355648379031]
We propose an LSTM network architecture that processes behavioral information at multiple timescales to predict future behavior.
We evaluate our architecture on data collected in an urban search and rescue scenario simulated in a virtual Minecraft-based testbed.
arXiv Detail & Related papers (2022-11-16T15:58:57Z) - Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios [47.99589136455976]
We present the first systematic comparison of state-of-the-art approaches for behavior forecasting.
Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5.
We show that by autoregressively predicting the future with methods trained for the short-term future, we outperform the baselines even for a considerably longer-term future.
arXiv Detail & Related papers (2022-03-07T09:59:30Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop TrajNet++, a large-scale interaction-centric benchmark, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.