Towards Inclusive HRI: Using Sim2Real to Address Underrepresentation in
Emotion Expression Recognition
- URL: http://arxiv.org/abs/2208.07472v1
- Date: Mon, 15 Aug 2022 23:37:13 GMT
- Title: Towards Inclusive HRI: Using Sim2Real to Address Underrepresentation in
Emotion Expression Recognition
- Authors: Saba Akhyani, Mehryar Abbasi Boroujeni, Mo Chen, Angelica Lim
- Abstract summary: We aim to build a system that can perceive humans in a more transparent and inclusive manner.
We use a Sim2Real approach in which we use a suite of 3D simulated human models.
By augmenting a small dynamic emotional expression dataset with a synthetic dataset containing 4536 samples, we achieved an improvement in accuracy of 15%.
- Score: 5.819149317261972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots and artificial agents that interact with humans should be able to do
so without bias and inequity, but facial perception systems have notoriously
been found to work more poorly for certain groups of people than others. In our
work, we aim to build a system that can perceive humans in a more transparent
and inclusive manner. Specifically, we focus on dynamic expressions on the
human face, which are difficult to collect for a broad set of people due to
privacy concerns and the fact that faces are inherently identifiable.
Furthermore, datasets collected from the Internet are not necessarily
representative of the general population. We address this problem by offering a
Sim2Real approach in which we use a suite of 3D simulated human models that
enables us to create an auditable synthetic dataset covering 1)
underrepresented facial expressions, outside of the six basic emotions, such as
confusion; 2) ethnic or gender minority groups; and 3) a wide range of viewing
angles at which a robot may encounter a human in the real world. By augmenting a
small dynamic emotional expression dataset containing 123 samples with a
synthetic dataset containing 4536 samples, we achieved an improvement in
accuracy of 15% on our own dataset and 11% on an external benchmark dataset,
compared to the performance of the same model architecture without synthetic
training data. We also show that this additional step improves accuracy
specifically for racial minorities when the architecture's feature extraction
weights are trained from scratch.
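As an illustrative sketch of the augmentation step described in the abstract, the snippet below trains a classifier on the union of a small real expression dataset and a larger synthetic one. The directory layout, ResNet-18 backbone, and hyperparameters are assumptions for illustration only and do not reproduce the authors' actual pipeline.
```python
# Minimal sketch of Sim2Real data augmentation for expression recognition.
# All dataset paths, class layouts, and the backbone are illustrative assumptions;
# they are not the authors' implementation.
import torch
from torch import nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Small real dataset (on the order of ~100 samples) and a larger synthetic one
# rendered from 3D human models (thousands of samples), stored as image folders
# with matching class subdirectories.
real_data = datasets.ImageFolder("data/real_expressions", transform=transform)
synthetic_data = datasets.ImageFolder("data/synthetic_expressions", transform=transform)

# Augmentation step: train on the union of real and synthetic samples.
train_data = ConcatDataset([real_data, synthetic_data])
loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Feature-extraction weights trained from scratch (no pretrained initialization),
# mirroring the setting in which the paper reports gains for racial minorities.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, len(real_data.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```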
Related papers
- EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition [18.8101367995391]
EmoNet-Face is a comprehensive benchmark suite for developing and evaluating AI systems. A novel 40-category emotion taxonomy captures finer details of human emotional experiences. The suite includes three large-scale, AI-generated datasets with explicit, full-face expressions. EmpathicInsight-Face is a model achieving human-expert-level performance on the benchmark.
arXiv Detail & Related papers (2025-05-26T14:19:58Z) - X2C: A Dataset Featuring Nuanced Facial Expressions for Realistic Humanoid Imitation [27.987188226933846]
The ability to imitate realistic facial expressions is essential for humanoid robots engaged in affective human-robot communication. We introduce X2C, a dataset featuring nuanced facial expressions for realistic humanoid imitation. X2CNet, a novel human-to-humanoid facial expression imitation framework, learns the correspondence between nuanced humanoid expressions and their underlying control values from X2C.
arXiv Detail & Related papers (2025-05-16T11:48:19Z) - Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset.
We introduce Mugsy, a large-scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance.
The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z) - Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z) - BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z) - HSPACE: Synthetic Parametric Humans Animated in Complex Environments [67.8628917474705]
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments.
We combine a hundred diverse individuals of varying ages, genders, proportions, and ethnicities with hundreds of motions and scenes in order to generate an initial dataset of over 1 million frames.
Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z) - Facial Emotion Recognition using Deep Residual Networks in Real-World
Environments [5.834678345946704]
We propose a facial feature extractor model trained on an in-the-wild and massively collected video dataset.
The dataset consists of a million labelled frames and 2,616 thousand subjects.
As temporal information is important in the emotion recognition domain, we utilise LSTM cells to capture the temporal dynamics in the data (a minimal sketch of this kind of CNN-plus-LSTM setup appears after this list).
arXiv Detail & Related papers (2021-11-04T10:08:22Z) - Fake It Till You Make It: Face analysis in the wild using synthetic data
alone [9.081019005437309]
We show that it is possible to perform face-related computer vision in the wild using synthetic data alone.
We describe how to combine a procedurally-generated 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism.
arXiv Detail & Related papers (2021-09-30T13:07:04Z) - Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object, indicated verbally by a human user, from a crowded scene.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z) - Cognitive architecture aided by working-memory for self-supervised
multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have been demonstrated to be suitable tools for addressing such a task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z) - Where is my hand? Deep hand segmentation for visual self-recognition in
humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask R-CNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
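For the temporal modeling mentioned in the Deep Residual Networks entry above, a minimal sketch of a CNN feature extractor feeding an LSTM over video frames might look as follows. The ResNet-18 backbone, hidden size, and number of classes are assumptions for illustration, not the cited paper's exact architecture.
```python
# Minimal sketch: per-frame CNN features + LSTM for temporal emotion dynamics.
# Backbone choice, hidden size, and class count are illustrative assumptions.
import torch
from torch import nn
from torchvision import models

class CnnLstmEmotion(nn.Module):
    def __init__(self, num_classes=7, hidden_size=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # 512-d feature per frame
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                # clips: (batch, time, 3, H, W)
        b, t, c, h, w = clips.shape
        feats = self.backbone(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)       # last hidden state summarizes the clip
        return self.classifier(h_n[-1])

# Example: a batch of 2 clips, 8 frames each.
logits = CnnLstmEmotion()(torch.randn(2, 8, 3, 224, 224))
```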
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.