Unimodal and Multimodal Static Facial Expression Recognition for Virtual Reality Users with EmoHeVRDB
- URL: http://arxiv.org/abs/2412.11306v1
- Date: Sun, 15 Dec 2024 20:49:46 GMT
- Title: Unimodal and Multimodal Static Facial Expression Recognition for Virtual Reality Users with EmoHeVRDB
- Authors: Thorben Ortmann, Qi Wang, Larissa Putzar
- Abstract summary: We explored the potential of utilizing Facial Expression Activations (FEAs) captured via the Meta Quest Pro Virtual Reality (VR) headset for Facial Expression Recognition (FER) in VR settings.
We compared several unimodal approaches and achieved up to 73.02% accuracy for the static FER task with seven emotion categories.
We integrated FEA and image data in multimodal approaches, observing significant improvements in recognition accuracy.
- Score: 4.095418032380801
- Abstract: In this study, we explored the potential of utilizing Facial Expression Activations (FEAs) captured via the Meta Quest Pro Virtual Reality (VR) headset for Facial Expression Recognition (FER) in VR settings. Leveraging the EmojiHeroVR Database (EmoHeVRDB), we compared several unimodal approaches and achieved up to 73.02% accuracy for the static FER task with seven emotion categories. Furthermore, we integrated FEA and image data in multimodal approaches, observing significant improvements in recognition accuracy. An intermediate fusion approach achieved the highest accuracy of 80.42%, significantly surpassing the baseline evaluation result of 69.84% reported for EmoHeVRDB's image data. Our study is the first to utilize EmoHeVRDB's unique FEA data for unimodal and multimodal static FER, establishing new benchmarks for FER in VR settings. Our findings highlight the potential of fusing complementary modalities to enhance FER accuracy in VR settings, where conventional image-based methods are severely limited by the occlusion caused by Head-Mounted Displays (HMDs).
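EmoHeVRDB pairs each facial image with a 63-dimensional vector of Facial Expression Activations, so intermediate fusion can be pictured as concatenating the embedding of an image branch with that of an FEA branch before a shared classification head. The following is a minimal sketch of that idea only; the ResNet-18 backbone, the layer sizes, and the name IntermediateFusionFER are our assumptions, not the paper's exact architecture.

```python
# Minimal sketch of intermediate fusion for static FER, assuming
# 63-dimensional FEA vectors (as in EmoHeVRDB) plus HMD-occluded face
# images; backbone and layer sizes are illustrative, not the paper's model.
import torch
import torch.nn as nn
import torchvision.models as models

class IntermediateFusionFER(nn.Module):
    def __init__(self, num_classes: int = 7, fea_dim: int = 63):
        super().__init__()
        # Image branch: a standard CNN backbone with its classifier removed.
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()  # yields a 512-d image embedding
        self.image_branch = backbone
        # FEA branch: a small MLP over the 63 activation values.
        self.fea_branch = nn.Sequential(
            nn.Linear(fea_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # Intermediate fusion: concatenate both embeddings, then classify.
        self.classifier = nn.Linear(512 + 128, num_classes)

    def forward(self, image: torch.Tensor, fea: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_branch(image)  # (B, 512)
        fea_feat = self.fea_branch(fea)      # (B, 128)
        return self.classifier(torch.cat([img_feat, fea_feat], dim=1))

# Smoke test on dummy inputs: a batch of 4 face images and FEA vectors.
model = IntermediateFusionFER()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 63))
print(logits.shape)  # torch.Size([4, 7])
```

In this layout, either branch can be dropped to recover a unimodal baseline, which is one reason feature-level fusion is a convenient starting point for comparing modalities.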
Related papers
- BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI [11.569448567735435]
BrainMVP is a multi-modal vision pre-training framework for brain image analysis using multi-parametric MRI scans.
Cross-modal reconstruction is explored to learn distinctive brain image embeddings and efficient modality fusion capabilities.
Experiments on downstream tasks demonstrate superior performance compared to state-of-the-art pre-training methods in the medical domain.
arXiv Detail & Related papers (2024-10-14T15:12:16Z)
- EmojiHeroVR: A Study on Facial Expression Recognition under Partial Occlusion from Head-Mounted Displays [4.095418032380801]
EmoHeVRDB (EmojiHeroVR Database) includes 3,556 labeled facial images of 1,778 reenacted emotions.
EmoHeVRDB also includes the activations of 63 facial expressions captured via the Meta Quest Pro VR headset.
Best model achieved an accuracy of 69.84% on the test set.
arXiv Detail & Related papers (2024-10-04T11:29:04Z)
- EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs [17.864281586189392]
Egocentric human pose estimation (HPE) using wearable sensors is essential for VR/AR applications.
Most methods rely solely on either egocentric-view images or sparse Inertial Measurement Unit (IMU) signals.
We propose EMHI, a multimodal Egocentric human Motion dataset with a Head-Mounted Display (HMD) and body-worn IMUs.
arXiv Detail & Related papers (2024-08-30T10:12:13Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances in self-supervised learning (SSL) to pre-train strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Thelxinoë: Recognizing Human Emotions Using Pupillometry and Machine Learning [0.0]
This research contributes significantly to the Thelxinoë framework, aiming to enhance VR experiences by integrating data from multiple sensors for realistic and emotionally resonant touch interactions.
Our findings open new avenues for developing more immersive and interactive VR environments, paving the way for future advancements in virtual touch technology.
arXiv Detail & Related papers (2024-03-27T21:14:17Z)
- Deep Motion Masking for Secure, Usable, and Scalable Real-Time Anonymization of Virtual Reality Motion Data [49.68609500290361]
Recent studies have demonstrated that the motion tracking "telemetry" data used by nearly all VR applications is as uniquely identifiable as a fingerprint scan.
We present in this paper a state-of-the-art VR identification model that can convincingly bypass known defensive countermeasures.
arXiv Detail & Related papers (2023-11-09T01:34:22Z)
- Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up by 11% absolute over the previous SoTA, to a new best of 80% single-run success rate on the R2R test split, through simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
- Learning Diversified Feature Representations for Facial Expression Recognition in the Wild [97.14064057840089]
We propose a mechanism to diversify the features extracted by CNN layers of state-of-the-art facial expression recognition architectures.
Experimental results on three well-known in-the-wild facial expression recognition datasets, AffectNet, FER+, and RAF-DB, show the effectiveness of our method.
arXiv Detail & Related papers (2022-10-17T19:25:28Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER (a hedged sketch of cluster-level pseudo-labelling follows this list).
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation [86.41437210485932]
We aim at advancing zero-shot HOI detection to detect both seen and unseen HOIs simultaneously.
We propose a novel end-to-end zero-shot HOI Detection framework via vision-language knowledge distillation.
Our method outperforms the previous SOTA by 8.92% on unseen mAP and 10.18% on overall mAP.
arXiv Detail & Related papers (2022-04-01T07:27:19Z)
- Facial Expression Recognition Under Partial Occlusion from Virtual Reality Headsets based on Transfer Learning [0.0]
Convolutional neural network based approaches have become widely adopted due to their proven applicability to the Facial Expression Recognition task.
However, recognizing facial expression while wearing a head-mounted VR headset is a challenging task due to the upper half of the face being completely occluded.
We propose a geometric model to simulate the occlusion resulting from a Samsung Gear VR headset that can be applied to existing FER datasets (a simple masking sketch follows this list).
arXiv Detail & Related papers (2020-08-12T20:25:07Z)
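As referenced in the cluster-level pseudo-labelling entry above, the core idea can be sketched as clustering the unlabeled target features and assigning every member of a cluster the same pseudo-label, derived from the source model's aggregated predictions. The use of k-means, the summed-probability vote, and the function name cluster_pseudo_labels are illustrative assumptions, not the cited paper's exact procedure.

```python
# Illustrative cluster-level pseudo-labelling for source-free adaptation;
# k-means and the soft majority vote are assumptions for this sketch.
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(features: np.ndarray,
                          source_probs: np.ndarray,
                          n_clusters: int = 7) -> np.ndarray:
    """Assign one pseudo-label per cluster instead of per sample.

    features: (N, D) target-domain embeddings.
    source_probs: (N, C) softmax outputs of the source model on target data.
    """
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    pseudo = np.empty(len(features), dtype=int)
    for c in range(n_clusters):
        mask = clusters == c
        # Cluster-level label: argmax of the summed source probabilities,
        # i.e. a soft majority vote over the cluster's members.
        pseudo[mask] = source_probs[mask].sum(axis=0).argmax()
    return pseudo

# Smoke test with random data: 100 samples, 64-d features, 7 emotion classes.
feats = np.random.randn(100, 64)
probs = np.random.dirichlet(np.ones(7), size=100)
print(cluster_pseudo_labels(feats, probs)[:10])
```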
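For the transfer-learning entry above, HMD occlusion can be approximated even more crudely than with the paper's geometric model: black out a fixed band over the eye region of a face-cropped image. The band fractions below are arbitrary assumptions; the cited work instead derives the occluded region from a geometric model of the Samsung Gear VR headset.

```python
# Rough approximation of HMD occlusion on a face-cropped image; the
# band fractions are arbitrary assumptions, not the paper's geometry.
import numpy as np

def simulate_hmd_occlusion(image: np.ndarray,
                           top: float = 0.15,
                           bottom: float = 0.55) -> np.ndarray:
    """Black out a horizontal band covering the upper face.

    `top` and `bottom` are fractions of the image height chosen by eye;
    a geometric headset model would instead derive the occluded region
    from facial landmarks and the headset's shape.
    """
    occluded = image.copy()
    h = image.shape[0]
    occluded[int(top * h):int(bottom * h), :] = 0
    return occluded

# Smoke test on a dummy face crop.
face = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
masked = simulate_hmd_occlusion(face)
```

A mask like this can be applied to any existing FER dataset before training, which is the transfer-learning setup the entry describes.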