MVRS: The Multimodal Virtual Reality Stimuli-based Emotion Recognition Dataset
- URL: http://arxiv.org/abs/2509.05330v1
- Date: Sun, 31 Aug 2025 17:20:53 GMT
- Title: MVRS: The Multimodal Virtual Reality Stimuli-based Emotion Recognition Dataset
- Authors: Seyed Muhammad Hossein Mousavi, Atiye Ilanloo,
- Abstract summary: The MVRS dataset features synchronized recordings from 13 participants aged 12 to 60 exposed to VR based emotional stimuli.<n>Data were collected using eye tracking (via webcam in a VR headset), body motion (Kinect v2), and EMG and GSR signals (Arduino UNO)<n>Features from each modality were extracted, fused using early and late fusion techniques, and evaluated with timestamps to confirm the datasets quality and emotion separability.
- Score: 0.2578242050187029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic emotion recognition has become increasingly important with the rise of AI, especially in fields like healthcare, education, and automotive systems. However, there is a lack of multimodal datasets, particularly involving body motion and physiological signals, which limits progress in the field. To address this, the MVRS dataset is introduced, featuring synchronized recordings from 13 participants aged 12 to 60 exposed to VR based emotional stimuli (relaxation, fear, stress, sadness, joy). Data were collected using eye tracking (via webcam in a VR headset), body motion (Kinect v2), and EMG and GSR signals (Arduino UNO), all timestamp aligned. Participants followed a unified protocol with consent and questionnaires. Features from each modality were extracted, fused using early and late fusion techniques, and evaluated with classifiers to confirm the datasets quality and emotion separability, making MVRS a valuable contribution to multimodal affective computing.
Related papers
- Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding [45.13650362585136]
We present Emotion-LLaMAv2 and the MMEVerse benchmark, establishing an end-to-end pipeline together with a standardized evaluation setting for emotion recognition and reasoning.<n>An end-to-end multiview encoder eliminates external face detection and captures nuanced emotional cues via richer spatial and temporal multiview tokens.<n>A perception-to-cognition curriculum instruction tuning scheme within the LLaMA2 backbone unifies emotion recognition and free-form emotion reasoning.
arXiv Detail & Related papers (2026-01-23T05:02:43Z) - REFA: Real-time Egocentric Facial Animations for Virtual Reality [56.82169742343143]
We present a novel system for real-time tracking of facial expressions using egocentric views captured from a set of infrared cameras embedded in a virtual reality (VR) headset.<n>Our technology facilitates any user to accurately drive the facial expressions of virtual characters in a non-intrusive manner.
arXiv Detail & Related papers (2026-01-07T01:41:46Z) - CAST-Phys: Contactless Affective States Through Physiological signals Database [74.28082880875368]
The lack of affective multi-modal datasets remains a major bottleneck in developing accurate emotion recognition systems.<n>We present the Contactless Affective States Through Physiological Signals Database (CAST-Phys), a novel high-quality dataset capable of remote physiological emotion recognition.<n>Our analysis highlights the crucial role of physiological signals in realistic scenarios where facial expressions alone may not provide sufficient emotional information.
arXiv Detail & Related papers (2025-07-08T15:20:24Z) - Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input [62.51283548975632]
This work focuses on tracking and understanding human motion using consumer wearable devices, such as VR/AR headsets, smart glasses, cellphones, and smartwatches.<n>We present Ego4o (o for omni), a new framework for simultaneous human motion capture and understanding from multi-modal egocentric inputs.
arXiv Detail & Related papers (2025-04-11T11:18:57Z) - VR Based Emotion Recognition Using Deep Multimodal Fusion With Biosignals Across Multiple Anatomical Domains [3.303674512749726]
We introduce a novel multi-scale attention-based LSTM architecture, combined with Squeeze-and-Excitation (SE) blocks.<n>The proposed architecture, validated in a user study, demonstrates superior performance in classifying valance and arousal level.
arXiv Detail & Related papers (2024-12-03T08:59:12Z) - Emotion Recognition from the perspective of Activity Recognition [0.0]
Appraising human emotional states, behaviors, and reactions displayed in real-world settings can be accomplished using latent continuous dimensions.
For emotion recognition systems to be deployed and integrated into real-world mobile and computing devices, we need to consider data collected in the world.
We propose a novel three-stream end-to-end deep learning regression pipeline with an attention mechanism.
arXiv Detail & Related papers (2024-03-24T18:53:57Z) - Headset: Human emotion awareness under partial occlusions multimodal
dataset [19.57427512904342]
We present a new multimodal database to help advance the development of immersive technologies.
Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs)
The dataset can be helpful in the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video.
arXiv Detail & Related papers (2024-02-14T11:42:15Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - IMUTube: Automatic Extraction of Virtual on-body Accelerometry from
Video for Human Activity Recognition [12.91206329972949]
We introduce IMUTube, an automated processing pipeline to convert videos of human activity into virtual streams of IMU data.
These virtual IMU streams represent accelerometry at a wide variety of locations on the human body.
We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets.
arXiv Detail & Related papers (2020-05-29T21:50:38Z) - K-EmoCon, a multimodal sensor dataset for continuous emotion recognition
in naturalistic conversations [19.350031493515562]
K-EmoCon is a novel dataset with comprehensive annotations of continuous emotions during naturalistic conversations.
The dataset contains multimodal measurements, including audiovisual recordings, EEG, and peripheral physiological signals.
It includes emotion annotations from all three available perspectives: self, debate partner, and external observers.
arXiv Detail & Related papers (2020-05-08T15:51:12Z) - An End-to-End Visual-Audio Attention Network for Emotion Recognition in
User-Generated Videos [64.91614454412257]
We propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs)
Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN.
arXiv Detail & Related papers (2020-02-12T15:33:59Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and
Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.