Headset: Human emotion awareness under partial occlusions multimodal
dataset
- URL: http://arxiv.org/abs/2402.09107v1
- Date: Wed, 14 Feb 2024 11:42:15 GMT
- Title: Headset: Human emotion awareness under partial occlusions multimodal
dataset
- Authors: Fatemeh Ghorbani Lohesara, Davi Rabbouni Freitas, Christine Guillemot,
Karen Eguiazarian, Sebastian Knorr
- Abstract summary: We present a new multimodal database to help advance the development of immersive technologies.
Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs).
The dataset can be helpful in the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video.
- Score: 19.57427512904342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The volumetric representation of human interactions is one of the fundamental
domains in the development of immersive media productions and telecommunication
applications. Particularly in the context of the rapid advancement of Extended
Reality (XR) applications, this volumetric data has proven to be an essential
technology for future XR elaboration. In this work, we present a new multimodal
database to help advance the development of immersive technologies. Our
proposed database provides ethically compliant and diverse volumetric data, in
particular 27 participants displaying posed facial expressions and subtle body
movements while speaking, plus 11 participants wearing head-mounted displays
(HMDs). The recording system consists of a volumetric capture (VoCap) studio,
including 31 synchronized modules with 62 RGB cameras and 31 depth cameras. In
addition to textured meshes, point clouds, and multi-view RGB-D data, we use
one Lytro Illum camera for providing light field (LF) data simultaneously.
Finally, we also provide an evaluation of the dataset on
the tasks of facial expression classification, HMD removal, and point cloud
reconstruction. The dataset can be helpful in the evaluation and performance
testing of various XR algorithms, including but not limited to facial
expression recognition and reconstruction, facial reenactment, and volumetric
video. HEADSET and all its associated raw data and license agreement will be
publicly available for research purposes.
Related papers
- Aria-NeRF: Multimodal Egocentric View Synthesis [17.0554791846124]
We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs).
This dataset offers a comprehensive collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, and information from dual-frequency IMU datasets (1kHz and 800Hz).
The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in
arXiv Detail & Related papers (2023-11-11T01:56:35Z)
- Multisensory extended reality applications offer benefits for volumetric biomedical image analysis in research and medicine [2.46537907738351]
3D data from high-resolution volumetric imaging is a central resource for diagnosis and treatment in modern medicine.
Recent research used extended reality (XR) for perceiving 3D images with visual depth perception and touch but used restrictive haptic devices.
In this study, 24 experts for biomedical images in research and medicine explored 3D medical shapes with 3 applications.
arXiv Detail & Related papers (2023-11-07T13:37:47Z)
- DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume.
We construct a professional multi-view system to capture data, which contains 60 synchronous cameras with max 4096 x 3000 resolution, 15 fps speed, and stern camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors [6.955796938573367]
We present mRI, a multi-modal 3D human pose estimation dataset with mmWave, RGB-D, and Inertial Sensors.
Our dataset consists of over 160k synchronized frames from 20 subjects performing rehabilitation exercises.
arXiv Detail & Related papers (2022-10-15T23:08:44Z)
- Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset.
We introduce Mugsy, a large scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance.
The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z)
- Multi-sensor large-scale dataset for multi-view 3D reconstruction [63.59401680137808]
We present a new multi-sensor dataset for multi-view 3D surface reconstruction.
It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, Microsoft Kinect, industrial cameras, and structured-light scanner.
We provide around 1.4 million images of 107 different scenes acquired from 100 viewing directions under 14 lighting conditions.
arXiv Detail & Related papers (2022-03-11T17:32:27Z)
- EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments [43.05826988957987]
We release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer.
We provide speech intelligibility, quality and signal-to-noise ratio improvement results for a baseline method and show improvements across all tested metrics.
arXiv Detail & Related papers (2021-07-09T02:00:47Z)
- Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
- DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS).
The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development.
In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
arXiv Detail & Related papers (2020-08-27T12:33:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.