Related papers: Headset: Human emotion awareness under partial occlusions multimodal dataset

URL: http://arxiv.org/abs/2402.09107v1
Date: Wed, 14 Feb 2024 11:42:15 GMT
Title: Headset: Human emotion awareness under partial occlusions multimodal dataset
Authors: Fatemeh Ghorbani Lohesara, Davi Rabbouni Freitas, Christine Guillemot, Karen Eguiazarian, Sebastian Knorr
Abstract summary: We present a new multimodal database to help advance the development of immersive technologies. Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The dataset can be helpful in the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video.
Score: 19.57427512904342
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The volumetric representation of human interactions is one of the fundamental domains in the development of immersive media productions and telecommunication applications. Particularly in the context of the rapid advancement of Extended Reality (XR) applications, this volumetric data has proven to be an essential technology for future XR elaboration. In this work, we present a new multimodal database to help advance the development of immersive technologies. Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio, including 31 synchronized modules with 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera for providing light field (LF) data simultaneously. Finally, we also provide an evaluation of our dataset employment with regard to the tasks of facial expression classification, HMD removal, and point cloud reconstruction. The dataset can be helpful in the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video. HEADSET and all its associated raw data and license agreement will be publicly available for research purposes.

Related papers

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments [49.45034796115852]
Operating rooms (ORs) are complex, high-stakes environments requiring precise understanding of interactions among medical staff, tools, and equipment. Current datasets fall short in scale and realism and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling. We introduce MM-OR, a realistic and large-scale multimodal OR dataset, and the first dataset to enable multimodal scene graph generation.
arXiv Detail & Related papers (2025-03-04T13:00:52Z)
MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans [4.098892268127572]
We present a novel dataset captured using a VR headset to record conversations between participants within a physics simulator (AI2-THOR). Our primary objective is to extend the field of co-speech gesture generation by incorporating rich contextual information within referential settings.
arXiv Detail & Related papers (2024-09-30T21:51:30Z)
Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms [29.577583619354314]
We propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera. To build a more comprehensive benchmark dataset, we report over 20 mainstream HAR models for future works to compare.
arXiv Detail & Related papers (2024-08-19T07:52:20Z)
Aria-NeRF: Multimodal Egocentric View Synthesis [17.0554791846124]
We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs). This dataset offers a comprehensive collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, and information from dual-frequency IMU datasets (1kHz and 800Hz). The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in
arXiv Detail & Related papers (2023-11-11T01:56:35Z)
DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering. Our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume. We construct a professional multi-view system to capture data, which contains 60 synchronous cameras with max 4096 x 3000 resolution, 15 fps speed, and stern camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z)
A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles. Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset. We introduce Mugsy, a large scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance. The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z)
Multi-sensor large-scale dataset for multi-view 3D reconstruction [63.59401680137808]
We present a new multi-sensor dataset for multi-view 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, Microsoft Kinect, industrial cameras, and structured-light scanner. We provide around 1.4 million images of 107 different scenes acquired from 100 viewing directions under 14 lighting conditions.
arXiv Detail & Related papers (2022-03-11T17:32:27Z)
EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments [43.05826988957987]
We release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer. We provide speech intelligibility, quality and signal-to-noise ratio improvement results for a baseline method and show improvements across all tested metrics.
arXiv Detail & Related papers (2021-07-09T02:00:47Z)
Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMD). Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware. We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS). The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development. In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
arXiv Detail & Related papers (2020-08-27T12:33:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.