Full Body Video-Based Self-Avatars for Mixed Reality: from E2E System to User Study
- URL: http://arxiv.org/abs/2208.12639v1
- Date: Wed, 24 Aug 2022 20:59:17 GMT
- Title: Full Body Video-Based Self-Avatars for Mixed Reality: from E2E System to User Study
- Authors: Diego Gonzalez Morin, Ester Gonzalez-Sosa, Pablo Perez, and Alvaro Villegas
- Abstract summary: This work explores the creation of self-avatars through video pass-through in Mixed Reality (MR) applications. We present our end-to-end system, including a custom MR video pass-through implementation on a commercial head-mounted display (HMD). To validate this technology, we designed an immersive VR experience where the user has to walk along a narrow tile path over an active volcano crater.
- Score: 1.0149624140985476
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we explore the creation of self-avatars through video pass-through in Mixed Reality (MR) applications. We present our end-to-end system, including: a custom MR video pass-through implementation on a commercial head-mounted display (HMD), our deep learning-based real-time egocentric body segmentation algorithm, and our optimized offloading architecture that connects the segmentation server with the HMD. To validate this technology, we designed an immersive VR experience where the user has to walk along a narrow tile path over an active volcano crater. The study was performed under three body representation conditions: virtual hands, video pass-through with color-based full-body segmentation, and video pass-through with deep learning full-body segmentation. The immersive experience was completed by 30 women and 28 men. To the best of our knowledge, this is the first user study focused on evaluating video-based self-avatars for representing the user in an MR scene. Results showed no significant differences between the body representations in terms of presence, with moderate improvements in some embodiment components between the virtual-hands and full-body representations. Visual quality results favored the deep learning algorithm in terms of whole-body perception and overall segmentation quality. We provide some discussion regarding the use of video-based self-avatars, and some reflections on the evaluation methodology. The proposed E2E solution is at the boundary of the state of the art, so there is still room for improvement before it reaches maturity; however, it serves as a crucial starting point for novel distributed MR solutions.
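To make the offloading idea above concrete, the following is a minimal Python sketch of a client that streams egocentric frames to a remote segmentation server and composites the returned body mask over the rendered scene. It is an illustration only, not the authors' implementation: the server address (segserver.local:9000), the length-prefixed JPEG/PNG wire format, and the compositing step are all hypothetical assumptions.

```python
# Minimal sketch of a segmentation-offloading client (hypothetical, NOT the
# paper's implementation). Assumes a server that accepts length-prefixed
# JPEG frames and replies with a length-prefixed single-channel PNG mask.
import socket
import struct

import cv2          # OpenCV for capture, encoding, and display
import numpy as np

SERVER = ("segserver.local", 9000)   # hypothetical segmentation server

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed connection")
        buf += chunk
    return buf

def composite(frame: np.ndarray, mask: np.ndarray, vr_scene: np.ndarray) -> np.ndarray:
    """Paste the segmented body (mask > 0) over the rendered VR scene."""
    body = (mask > 0)[..., None]         # HxWx1 boolean, broadcast over RGB
    return np.where(body, frame, vr_scene)

def run(camera_index: int = 0) -> None:
    cap = cv2.VideoCapture(camera_index)
    with socket.create_connection(SERVER) as sock:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Send the egocentric frame as a length-prefixed JPEG.
            ok, jpg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
            sock.sendall(struct.pack("!I", len(jpg)) + jpg.tobytes())
            # Receive the body mask (length-prefixed, single channel).
            (size,) = struct.unpack("!I", recv_exact(sock, 4))
            mask = cv2.imdecode(
                np.frombuffer(recv_exact(sock, size), np.uint8),
                cv2.IMREAD_GRAYSCALE,
            )
            vr_scene = np.zeros_like(frame)   # stand-in for the rendered VR frame
            cv2.imshow("mixed reality", composite(frame, mask, vr_scene))
            if cv2.waitKey(1) == 27:          # Esc to quit
                break
    cap.release()
```

A production pipeline would additionally overlap capture, network transfer, and rendering to hide latency, and would composite on the HMD's GPU rather than on the CPU as done here; the sketch only shows the data flow between HMD and segmentation server.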
Related papers
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence, as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z)
- Learning High-Quality Navigation and Zooming on Omnidirectional Images in Virtual Reality [37.564863636844905]
We present a novel system, called OmniVR, designed to enhance visual clarity during VR navigation.
Our system enables users to effortlessly locate and zoom in on the objects of interest in VR.
arXiv Detail & Related papers (2024-05-01T07:08:24Z)
- POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World [59.545114016224254]
Humans are good at translating third-person observations of hand-object interactions into an egocentric view.
We propose a Prompt-Oriented View-agnostic learning framework, which enables this view adaptation with few egocentric videos.
arXiv Detail & Related papers (2024-03-09T09:54:44Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- Early Action Recognition with Action Prototypes [62.826125870298306]
We propose a novel model that learns a prototypical representation of the full action for each class.
We decompose the video into short clips, where a visual encoder extracts features from each clip independently.
A decoder then aggregates the features from all the clips in an online fashion for the final class prediction (a rough sketch of this kind of online aggregation appears after this list).
arXiv Detail & Related papers (2023-12-11T18:31:13Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- Real Time Egocentric Segmentation for Video-self Avatar in Mixed Reality [0.946046736912201]
Our algorithm achieves a frame rate of 66 fps for an input resolution of 640x480, thanks to our shallow network inspired by Thundernet's architecture.
We describe the creation process of our Egocentric Bodies dataset, composed of almost 10,000 images from three datasets.
arXiv Detail & Related papers (2022-07-04T10:00:16Z)
- Stereo Video Reconstruction Without Explicit Depth Maps for Endoscopic Surgery [37.531587409884914]
We introduce the task of stereo video reconstruction or, equivalently, 2D-to-3D video conversion for minimally invasive surgical video.
We design and implement a series of end-to-end U-Net-based solutions for this task.
We evaluate these solutions by surveying ten experts: surgeons who routinely perform endoscopic surgery.
arXiv Detail & Related papers (2021-09-16T21:22:43Z)
- Facial Expression Recognition Under Partial Occlusion from Virtual Reality Headsets based on Transfer Learning [0.0]
Convolutional neural network-based approaches have become widely adopted due to their proven applicability to the Facial Expression Recognition (FER) task.
However, recognizing facial expressions while wearing a head-mounted VR headset is a challenging task due to the upper half of the face being completely occluded.
We propose a geometric model to simulate the occlusion resulting from a Samsung Gear VR headset that can be applied to existing FER datasets.
arXiv Detail & Related papers (2020-08-12T20:25:07Z)
- EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images.
Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition.
We report an Average Precision (AP) score of 35.48 across 26 classes, which is an improvement of 7-8 over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)
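The Early Action Recognition entry above outlines a generic clip-wise pipeline: encode each short clip independently, then fold the per-clip features into a running prediction. The sketch below illustrates that idea with a plain GRU aggregator in PyTorch; it is not the cited paper's prototype-based architecture, and the encoder, feature dimensions, and class count are made-up placeholders.

```python
# Illustrative sketch of online clip-feature aggregation for early action
# recognition (a generic GRU aggregator, NOT the cited paper's architecture;
# the encoder, dimensions, and class count are placeholders).
import torch
import torch.nn as nn

class OnlineActionClassifier(nn.Module):
    def __init__(self, feat_dim: int = 512, hidden: int = 256, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in per-clip visual encoder
            nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU()
        )
        self.gru = nn.GRUCell(feat_dim, hidden)  # running aggregation state
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        """clips: (batch, num_clips, C, T, H, W); returns per-step logits."""
        b, n = clips.shape[:2]
        h = clips.new_zeros(b, self.gru.hidden_size)
        logits = []
        for i in range(n):                   # clips arrive one at a time
            f = self.encoder(clips[:, i])    # encode each clip independently
            h = self.gru(f, h)               # fold it into the running state
            logits.append(self.head(h))      # a prediction after every clip
        return torch.stack(logits, dim=1)    # (batch, num_clips, num_classes)

if __name__ == "__main__":
    model = OnlineActionClassifier()
    video = torch.randn(2, 8, 3, 4, 32, 32)  # 8 short clips per video
    print(model(video).shape)                # torch.Size([2, 8, 10])
```

The recurrent cell keeps the aggregation strictly online: after clip i the model already has a usable class prediction, which is exactly the property early action recognition needs.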