Real Time Egocentric Segmentation for Video-self Avatar in Mixed Reality
- URL: http://arxiv.org/abs/2207.01296v1
- Date: Mon, 4 Jul 2022 10:00:16 GMT
- Title: Real Time Egocentric Segmentation for Video-self Avatar in Mixed Reality
- Authors: Ester Gonzalez-Sosa, Andrija Gajic, Diego Gonzalez-Morin, Guillermo
Robledo, Pablo Perez and Alvaro Villegas
- Abstract summary: Our algorithm achieves a frame rate of 66 fps for an input resolution of 640x480, thanks to our shallow network inspired by Thundernet's architecture.
We describe the creation process of our Egocentric Bodies dataset, composed of almost 10,000 images from three datasets.
- Score: 0.946046736912201
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work we present our real-time egocentric body segmentation algorithm.
Our algorithm achieves a frame rate of 66 fps for an input resolution of
640x480, thanks to our shallow network inspired by Thundernet's architecture.
In addition, we place strong emphasis on the variability of the training data. More
concretely, we describe the creation process of our Egocentric Bodies
(EgoBodies) dataset, composed of almost 10,000 images from three datasets,
created through both synthetic methods and real capture. We conduct experiments
to understand the contribution of the individual datasets, compare the Thundernet
model trained on EgoBodies with simpler and more complex previous approaches,
and discuss their performance in a real-life setup in terms of segmentation
quality and inference time. The trained semantic
segmentation algorithm is already integrated into an end-to-end Mixed Reality
(MR) system, making it possible for users to see their own body while immersed
in an MR scene.
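To make the recipe concrete, below is a minimal sketch of a shallow encoder-decoder segmentation network together with a rough throughput check at the paper's 640x480 input resolution. This is an illustrative PyTorch stand-in, not the authors' actual Thundernet-based model; all layer sizes and names here are assumptions.

```python
import time
import torch
import torch.nn as nn

class ShallowSegNet(nn.Module):
    """Illustrative shallow encoder-decoder for binary body segmentation.

    NOT the paper's Thundernet-based model; it only mirrors the general
    recipe: few downsampling stages, a light decoder, one logit map.
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),  # body/background logits
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Rough throughput check at the paper's input resolution (640x480).
model = ShallowSegNet().eval()
x = torch.randn(1, 3, 480, 640)  # NCHW: height 480, width 640
with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(x)
    start = time.perf_counter()
    n = 100
    for _ in range(n):
        model(x)
    fps = n / (time.perf_counter() - start)
print(f"~{fps:.0f} fps on this machine")
```

Measured fps naturally varies with hardware; the 66 fps figure refers to the authors' specific model and setup.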
Related papers
- X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization [56.75782714530429]
We propose a cross-modal adaptation framework, which we call X-MIC.
Our pipeline learns to align frozen text embeddings to each egocentric video directly in the shared embedding space.
This yields tighter alignment between text embeddings and each egocentric video, leading to a significant improvement in cross-dataset generalization.
arXiv Detail & Related papers (2024-03-28T19:45:35Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1,500 human subjects, 5,000 motion sequences, and a data volume of 67.5M frames.
We construct a professional multi-view capture system comprising 60 synchronized cameras with a maximum resolution of 4096 x 3000 at 15 fps, together with strict camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z)
- Full Body Video-Based Self-Avatars for Mixed Reality: from E2E System to User Study [1.0149624140985476]
This work explores the creation of self-avatars through video pass-through in Mixed Reality (MR) applications.
We present our end-to-end system, including a custom MR video pass-through implementation on a commercial head-mounted display (HMD).
To validate this technology, we designed an immersive VR experience in which the user has to walk along a narrow tile path over an active volcano crater.
arXiv Detail & Related papers (2022-08-24T20:59:17Z)
- A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model on our autonomous vehicle to demonstrate its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
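The reason stereo (virtual) data can pin down absolute scale in the entry above is the standard stereo relation depth = f * B / d: the baseline B is known in meters, whereas monocular depth is defined only up to scale. Below is a minimal illustrative helper for that relation; it is not part of the VRVO codebase, and the network training itself is not shown.

```python
import numpy as np

def disparity_to_metric_depth(disparity_px, focal_px, baseline_m):
    """Standard stereo relation: depth = f * B / d.

    A known metric baseline fixes absolute scale. Illustrative helper,
    not part of VRVO.
    """
    d = np.maximum(disparity_px, 1e-6)  # avoid division by zero
    return focal_px * baseline_m / d

# Example: 20 px disparity, 720 px focal length, 10 cm baseline.
depth = disparity_to_metric_depth(np.array([20.0]), focal_px=720.0, baseline_m=0.1)
print(depth)  # -> [3.6] meters
```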
- TVDIM: Enhancing Image Self-Supervised Pretraining via Noisy Text Data [13.68491474904529]
We propose Text-enhanced Visual Deep InfoMax (TVDIM) to learn better visual representations.
Our core self-supervised idea is to maximize the mutual information between features extracted from multiple views.
TVDIM significantly outperforms previous visual self-supervised methods when processing the same set of images.
arXiv Detail & Related papers (2021-06-03T12:36:01Z)
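Mutual-information maximization between views, as in the TVDIM entry above, is commonly implemented with an InfoNCE-style contrastive bound. The sketch below is a generic PyTorch version of that idea, not TVDIM's exact estimator.

```python
import torch
import torch.nn.functional as F

def infonce_loss(view_a, view_b, temperature=0.1):
    """InfoNCE lower bound on mutual information between two view batches.

    view_a, view_b: (N, D) feature batches where row i of each is a view
    of the same sample. A common stand-in for "maximize MI between views";
    TVDIM's exact objective may differ.
    """
    a = F.normalize(view_a, dim=1)
    b = F.normalize(view_b, dim=1)
    logits = a @ b.t() / temperature   # (N, N) pairwise similarities
    targets = torch.arange(a.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random features for 8 samples in a 128-d space.
loss = infonce_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```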
- Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos [92.38049744463149]
We introduce an approach for pre-training egocentric video models using large-scale third-person video datasets.
Our idea is to discover latent signals in third-person video that are predictive of key egocentric-specific properties.
Our experiments show that our Ego-Exo framework can be seamlessly integrated into standard video models.
arXiv Detail & Related papers (2021-04-16T06:10:10Z)
- Representation Learning with Video Deep InfoMax [26.692717942430185]
We extend DeepInfoMax to the video domain by leveraging similar structure in temporal networks.
We find that drawing views from both natural-rate sequences and temporally-downsampled sequences yields improved results on Kinetics-pretrained action recognition tasks.
arXiv Detail & Related papers (2020-07-27T02:28:47Z)
- Learning to simulate complex scenes [18.51564016785853]
This paper explores content adaptation in the context of semantic segmentation.
We propose a scalable discretization-and-relaxation (SDR) approach to optimize attribute values and obtain a training set whose content is similar to real-world data.
Experiments show that our system can generate reasonable and useful scenes, from which we obtain promising real-world segmentation accuracy.
arXiv Detail & Related papers (2020-06-25T17:51:34Z)
- Egocentric Human Segmentation for Mixed Reality [1.0149624140985476]
We create a semi-synthetic dataset composed of more than 15,000 realistic images.
We implement a deep learning semantic segmentation algorithm that is able to perform beyond real-time requirements.
arXiv Detail & Related papers (2020-05-25T12:34:47Z)
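The semi-synthetic construction in the last entry boils down to compositing segmented egocentric foregrounds onto new backgrounds, so that the mask used for pasting doubles as the ground-truth label. The sketch below is a hypothetical NumPy illustration of that pattern; the function and variable names are assumptions, and the paper's actual pipeline (sources, chroma keying, image sizes) is not reproduced here.

```python
import numpy as np

def composite_foreground(fg_rgb, fg_mask, bg_rgb):
    """Paste a segmented egocentric body onto a new background.

    fg_rgb, bg_rgb: (H, W, 3) uint8 images; fg_mask: (H, W) in {0, 1}.
    Returns the composited image plus the reused mask as the ground-truth
    label. Hypothetical sketch of semi-synthetic data generation only.
    """
    m = fg_mask[..., None].astype(np.float32)
    out = fg_rgb.astype(np.float32) * m + bg_rgb.astype(np.float32) * (1.0 - m)
    return out.astype(np.uint8), fg_mask

# Toy example with random 480x640 data standing in for real captures.
fg = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
bg = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
mask = (np.random.rand(480, 640) > 0.5).astype(np.uint8)
image, label = composite_foreground(fg, mask, bg)
```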