Egocentric Human Segmentation for Mixed Reality
- URL: http://arxiv.org/abs/2005.12074v2
- Date: Mon, 8 Jun 2020 14:58:07 GMT
- Title: Egocentric Human Segmentation for Mixed Reality
- Authors: Andrija Gajic, Ester Gonzalez-Sosa, Diego Gonzalez-Morin, Marcos Escudero-Viñolo and Alvaro Villegas
- Abstract summary: We create a semi-synthetic dataset composed of more than 15,000 realistic images.
We implement a deep learning semantic segmentation algorithm that is able to perform beyond real-time requirements.
- Score: 1.0149624140985476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of this work is to segment human body parts from egocentric
video using semantic segmentation networks. Our contribution is two-fold: i) we
create a semi-synthetic dataset composed of more than 15,000 realistic images
and associated pixel-wise labels of egocentric human body parts, such as arms
or legs, covering different demographic factors; ii) building upon the
ThunderNet architecture, we implement a deep learning semantic segmentation
algorithm that is able to perform beyond real-time requirements (16 ms for
720x720 images). It is believed that this method will enhance the sense of
presence in Virtual Environments and will constitute a more realistic
alternative to standard virtual avatars.
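As a rough illustration of the real-time budget quoted above, the sketch below times a lightweight segmentation network on a single 720x720 frame. It is a stand-in, not the authors' ThunderNet-based model: the off-the-shelf LRASPP backbone, the two-class (body vs. background) head, and the compositing comment are all assumptions made for illustration.

```python
# Latency probe for real-time egocentric segmentation (illustrative sketch;
# NOT the paper's ThunderNet-based network). The untrained LRASPP model is a
# stand-in, so only the timing, not the mask quality, is meaningful here.
import time

import torch
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumed two-class setup: 0 = background, 1 = egocentric body parts.
model = lraspp_mobilenet_v3_large(num_classes=2).to(device).eval()

frame = torch.rand(1, 3, 720, 720, device=device)  # stand-in 720x720 frame

with torch.no_grad():
    model(frame)  # warm-up so one-time initialization does not skew timing
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    logits = model(frame)["out"]  # [1, 2, 720, 720], upsampled to input size
    if device == "cuda":
        torch.cuda.synchronize()
    latency_ms = (time.perf_counter() - t0) * 1000

body_mask = logits.argmax(dim=1) == 1  # boolean per-pixel body mask
print(f"latency: {latency_ms:.1f} ms (the paper reports 16 ms at 720x720)")

# In an MR pipeline the mask would alpha-blend the user's real body over the
# rendered virtual frame, e.g.:
#   composite = body_mask * real_frame + ~body_mask * virtual_frame
```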
Related papers
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z)
- MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation [19.464362358936906]
We propose a novel approach for segmentation on real and virtual image regions.
By creating realistic synthetic images that mimic the complexities of the water surface, we provide fine-grained training data for our network.
We achieve state-of-the-art real-virtual image segmentation performance in an unseen real-world domain.
arXiv Detail & Related papers (2024-03-14T20:18:08Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications [20.571026014771828]
We provide a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and objects being interacted with.
Our dataset is the first to label detailed hand-object contact boundaries.
We show that our robust hand-object segmentation model and dataset can serve as a foundational tool to boost or enable several downstream vision applications.
arXiv Detail & Related papers (2022-08-07T21:43:40Z)
- Real Time Egocentric Segmentation for Video-self Avatar in Mixed Reality [0.946046736912201]
Our algorithm achieves a frame rate of 66 fps for an input resolution of 640x480, thanks to our shallow network inspired by ThunderNet's architecture.
We describe the creation process of our Egocentric Bodies dataset, composed of almost 10,000 images from three datasets.
arXiv Detail & Related papers (2022-07-04T10:00:16Z)
- HSPACE: Synthetic Parametric Humans Animated in Complex Environments [67.8628917474705]
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments.
We combine a hundred diverse individuals of varying ages, gender, proportions, and ethnicity, with hundreds of motions and scenes, in order to generate an initial dataset of over 1 million frames.
Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- EgoRenderer: Rendering Human Avatars from Egocentric Camera Images [87.96474006263692]
We present EgoRenderer, a system for rendering full-body neural avatars of a person captured by a wearable, egocentric fisheye camera.
Rendering full-body avatars from such egocentric images comes with unique challenges due to the top-down view and large distortions.
We tackle these challenges by decomposing the rendering process into several steps, including texture synthesis, pose construction, and neural image translation.
arXiv Detail & Related papers (2021-11-24T18:33:02Z)
- Real Time Egocentric Object Segmentation: THU-READ Labeling and Benchmarking Results [0.0]
Egocentric segmentation has attracted recent interest in the computer vision community due to its potential in Mixed Reality (MR) applications.
We contribute with a semantic-wise labeling of a subset of 2124 images from the RGB-D THU-READ dataset.
We also report benchmarking results using ThunderNet, a real-time semantic segmentation network, which could allow future integration with end-to-end MR applications.
arXiv Detail & Related papers (2021-06-09T10:10:02Z)
- Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask R-CNN network for the specific task of segmenting the hand of the humanoid robot Vizzy; a minimal fine-tuning sketch follows this entry.
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
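The hand-segmentation entry above follows what is, in torchvision, the standard recipe for adapting a pretrained Mask R-CNN to a new class set: swap the box and mask heads, then fine-tune. Below is a minimal sketch of that recipe, assuming a single "hand" foreground class and dummy training data; the paper's actual Vizzy training setup is not reproduced here.

```python
# Sketch: adapting torchvision's Mask R-CNN to one foreground class ("hand").
# Illustrative only; the paper's exact Vizzy fine-tuning setup is unknown here.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + hand

# weights="DEFAULT" downloads COCO-pretrained weights; use weights=None offline.
model = maskrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the box-classification head for the new class count.
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)

# Swap the mask-prediction head likewise (256 is the usual hidden width).
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)

# One training step on dummy data; a real run would loop over an egocentric
# dataset with ground-truth hand boxes and masks.
model.train()
images = [torch.rand(3, 480, 640)]
targets = [{
    "boxes": torch.tensor([[100.0, 100.0, 300.0, 300.0]]),  # xyxy, dummy
    "labels": torch.tensor([1]),                             # class 1 = hand
    "masks": torch.zeros(1, 480, 640, dtype=torch.uint8),    # dummy mask
}]
losses = model(images, targets)  # dict of RPN/box/mask losses
sum(losses.values()).backward()
```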
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.