Enhanced Self-Perception in Mixed Reality: Egocentric Arm Segmentation
and Database with Automatic Labelling
- URL: http://arxiv.org/abs/2003.12352v1
- Date: Fri, 27 Mar 2020 12:09:27 GMT
- Title: Enhanced Self-Perception in Mixed Reality: Egocentric Arm Segmentation
and Database with Automatic Labelling
- Authors: Ester Gonzalez-Sosa, Pablo Perez, Ruben Tolosana, Redouane Kachach,
Alvaro Villegas
- Abstract summary: This study focuses on the egocentric segmentation of arms to improve self-perception in Augmented Virtuality.
We report results on different real egocentric hand datasets, including GTEA Gaze+, EDSH, EgoHands, Ego Youtube Hands, THU-Read, TEgO, FPAB, and Ego Gesture.
Results confirm the suitability of the EgoArm dataset for this task, achieving improvements of up to 40% with respect to the original network.
- Score: 1.0149624140985476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we focus on the egocentric segmentation of arms to improve
self-perception in Augmented Virtuality (AV). The main contributions of this
work are: i) a comprehensive survey of segmentation algorithms for AV; ii) an
Egocentric Arm Segmentation Dataset, composed of more than 10,000 images,
comprising variations in skin color and gender, among others. We provide all
details required for the automated generation of groundtruth and semi-synthetic
images; iii) the first use of deep learning for segmenting arms in AV; iv) to
showcase the usefulness of this database, we report results on
different real egocentric hand datasets, including GTEA Gaze+, EDSH, EgoHands,
Ego Youtube Hands, THU-Read, TEgO, FPAB, and Ego Gesture, which allow for
direct comparisons with existing approaches utilizing color or depth. Results
confirm the suitability of the EgoArm dataset for this task, achieving
improvements of up to 40% with respect to the original network, depending on the
particular dataset. Results also suggest that, while approaches based on color
or depth can work in controlled conditions (lack of occlusion, uniform
lighting, only objects of interest in the near range, controlled background,
etc.), egocentric segmentation based on deep learning is more robust in real AV
applications.
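A minimal sketch of the kind of compositing pipeline the abstract alludes to for the automated generation of ground truth and semi-synthetic images: an egocentric arm capture against a chroma-key background is thresholded to obtain an arm mask, and the masked arm is blended over an arbitrary virtual-scene background, so the compositing mask itself serves as the segmentation label. The chroma-key setup, HSV thresholds, and file names below are illustrative assumptions, not details taken from the paper.

```python
import cv2
import numpy as np

def composite_arm(arm_path: str, background_path: str):
    """Return a semi-synthetic image and its automatically derived arm mask."""
    arm = cv2.imread(arm_path)            # arm captured on a green screen (assumed setup)
    bg = cv2.imread(background_path)      # rendering of a virtual scene
    bg = cv2.resize(bg, (arm.shape[1], arm.shape[0]))

    # Threshold the green backdrop in HSV; the bounds are hypothetical.
    hsv = cv2.cvtColor(arm, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))
    mask = cv2.bitwise_not(green)         # 255 on the arm, 0 elsewhere

    # Remove speckles and small holes before compositing.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Blend arm pixels over the background; the same mask is the ground-truth label.
    alpha = cv2.merge([mask] * 3).astype(np.float32) / 255.0
    composite = (arm * alpha + bg * (1.0 - alpha)).astype(np.uint8)
    return composite, mask

# Example: pairing one arm capture with many backgrounds yields many labeled images.
# image, label = composite_arm("arm_greenscreen.png", "virtual_scene.png")
```

Because the label falls out of the compositing step, varying backgrounds, lighting, and subjects can scale such a dataset to thousands of automatically labeled images, which is the spirit of the EgoArm database described above.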
Related papers
- Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning [80.37314291927889]
We present EMBED, a method designed to transform exocentric video-language data for egocentric video representation learning.
Egocentric videos predominantly feature close-up hand-object interactions, whereas exocentric videos offer a broader perspective on human activities.
By applying both vision and language style transfer, our framework creates a new egocentric dataset.
arXiv Detail & Related papers (2024-08-07T06:10:45Z)
- EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? [48.702973928321946]
We introduce a novel asymmetric contrastive objective for EgoHOI named EgoNCE++.
Our experiments demonstrate that EgoNCE++ significantly boosts open-vocabulary HOI recognition, multi-instance retrieval, and action recognition tasks.
arXiv Detail & Related papers (2024-05-28T00:27:29Z)
- EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding [99.904140768186]
This paper proposes a new framework as an infrastructure to advance Ego-HOI recognition by Probing, Curation and Adaption (EgoPCA).
We contribute comprehensive pre-train sets, balanced test sets and a new baseline, which are complete with a training-finetuning strategy.
We believe our data and the findings will pave a new way for Ego-HOI understanding.
arXiv Detail & Related papers (2023-09-05T17:51:16Z)
- Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment [71.16699226211504]
We propose to learn fine-grained action features that are invariant to the viewpoints by aligning egocentric and exocentric videos in time.
To this end, we propose AE2, a self-supervised embedding approach with two key designs.
For evaluation, we establish a benchmark for fine-grained video understanding in the ego-exo context.
arXiv Detail & Related papers (2023-06-08T19:54:08Z)
- Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications [20.571026014771828]
We provide a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and objects being interacted with.
Our dataset is the first to label detailed hand-object contact boundaries.
We show that our robust hand-object segmentation model and dataset can serve as a foundational tool to boost or enable several downstream vision applications.
arXiv Detail & Related papers (2022-08-07T21:43:40Z)
- Egocentric Video-Language Pretraining [74.04740069230692]
Video-Language Pretraining aims to learn transferable representations to advance a wide range of video-text downstream tasks.
We exploit the recently released Ego4D dataset to pioneer Egocentric training along three directions.
We demonstrate strong performance on five egocentric downstream tasks across three datasets.
arXiv Detail & Related papers (2022-06-03T16:28:58Z)
- Real Time Egocentric Object Segmentation: THU-READ Labeling and Benchmarking Results [0.0]
Egocentric segmentation has attracted recent interest in the computer vision community due to its potential in Mixed Reality (MR) applications.
We contribute with a semantic-wise labeling of a subset of 2124 images from the RGB-D THU-READ dataset.
We also report benchmarking results using Thundernet, a real-time semantic segmentation network, that could allow future integration with end-to-end MR applications.
arXiv Detail & Related papers (2021-06-09T10:10:02Z)
- Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos [92.38049744463149]
We introduce an approach for pre-training egocentric video models using large-scale third-person video datasets.
Our idea is to discover latent signals in third-person video that are predictive of key egocentric-specific properties.
Our experiments show that our Ego-Exo framework can be seamlessly integrated into standard video models.
arXiv Detail & Related papers (2021-04-16T06:10:10Z)
- Ego2Hands: A Dataset for Egocentric Two-hand Segmentation and Detection [1.0742675209112622]
We present Ego2Hands, a large-scale RGB-based egocentric hand segmentation/detection dataset that is semi-automatically annotated.
For quantitative analysis, we manually annotated an evaluation set that significantly exceeds existing benchmarks in quantity, diversity and annotation accuracy.
arXiv Detail & Related papers (2020-11-14T10:12:35Z)