Real Time Egocentric Object Segmentation: THU-READ Labeling and
Benchmarking Results
- URL: http://arxiv.org/abs/2106.04957v1
- Date: Wed, 9 Jun 2021 10:10:02 GMT
- Title: Real Time Egocentric Object Segmentation: THU-READ Labeling and
Benchmarking Results
- Authors: E. Gonzalez-Sosa, G. Robledo, D. Gonzalez-Morin, P. Perez-Garcia and
A. Villegas
- Abstract summary: Egocentric segmentation has attracted recent interest in the computer vision community due to its potential in Mixed Reality (MR) applications.
We contribute a semantic-wise labeling of a subset of 2124 images from the RGB-D THU-READ dataset.
We also report benchmarking results using Thundernet, a real-time semantic segmentation network, that could allow future integration with end-to-end MR applications.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Egocentric segmentation has attracted recent interest in the computer vision
community due to its potential in Mixed Reality (MR) applications. While most
previous works have focused on segmenting egocentric human body parts
(mainly hands), little attention has been given to egocentric objects. Due to
the lack of datasets with pixel-wise annotations of egocentric objects, in this
paper we contribute a semantic-wise labeling of a subset of 2124 images
from the RGB-D THU-READ Dataset. We also report benchmarking results using
Thundernet, a real-time semantic segmentation network, that could allow future
integration with end-to-end MR applications.
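As a rough illustration of the kind of real-time inference loop the paper targets, the sketch below segments a single egocentric-style RGB frame with a lightweight network. It is a minimal sketch under stated assumptions: torchvision's LR-ASPP MobileNetV3 stands in for Thundernet (whose weights and label set are not reproduced here), and a synthetic frame stands in for a THU-READ image.

```python
# Minimal sketch of a real-time egocentric segmentation step.
# Assumptions (not from the paper): LR-ASPP MobileNetV3 replaces Thundernet,
# and a synthetic frame replaces a THU-READ image.
import time

import numpy as np
import torch
from torchvision import transforms
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

device = "cuda" if torch.cuda.is_available() else "cpu"
model = lraspp_mobilenet_v3_large(weights="DEFAULT").to(device).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def segment_frame(frame_rgb: np.ndarray) -> torch.Tensor:
    """Return an (H, W) per-pixel class map for one RGB frame (H x W x 3, uint8)."""
    x = preprocess(frame_rgb).unsqueeze(0).to(device)
    logits = model(x)["out"]              # (1, num_classes, H, W)
    return logits.argmax(dim=1)[0].cpu()  # (H, W) label map

# Quick shape/latency check on a synthetic 640x480 frame.
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
start = time.time()
mask = segment_frame(frame)
print(mask.shape, f"{(time.time() - start) * 1000:.1f} ms")
```

Reproducing the paper's benchmark would require swapping in the Thundernet architecture and the authors' THU-READ labels, which this sketch does not attempt.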
Related papers
- Cognition Transferring and Decoupling for Text-supervised Egocentric Semantic Segmentation [17.35953923039954]
The Text-supervised Egocentric Semantic Segmentation (TESS) task aims to assign pixel-level categories to egocentric images, weakly supervised by texts from image-level labels.
We propose a Cognition Transferring and Decoupling Network (CTDN) that first learns the egocentric wearer-object relations by correlating the image and text.
arXiv Detail & Related papers (2024-10-02T08:58:34Z) - Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning [80.37314291927889]
We present EMBED, a method designed to transform exocentric video-language data for egocentric video representation learning.
Egocentric videos predominantly feature close-up hand-object interactions, whereas exocentric videos offer a broader perspective on human activities.
By applying both vision and language style transfer, our framework creates a new egocentric dataset.
arXiv Detail & Related papers (2024-08-07T06:10:45Z) - Object Aware Egocentric Online Action Detection [23.504280692701272]
We introduce an Object-Aware Module that integrates egocentric-specific priors into existing Online Action Detection frameworks.
Our module can be seamlessly integrated into existing models with minimal overhead and brings consistent performance gains.
arXiv Detail & Related papers (2024-06-03T07:58:40Z) - EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation [5.476136494434766]
We introduce EiCue, a technique providing semantic and structural cues through an eigenbasis derived from a semantic similarity matrix (a toy numerical sketch of this idea follows after the related-papers list).
We guide our model to learn object-level representations with intra- and inter-image object-feature consistency.
Experiments on the COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate state-of-the-art unsupervised semantic segmentation (USS) results.
arXiv Detail & Related papers (2024-03-03T11:24:16Z) - Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z) - Learning Fine-grained View-Invariant Representations from Unpaired
Ego-Exo Videos via Temporal Alignment [71.16699226211504]
We propose to learn fine-grained action features that are invariant to the viewpoints by aligning egocentric and exocentric videos in time.
To this end, we propose AE2, a self-supervised embedding approach with two key designs.
For evaluation, we establish a benchmark for fine-grained video understanding in the ego-exo context.
arXiv Detail & Related papers (2023-06-08T19:54:08Z) - A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented,
Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z) - EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset [19.496721051685135]
Embodied tracking is a key component of many egocentric vision problems.
EgoTracks is a new dataset for long-term egocentric visual object tracking.
We show improvements that can be made to a STARK tracker to significantly increase its performance on egocentric data.
arXiv Detail & Related papers (2023-01-09T09:10:35Z) - NeRF-SOS: Any-View Self-supervised Object Segmentation from Complex
Real-World Scenes [80.59831861186227]
This paper carries out the exploration of self-supervised learning for object segmentation using NeRF for complex real-world scenes.
Our framework, called NeRF with Self-supervised Object Segmentation (NeRF-SOS), encourages NeRF models to distill compact geometry-aware segmentation clusters.
It consistently surpasses other 2D-based self-supervised baselines and predicts finer semantic masks than existing supervised counterparts.
arXiv Detail & Related papers (2022-09-19T06:03:17Z) - Ego-Exo: Transferring Visual Representations from Third-person to
First-person Videos [92.38049744463149]
We introduce an approach for pre-training egocentric video models using large-scale third-person video datasets.
Our idea is to discover latent signals in third-person video that are predictive of key egocentric-specific properties.
Our experiments show that our Ego-Exo framework can be seamlessly integrated into standard video models.
arXiv Detail & Related papers (2021-04-16T06:10:10Z) - Enhanced Self-Perception in Mixed Reality: Egocentric Arm Segmentation
and Database with Automatic Labelling [1.0149624140985476]
This study focuses on the egocentric segmentation of arms to improve self-perception in Augmented Virtuality.
We report results on different real egocentric hand datasets, including GTEA Gaze+, EDSH, EgoHands, Ego Youtube Hands, THU-READ, TEgO, FPAB, and Ego Gesture.
Results confirm the suitability of the EgoArm dataset for this task, achieving improvements of up to 40% with respect to the original network.
arXiv Detail & Related papers (2020-03-27T12:09:27Z)
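For the EAGLE entry above, the toy sketch below shows one plausible reading of "an eigenbasis derived from a semantic similarity matrix": eigendecomposition of a cosine-similarity matrix built from per-pixel or per-patch embeddings. The feature source and how the resulting cues would be used for grouping are assumptions, not the authors' implementation.

```python
# Toy illustration (assumed, not EAGLE's code): an eigenbasis of a semantic
# similarity matrix built from patch embeddings, usable as soft grouping cues.
import torch

def eigen_cue(features: torch.Tensor, k: int = 4) -> torch.Tensor:
    """features: (N, D) patch/pixel embeddings. Returns (N, k) eigenvector cues."""
    feats = torch.nn.functional.normalize(features, dim=1)
    sim = feats @ feats.T                       # (N, N) cosine-similarity matrix
    eigvals, eigvecs = torch.linalg.eigh(sim)   # eigenvalues in ascending order
    return eigvecs[:, -k:]                      # top-k eigenvectors as structural cues

# Example with 64 random 32-dimensional "patch features".
cues = eigen_cue(torch.randn(64, 32))
print(cues.shape)  # torch.Size([64, 4])
```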
This list is automatically generated from the titles and abstracts of the papers on this site.