EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object
Understanding
- URL: http://arxiv.org/abs/2309.08816v1
- Date: Fri, 15 Sep 2023 23:55:43 GMT
- Title: EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object
Understanding
- Authors: Chenchen Zhu, Fanyi Xiao, Andres Alvarado, Yasmine Babaei, Jiabo Hu,
Hichem El-Mohri, Sean Chang Culatana, Roshan Sumbaly, Zhicheng Yan
- Abstract summary: EgoObjects is a large-scale egocentric dataset for fine-grained object understanding.
Its Pilot version contains over 9K videos collected by 250 participants from 50+ countries using 4 wearable devices.
EgoObjects also annotates each object with an instance-level identifier.
- Score: 11.9023437362986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object understanding in egocentric visual data is arguably a fundamental
research topic in egocentric vision. However, existing object datasets are
either non-egocentric or have limitations in object categories, visual content,
and annotation granularities. In this work, we introduce EgoObjects, a
large-scale egocentric dataset for fine-grained object understanding. Its Pilot
version contains over 9K videos collected by 250 participants from 50+
countries using 4 wearable devices, and over 650K object annotations from 368
object categories. Unlike prior datasets containing only object category
labels, EgoObjects also annotates each object with an instance-level
identifier, and includes over 14K unique object instances. EgoObjects was
designed to capture the same object under diverse background complexities,
surrounding objects, distance, lighting and camera motion. In parallel to the
data collection, we conducted data annotation by developing a multi-stage
federated annotation process to accommodate the growing nature of the dataset.
To bootstrap research on EgoObjects, we present a suite of 4 benchmark tasks
around egocentric object understanding, including a novel instance-level and
the classical category-level object detection. We also introduce 2 novel
continual learning object detection tasks. The dataset and
API are available at https://github.com/facebookresearch/EgoObjects.
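As a rough illustration of how annotations of this kind might be consumed, the sketch below assumes a COCO-style JSON file with a per-annotation instance identifier alongside the usual category label. The file name (ego_objects_challenge_train.json) and field names (instance_id, category_id, bbox) are assumptions for illustration only and are not confirmed by the abstract; consult the EgoObjects repository for the actual schema and API.

```python
import json
from collections import Counter

# Hypothetical annotation file name; see the EgoObjects repo for the
# actual download instructions and file layout.
ANN_FILE = "ego_objects_challenge_train.json"

with open(ANN_FILE) as f:
    data = json.load(f)

# Assumed COCO-style top-level keys: "images", "annotations", "categories".
images = {img["id"]: img for img in data["images"]}
categories = {cat["id"]: cat["name"] for cat in data["categories"]}

# Count boxes per category and collect instance identifiers, assuming each
# annotation carries an "instance_id" field in addition to "category_id".
per_category = Counter()
instance_ids = set()
for ann in data["annotations"]:
    per_category[categories[ann["category_id"]]] += 1
    if "instance_id" in ann:
        instance_ids.add(ann["instance_id"])

print(f"{len(images)} images, {len(data['annotations'])} object annotations")
print(f"{len(categories)} categories, {len(instance_ids)} unique instances")
print("Most frequent categories:", per_category.most_common(5))
```

Such a pass over the annotation file is a quick sanity check that the category and instance counts match the figures reported in the abstract (368 categories, 14K+ instances, 650K+ annotations).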
Related papers
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) on the test set of the Complex Video Object Segmentation challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we require only sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- DoUnseen: Tuning-Free Class-Adaptive Object Detection of Unseen Objects for Robotic Grasping [1.6317061277457001]
We develop an object detector that requires no fine-tuning and can add any object as a class just by capturing a few images of the object.
We evaluate our class-adaptive object detector on unseen datasets and compare it to a trained Mask R-CNN on those datasets.
arXiv Detail & Related papers (2023-04-06T02:45:39Z)
- MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence.
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z)
- EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset [19.496721051685135]
Embodied tracking is a key component of many egocentric vision problems.
EgoTracks is a new dataset for long-term egocentric visual object tracking.
We show improvements that can be made to a STARK tracker to significantly increase its performance on egocentric data.
arXiv Detail & Related papers (2023-01-09T09:10:35Z)
- Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications [20.571026014771828]
We provide a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and objects being interacted with.
Our dataset is the first to label detailed hand-object contact boundaries.
We show that our robust hand-object segmentation model and dataset can serve as a foundational tool to boost or enable several downstream vision applications.
arXiv Detail & Related papers (2022-08-07T21:43:40Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z) - ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and
Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to support the modeling of relationships among objects and grasps.
Our dataset is collected in the form of both 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)