Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations
- URL: http://arxiv.org/abs/2012.09988v1
- Date: Fri, 18 Dec 2020 00:34:18 GMT
- Title: Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations
- Authors: Adel Ahmadyan, Liangkai Zhang, Jianing Wei, Artsiom Ablavatski, Matthias Grundmann
- Abstract summary: We introduce the Objectron dataset to advance the state of the art in 3D object detection.
The dataset contains object-centric short videos with pose annotations for nine categories and includes 4 million annotated images in 14,819 annotated videos.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D object detection has recently become popular due to many applications in
robotics, augmented reality, autonomy, and image retrieval. We introduce the
Objectron dataset to advance the state of the art in 3D object detection and
foster new research and applications, such as 3D object tracking, view
synthesis, and improved 3D shape representation. The dataset contains
object-centric short videos with pose annotations for nine categories and
includes 4 million annotated images in 14,819 annotated videos. We also propose
a new evaluation metric, 3D Intersection over Union, for 3D object detection.
We demonstrate the usefulness of our dataset in 3D object detection tasks by
providing baseline models trained on this dataset. Our dataset and evaluation
source code are available online at http://www.objectron.dev
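The paper's proposed metric, 3D Intersection over Union, generalizes the familiar 2D IoU to volumes. As a minimal sketch of the idea, the axis-aligned special case below computes overlap volume divided by union volume for two boxes given as (min-corner, max-corner) pairs; note that Objectron's actual metric handles arbitrarily oriented boxes, which requires polytope intersection and is more involved than this simplified illustration.

```python
def iou_3d(box_a, box_b):
    """Axis-aligned 3D IoU (a simplified sketch of the metric's idea).

    Each box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    Returns intersection volume / union volume, in [0, 1].
    """
    (a_min, a_max), (b_min, b_max) = box_a, box_b

    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    inter_dims = [
        max(0.0, min(a_max[i], b_max[i]) - max(a_min[i], b_min[i]))
        for i in range(3)
    ]
    inter = inter_dims[0] * inter_dims[1] * inter_dims[2]

    def volume(lo, hi):
        return (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])

    union = volume(a_min, a_max) + volume(b_min, b_max) - inter
    return inter / union if union > 0 else 0.0


# Two unit cubes offset by half a unit along x:
# intersection = 0.5, union = 1.5, so IoU = 1/3.
print(iou_3d(((0, 0, 0), (1, 1, 1)), ((0.5, 0, 0), (1.5, 1, 1))))
```

For oriented boxes, the same ratio is computed, but the intersection volume comes from clipping one box's faces against the other's half-spaces rather than a per-axis overlap.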
Related papers
- ImageNet3D: Towards General-Purpose Object-Level 3D Understanding [20.837297477080945]
We present ImageNet3D, a large dataset for general-purpose object-level 3D understanding.
ImageNet3D augments 200 categories from the ImageNet dataset with 2D bounding box, 3D pose, 3D location annotations, and image captions interleaved with 3D information.
We consider two new tasks, probing of object-level 3D awareness and open vocabulary pose estimation, besides standard classification and pose estimation.
arXiv Detail & Related papers (2024-06-13T22:44:26Z)
- Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection [24.871590175483096]
Point cloud-based open-vocabulary 3D object detection aims to detect 3D categories that do not have ground-truth annotations in the training set.
Previous approaches leverage large-scale richly-annotated image datasets as a bridge between 3D and category semantics.
We propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.
arXiv Detail & Related papers (2023-09-18T03:31:53Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
With 3D object representations learned from pseudo 3D object labels in monocular videos, we propose a new 2D Multiple Object Tracking (MOT) paradigm, called P3DTrack.
arXiv Detail & Related papers (2023-06-08T17:58:45Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction [7.013794773659423]
Common Objects in 3D is a large-scale dataset with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds.
The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories.
We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several new-view-synthesis and category-centric 3D reconstruction methods.
arXiv Detail & Related papers (2021-09-01T17:59:05Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- LanguageRefer: Spatial-Language Model for 3D Visual Grounding [72.7618059299306]
We develop a spatial-language model for a 3D visual grounding problem.
We show that our model performs competitively on visio-linguistic datasets proposed by ReferIt3D.
arXiv Detail & Related papers (2021-07-07T18:55:03Z)
- SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data [124.2624568006391]
We present SAIL-VOS 3D: a synthetic video dataset with frame-by-frame mesh annotations.
We also develop first baselines for reconstruction of 3D meshes from video data via temporal models.
arXiv Detail & Related papers (2021-05-18T15:42:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.