Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life
3D Category Reconstruction
- URL: http://arxiv.org/abs/2109.00512v1
- Date: Wed, 1 Sep 2021 17:59:05 GMT
- Title: Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life
3D Category Reconstruction
- Authors: Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone,
Patrick Labatut, David Novotny
- Abstract summary: Common Objects in 3D is a large-scale dataset with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds.
The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories.
We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several new-view-synthesis and category-centric 3D reconstruction methods.
- Score: 7.013794773659423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional approaches for learning 3D object categories have been
predominantly trained and evaluated on synthetic datasets due to the
unavailability of real 3D-annotated category-centric data. Our main goal is to
facilitate advances in this field by collecting real-world data in a magnitude
similar to the existing synthetic counterparts. The principal contribution of
this work is thus a large-scale dataset, called Common Objects in 3D, with real
multi-view images of object categories annotated with camera poses and ground
truth 3D point clouds. The dataset contains a total of 1.5 million frames from
nearly 19,000 videos capturing objects from 50 MS-COCO categories and, as such,
it is significantly larger than alternatives both in terms of the number of
categories and objects. We exploit this new dataset to conduct one of the first
large-scale "in-the-wild" evaluations of several new-view-synthesis and
category-centric 3D reconstruction methods. Finally, we contribute NerFormer -
a novel neural rendering method that leverages the powerful Transformer to
reconstruct an object given a small number of its views. The CO3D dataset is
available at https://github.com/facebookresearch/co3d .
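As a rough illustration of the transformer-based ray processing NerFormer performs, here is a minimal PyTorch sketch, assuming per-ray features have already been sampled from the source views; the module sizes, mean pooling, and output head are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of NerFormer-style ray feature aggregation: a Transformer
# pools features across source views, another reasons along the ray, and a
# linear head predicts colour + density per ray sample. Shapes and sizes
# are assumptions for illustration only.
import torch
import torch.nn as nn

def _encoder(dim, heads, depth):
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
    return nn.TransformerEncoder(layer, depth)

class RayTransformer(nn.Module):
    def __init__(self, feat_dim=64, n_heads=4, depth=2):
        super().__init__()
        self.across_views = _encoder(feat_dim, n_heads, depth)  # pools source views
        self.along_ray = _encoder(feat_dim, n_heads, depth)     # reasons along the ray
        self.head = nn.Linear(feat_dim, 4)                      # RGB + density per sample

    def forward(self, feats):
        # feats: (rays, samples, views, dim), sampled from the source images
        R, P, V, C = feats.shape
        x = self.across_views(feats.reshape(R * P, V, C)).mean(dim=1)  # (R*P, C)
        x = self.along_ray(x.reshape(R, P, C))                         # (R, P, C)
        return self.head(x)                                            # (R, P, 4)

model = RayTransformer()
out = model(torch.randn(8, 16, 5, 64))  # 8 rays, 16 samples per ray, 5 source views
print(out.shape)  # torch.Size([8, 16, 4])
```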
Related papers
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic
Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- Objaverse: A Universe of Annotated 3D Objects [53.2537614157313]
We present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive tags, captions and animations.
We demonstrate the large potential of Objaverse 3D models via four applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models.
arXiv Detail & Related papers (2022-12-15T18:56:53Z)
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable
Categories [80.30216777363057]
We introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets.
At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views.
Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.
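To make per-point trajectory prediction concrete, below is a minimal sketch of a time-conditioned displacement field in the spirit of Tracker-NeRF; the MLP layout and the sinusoidal time encoding are assumptions, not the paper's formulation.

```python
# A canonical 3D point plus a time value map to a 3D offset, giving that
# point's position at time t. MLP size and time encoding are assumptions.
import torch
import torch.nn as nn

class TrajectoryField(nn.Module):
    def __init__(self, n_freqs=4, hidden=128):
        super().__init__()
        self.n_freqs = n_freqs
        in_dim = 3 + 2 * n_freqs  # xyz + sin/cos time features
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # predicted 3D displacement
        )

    def forward(self, pts, t):
        # pts: (N, 3) canonical points; t: (N, 1) normalized time in [0, 1]
        freqs = 2.0 ** torch.arange(self.n_freqs, device=t.device) * torch.pi
        t_enc = torch.cat([torch.sin(t * freqs), torch.cos(t * freqs)], dim=-1)
        offset = self.mlp(torch.cat([pts, t_enc], dim=-1))
        return pts + offset  # deformed point positions at time t

field = TrajectoryField()
pts_t = field(torch.randn(1024, 3), torch.rand(1024, 1))
print(pts_t.shape)  # torch.Size([1024, 3])
```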
arXiv Detail & Related papers (2022-11-07T22:42:42Z)
- ABO: Dataset and Benchmarks for Real-World 3D Object Understanding [43.42504014918771]
Amazon-Berkeley Objects (ABO) is a large-scale dataset of product images and 3D models corresponding to real household objects.
We use ABO to measure the domain gap for single-view 3D reconstruction networks trained on synthetic objects.
We also use multi-view images from ABO to measure the robustness of state-of-the-art metric learning approaches to different camera viewpoints.
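For context, reconstruction quality in benchmarks like this is commonly scored with a symmetric Chamfer distance between predicted and ground-truth point clouds; the snippet below is a minimal version of that metric, not necessarily the exact one used for ABO.

```python
# Symmetric Chamfer distance: average nearest-neighbour distance from the
# prediction to the ground truth, plus the reverse direction.
import torch

def chamfer_distance(pred, gt):
    # pred: (N, 3), gt: (M, 3) point clouds
    d = torch.cdist(pred, gt)  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

score = chamfer_distance(torch.randn(2048, 3), torch.randn(2048, 3))
print(float(score))
```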
arXiv Detail & Related papers (2021-10-12T17:52:42Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to
the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
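The sketch below illustrates the general idea behind conditioning a ray embedding on a warp: samples along a target ray are warped into a source camera, projected, and used to pool image features. The pinhole projection and mean pooling are simplifying assumptions rather than the paper's exact design.

```python
# Warp points along a target ray into a source camera, project with a
# pinhole model, bilinearly sample the source feature map, and pool the
# samples into a single ray embedding.
import torch
import torch.nn.functional as F

def embed_ray(ray_pts, src_R, src_t, src_K, src_feats):
    # ray_pts: (P, 3) world-space samples along one target ray
    # src_R: (3, 3), src_t: (3,) world-to-source-camera transform
    # src_K: (3, 3) intrinsics; src_feats: (C, H, W) source feature map
    cam = ray_pts @ src_R.T + src_t              # warp into source frame
    uv = cam @ src_K.T
    uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)   # pinhole projection
    _, H, W = src_feats.shape
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], -1) * 2 - 1
    sampled = F.grid_sample(src_feats[None], grid[None, :, None, :],
                            align_corners=True)  # (1, C, P, 1)
    return sampled[0, :, :, 0].mean(dim=1)       # (C,) pooled ray embedding

emb = embed_ray(torch.randn(16, 3), torch.eye(3), torch.tensor([0., 0., 2.]),
                torch.tensor([[100., 0., 64.], [0., 100., 64.], [0., 0., 1.]]),
                torch.randn(32, 128, 128))
print(emb.shape)  # torch.Size([32])
```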
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild
with Pose Annotations [0.0]
We introduce the Objectron dataset to advance the state of the art in 3D object detection.
The dataset contains object-centric short videos with pose annotations for nine categories and includes 4 million annotated images in 14,819 annotated videos.
arXiv Detail & Related papers (2020-12-18T00:34:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.