Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
- URL: http://arxiv.org/abs/2207.10660v2
- Date: Fri, 24 Mar 2023 00:42:18 GMT
- Title: Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
- Authors: Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin
Johnson, Georgia Gkioxari
- Abstract summary: Large datasets and scalable solutions have led to unprecedented advances in 2D recognition.
We revisit the task of 3D object detection by introducing a large benchmark, called Omni3D.
We show that Cube R-CNN outperforms prior works on the larger Omni3D and existing benchmarks.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recognizing scenes and objects in 3D from a single image is a longstanding
goal of computer vision with applications in robotics and AR/VR. For 2D
recognition, large datasets and scalable solutions have led to unprecedented
advances. In 3D, existing benchmarks are small in size and approaches
specialize in a few object categories and specific domains, e.g. urban driving
scenes. Motivated by the success of 2D recognition, we revisit the task of 3D
object detection by introducing a large benchmark, called Omni3D. Omni3D
re-purposes and combines existing datasets resulting in 234k images annotated
with more than 3 million instances and 98 categories. 3D detection at such
scale is challenging due to variations in camera intrinsics and the rich
diversity of scene and object types. We propose a model, called Cube R-CNN,
designed to generalize across camera and scene types with a unified approach.
We show that Cube R-CNN outperforms prior works on the larger Omni3D and
existing benchmarks. Finally, we prove that Omni3D is a powerful dataset for 3D
object recognition and show that it improves single-dataset performance and can
accelerate learning on new smaller datasets via pre-training.
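The abstract highlights that varying camera intrinsics across the combined datasets make large-scale 3D detection hard. One way to make depth prediction intrinsics-invariant is to rescale metric depth into a canonical "virtual" camera during training and invert the mapping at inference. The sketch below is a minimal illustration of that idea only; the function names, the canonical focal length, and the canonical image height are illustrative assumptions, not the paper's actual implementation.

```python
def to_virtual_depth(z_metric: float, focal_px: float, img_h: int,
                     virtual_focal_px: float = 512.0,
                     virtual_h: int = 512) -> float:
    """Rescale metric depth to a canonical (virtual) camera.

    Normalizing focal length by image height makes the mapping
    resolution-independent; constants are illustrative.
    """
    return z_metric * (virtual_focal_px / virtual_h) / (focal_px / img_h)


def to_metric_depth(z_virtual: float, focal_px: float, img_h: int,
                    virtual_focal_px: float = 512.0,
                    virtual_h: int = 512) -> float:
    """Invert the rescaling for the real camera at inference time."""
    return z_virtual * (focal_px / img_h) / (virtual_focal_px / virtual_h)


# Round trip: a 10 m depth seen by a 1000 px focal, 1080 px tall camera.
z = 10.0
zv = to_virtual_depth(z, focal_px=1000.0, img_h=1080)
z_back = to_metric_depth(zv, focal_px=1000.0, img_h=1080)
assert abs(z_back - z) < 1e-9
```

With such a rescaling, an object of a given pixel size maps to the same virtual depth regardless of which dataset's camera captured it, which is one plausible reason a single detector can generalize across camera types.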
Related papers
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Uni3D: Exploring Unified 3D Representation at Scale [66.26710717073372]
We present Uni3D, a 3D foundation model to explore the unified 3D representation at scale.
Uni3D uses a 2D ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features.
We show that the strong Uni3D representation also enables applications such as 3D painting and retrieval in the wild.
arXiv Detail & Related papers (2023-10-10T16:49:21Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- Gait Recognition in the Wild with Dense 3D Representations and A Benchmark [86.68648536257588]
Existing studies for gait recognition are dominated by 2D representations like the silhouette or skeleton of the human body in constrained scenes.
This paper aims to explore dense 3D representations for gait recognition in the wild.
We build the first large-scale 3D representation-based gait recognition dataset, named Gait3D.
arXiv Detail & Related papers (2022-04-06T03:54:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.