OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic
Perception, Reconstruction and Generation
- URL: http://arxiv.org/abs/2301.07525v2
- Date: Tue, 11 Apr 2023 17:41:17 GMT
- Authors: Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan,
Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, Dahua Lin, Ziwei Liu
- Abstract summary: We propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
- Score: 107.71752592196138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in modeling 3D objects mostly rely on synthetic datasets due
to the lack of large-scale real-scanned 3D databases. To facilitate the
development of 3D perception, reconstruction, and generation in the real world,
we propose OmniObject3D, a large-vocabulary 3D object dataset with massive
high-quality real-scanned 3D objects. OmniObject3D has several appealing
properties: 1) Large Vocabulary: It comprises 6,000 scanned objects in 190
daily categories, sharing common classes with popular 2D datasets (e.g.,
ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations.
2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors,
providing textured meshes, point clouds, multiview rendered images, and
multiple real-captured videos. 3) Realistic Scans: The professional scanners
support high-quality object scans with precise shapes and realistic appearances.
With the vast exploration space offered by OmniObject3D, we carefully set up
four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c)
neural surface reconstruction, and d) 3D object generation. Extensive studies
are performed on these four benchmarks, revealing new observations, challenges,
and opportunities for future research in realistic 3D vision.
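To make the dataset's modalities concrete, the following is a minimal Python loading sketch. The directory layout and file names here are assumptions for illustration only; consult the official release for the actual structure.

```python
# Minimal sketch of loading the per-object modalities OmniObject3D provides
# (textured mesh, point cloud, multi-view renders). The directory layout and
# file names are assumptions for illustration, not the documented structure.
from pathlib import Path

import numpy as np
import trimesh  # pip install trimesh

obj_dir = Path("OmniObject3D/apple/apple_001")  # hypothetical path

# Textured mesh: trimesh reads common mesh formats (e.g., OBJ) with textures.
mesh = trimesh.load(obj_dir / "scan.obj", force="mesh")
print(f"mesh: {len(mesh.vertices)} vertices, {len(mesh.faces)} faces")

# Point cloud: assumed stored as an (N, 3) or (N, 6) float array.
points = np.load(obj_dir / "pointcloud.npy")
print(f"point cloud: {points.shape}")

# Multi-view renders: one image per calibrated camera pose.
render_paths = sorted((obj_dir / "renders").glob("*.png"))
print(f"{len(render_paths)} rendered views")
```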
Related papers
- ImageNet3D: Towards General-Purpose Object-Level 3D Understanding [20.837297477080945]
We present ImageNet3D, a large dataset for general-purpose object-level 3D understanding.
ImageNet3D augments 200 categories from the ImageNet dataset with 2D bounding box, 3D pose, and 3D location annotations, as well as image captions interleaved with 3D information.
Besides standard classification and pose estimation, we consider two new tasks: probing of object-level 3D awareness and open-vocabulary pose estimation.
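The annotation types above map naturally onto a flat per-object record; a hedged sketch follows. The field names and conventions are assumptions, not ImageNet3D's actual schema.

```python
# Hypothetical shape of one ImageNet3D annotation record. Field names and
# conventions are assumptions; the released format may differ.
from dataclasses import dataclass

@dataclass
class ObjectAnnotation3D:
    category: str                                # one of the 200 categories
    bbox_2d: tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    azimuth: float                               # 3D pose angles, in radians
    elevation: float
    theta: float                                 # in-plane rotation
    location: tuple[float, float, float]         # object center in camera coordinates
    caption: str                                 # caption interleaved with 3D information

ann = ObjectAnnotation3D(
    category="airliner",
    bbox_2d=(12.0, 40.5, 300.0, 220.0),
    azimuth=0.8, elevation=0.1, theta=0.0,
    location=(0.2, -0.1, 6.5),
    caption="an airliner facing slightly left, about six meters from the camera",
)
```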
arXiv Detail & Related papers (2024-06-13T22:44:26Z)
- Uni3D: Exploring Unified 3D Representation at Scale [66.26710717073372]
We present Uni3D, a 3D foundation model to explore the unified 3D representation at scale.
Uni3D uses a 2D ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features.
We show that the strong Uni3D representation also enables applications such as 3D painting and retrieval in the wild.
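Aligning a point-cloud encoder with a frozen image-text embedding space is typically done with a CLIP-style contrastive objective; the sketch below illustrates that generic idea, not Uni3D's actual training code.

```python
# CLIP-style alignment of point-cloud embeddings to frozen image embeddings:
# a generic sketch of the idea, not Uni3D's actual training code.
import torch
import torch.nn.functional as F

def alignment_loss(point_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over matched (point cloud, image) pairs."""
    p = F.normalize(point_emb, dim=-1)   # (B, D) from the 3D encoder
    i = F.normalize(image_emb, dim=-1)   # (B, D) frozen image-text features
    logits = p @ i.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(p.size(0))    # matched pairs sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

B, D = 8, 512
loss = alignment_loss(torch.randn(B, D, requires_grad=True), torch.randn(B, D))
loss.backward()
```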
arXiv Detail & Related papers (2023-10-10T16:49:21Z)
- Large-Vocabulary 3D Diffusion Model with Transformer [57.076986347047]
We introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model.
We propose DiffTF, a novel triplane-based 3D-aware diffusion model with a transformer, which tackles these challenges from three aspects.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance.
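For context, a triplane represents a 3D field as three axis-aligned 2D feature planes queried by projecting each 3D point. The sketch below shows the standard lookup; it illustrates the general triplane formulation, not DiffTF's specific architecture.

```python
# Generic triplane feature lookup: project each 3D point onto the XY, XZ,
# and YZ planes, bilinearly sample each plane, and aggregate. This is the
# standard formulation, not DiffTF's exact architecture.
import torch
import torch.nn.functional as F

def sample_triplane(planes, xyz):
    """planes: (3, C, H, W) feature planes; xyz: (N, 3) points in [-1, 1]."""
    coords = torch.stack([xyz[:, [0, 1]],    # XY plane
                          xyz[:, [0, 2]],    # XZ plane
                          xyz[:, [1, 2]]])   # YZ plane -> (3, N, 2)
    grid = coords.unsqueeze(2)               # (3, N, 1, 2) for grid_sample
    feats = F.grid_sample(planes, grid, mode="bilinear",
                          align_corners=True)    # (3, C, N, 1)
    return feats.squeeze(-1).sum(dim=0).t()      # (N, C), summed over planes

planes = torch.randn(3, 32, 64, 64)
xyz = torch.rand(1000, 3) * 2 - 1
features = sample_triplane(planes, xyz)  # (1000, 32) per-point features
```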
arXiv Detail & Related papers (2023-09-14T17:59:53Z)
- 3D-LLM: Injecting the 3D World into Large Language Models [60.43823088804661]
Large language models (LLMs) and Vision-Language Models (VLMs) have proven to excel at multiple tasks, such as commonsense reasoning.
We propose to inject the 3D world into large language models and introduce a new family of 3D-LLMs.
Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks.
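A common way to feed point clouds to an LLM is to project per-point features into the model's token-embedding space and prepend them to the text tokens. The sketch below shows that generic pattern; it is an assumption-level illustration, not 3D-LLM's actual architecture.

```python
# Generic pattern for injecting 3D features into a language model: pool the
# point features into a fixed number of tokens, project them into the token-
# embedding space, and prepend them to the text embeddings. An illustration
# of the idea, not 3D-LLM's actual architecture.
import torch
import torch.nn as nn

class PointPrefix(nn.Module):
    def __init__(self, point_dim=256, embed_dim=768, n_prefix=32):
        super().__init__()
        self.proj = nn.Linear(point_dim, embed_dim)
        self.n_prefix = n_prefix

    def forward(self, point_feats, text_embeds):
        """point_feats: (B, N, point_dim); text_embeds: (B, T, embed_dim)."""
        B, N, D = point_feats.shape
        # simple chunked mean-pooling down to n_prefix tokens
        usable = (N // self.n_prefix) * self.n_prefix
        pooled = point_feats[:, :usable].view(B, self.n_prefix, -1, D).mean(2)
        prefix = self.proj(pooled)                      # (B, n_prefix, embed_dim)
        return torch.cat([prefix, text_embeds], dim=1)  # fed to the LLM

tokens = PointPrefix()(torch.randn(2, 1024, 256), torch.randn(2, 16, 768))
print(tokens.shape)  # torch.Size([2, 48, 768])
```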
arXiv Detail & Related papers (2023-07-24T17:59:02Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
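The pipeline chains three off-the-shelf models; the skeleton below names each stage in order. Every helper here is a hypothetical placeholder standing in for the named model, not Anything-3D's actual interface.

```python
# Skeleton of the Anything-3D stages as described above. Each helper is a
# hypothetical placeholder for the named model, not the paper's interface.
import numpy as np

def blip_caption(image: np.ndarray) -> str:
    """Placeholder: run BLIP to produce a textual description of the object."""
    raise NotImplementedError

def sam_segment(image: np.ndarray, prompt_point: tuple[int, int]) -> np.ndarray:
    """Placeholder: run Segment-Anything to mask the object of interest."""
    raise NotImplementedError

def lift_to_nerf(masked_object: np.ndarray, caption: str):
    """Placeholder: use a text-to-image diffusion prior to optimize a NeRF."""
    raise NotImplementedError

def anything_3d(image: np.ndarray, prompt_point: tuple[int, int]):
    caption = blip_caption(image)             # 1) describe the object
    mask = sam_segment(image, prompt_point)   # 2) isolate it from the scene
    obj = image * mask[..., None]             # keep only the masked pixels
    return lift_to_nerf(obj, caption)         # 3) lift to a radiance field
```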
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices [78.20154723650333]
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation.
We introduce a novel multi-view RGBD dataset captured using a mobile device.
We obtain precise 3D ground-truth shapes without relying on high-end 3D scanners.
arXiv Detail & Related papers (2023-03-03T14:02:50Z)
- Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild [32.05421669957098]
Large datasets and scalable solutions have led to unprecedented advances in 2D recognition.
We revisit the task of 3D object detection by introducing a large benchmark called Omni3D.
We show that Cube R-CNN outperforms prior works on the larger Omni3D and existing benchmarks.
arXiv Detail & Related papers (2022-07-21T17:56:22Z)
- HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction [37.29140654256627]
We present HM3D-ABO, a photo-realistic object-centric dataset.
It is constructed by composing realistic indoor scenes with realistic objects.
The dataset could also be useful for tasks such as camera pose estimation and novel-view synthesis.
arXiv Detail & Related papers (2022-06-24T16:02:01Z)
- Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image [22.037472446683765]
We learn a regular grid of 3D voxel features from the input image, aligned with the 3D scene space via a 3D feature lifting operator.
Based on the 3D voxel features, our novel CenterNet-3D detection head formulates the 3D detection as keypoint detection in the 3D space.
We devise an efficient coarse-to-fine reconstruction module, including coarse-level voxelization and a novel local PCA-SDF shape representation.
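The 3D feature-lifting step mentioned above amounts to projecting each voxel center into the image and sampling the 2D feature map there. The sketch below shows that generic operation; it is an illustration of the idea, not the paper's exact operator.

```python
# Generic 2D-to-3D feature lifting: project each voxel center through the
# camera intrinsics and bilinearly sample the image feature map. A sketch
# of the idea, not the paper's exact operator.
import torch
import torch.nn.functional as F

def lift_features(feat2d, voxel_centers, K):
    """feat2d: (C, H, W); voxel_centers: (N, 3) camera-space points with z > 0;
    K: (3, 3) intrinsics. Returns (N, C) per-voxel features."""
    C, H, W = feat2d.shape
    uvw = voxel_centers @ K.t()                 # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]               # (N, 2) pixel coordinates
    # normalize pixel coordinates to [-1, 1] for grid_sample
    grid = torch.stack([uv[:, 0] / (W - 1),
                        uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, -1, 1, 2)               # (1, N, 1, 2)
    out = F.grid_sample(feat2d[None], grid, mode="bilinear",
                        align_corners=True)     # (1, C, N, 1)
    return out[0, :, :, 0].t()                  # (N, C)

feat2d = torch.randn(64, 60, 80)                       # CNN feature map
K = torch.tensor([[100., 0., 40.], [0., 100., 30.], [0., 0., 1.]])
voxels = torch.rand(500, 3) * torch.tensor([0.4, 0.3, 1.0]) \
         + torch.tensor([-0.2, -0.15, 1.0])            # voxel centers in view
lifted = lift_features(feat2d, voxels, K)              # (500, 64)
```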
arXiv Detail & Related papers (2021-11-04T18:30:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.