Objaverse: A Universe of Annotated 3D Objects
- URL: http://arxiv.org/abs/2212.08051v1
- Date: Thu, 15 Dec 2022 18:56:53 GMT
- Title: Objaverse: A Universe of Annotated 3D Objects
- Authors: Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel,
Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, Ali Farhadi
- Abstract summary: We present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations.
We demonstrate the large potential of Objaverse 3D models via four applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models.
- Score: 53.2537614157313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Massive data corpora like WebText, Wikipedia, Conceptual Captions,
WebImageText, and LAION have propelled recent dramatic progress in AI. Large
neural models trained on such datasets produce impressive results and top many
of today's benchmarks. A notable omission within this family of large-scale
datasets is 3D data. Despite considerable interest and potential applications
in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with
limited diversity of object categories. Addressing this gap, we present
Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models
with descriptive captions, tags, and animations. Objaverse improves upon
present day 3D repositories in terms of scale, number of categories, and in the
visual diversity of instances within a category. We demonstrate the large
potential of Objaverse via four diverse applications: training generative 3D
models, improving tail category segmentation on the LVIS benchmark, training
open-vocabulary object-navigation models for Embodied AI, and creating a new
benchmark for robustness analysis of vision models. Objaverse can open new
directions for research and enable new applications across the field of AI.
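The Objaverse objects are distributed with a small Python loader package; the sketch below shows how a few models and their annotations might be fetched. This is a minimal sketch assuming the `objaverse` package (pip install objaverse) and its documented load_uids/load_annotations/load_objects helpers; treat the exact API surface as an assumption to verify against the installed version.

```python
# Minimal sketch: fetching a handful of Objaverse 1.0 models plus metadata.
# Assumes the `objaverse` helper package (pip install objaverse); function
# names follow its documentation and may differ across versions.
import objaverse

# UIDs for all ~800K objects in the 1.0 release.
uids = objaverse.load_uids()
print(f"{len(uids)} objects available")

sample = uids[:10]

# Per-object metadata: name, tags, categories, license, etc.
annotations = objaverse.load_annotations(sample)

# Download the GLB files; returns a {uid: local_file_path} mapping.
objects = objaverse.load_objects(uids=sample)
for uid, path in objects.items():
    print(uid, "->", path)
```

The LVIS-aligned subset used in the tail-category segmentation experiment is exposed in the same way (e.g. objaverse.load_lvis_annotations(), per the package docs).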
Related papers
- AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring [49.78120051062641]
3D visual grounding aims to correlate a natural language description with the target object within a 3D scene.
Existing approaches commonly encounter a shortage of text-3D pairs available for training.
We propose AugRefer, a novel approach for advancing 3D visual grounding.
arXiv Detail & Related papers (2025-01-16T09:57:40Z) - Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and Annotation Framework [1.1280113914145702]
This research aims to design and develop a comprehensive and efficient framework for 3D segmentation tasks.
The framework integrates Grounding DINO and the Segment Anything Model, augmented by enhanced 2D image rendering from the 3D mesh.
arXiv Detail & Related papers (2024-12-09T07:39:39Z) - Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits as it enables to: (1) learn token locations for transformer models; (2) directly regress 3D cameras poses of 2D images with respect to NeRF models.
This in turn leads to an improved performance in all three task of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to the 3D domain and seek stronger 3D shape generation by improving the capacity and scalability of auto-regressive models simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z) - Objaverse-XL: A Universe of 10M+ 3D Objects [58.02773375519506]
We present Objaverse-XL, a dataset of over 10 million 3D objects.
We show that by training Zero123 on novel view synthesis, utilizing over 100 million multi-view rendered images, we achieve strong zero-shot generalization abilities.
arXiv Detail & Related papers (2023-07-11T17:57:40Z) - Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life
3D Category Reconstruction [7.013794773659423]
Common Objects in 3D is a large-scale dataset with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds.
The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories.
We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several new-view-synthesis and category-centric 3D reconstruction methods.
arXiv Detail & Related papers (2021-09-01T17:59:05Z)