Objaverse: A Universe of Annotated 3D Objects
- URL: http://arxiv.org/abs/2212.08051v1
- Date: Thu, 15 Dec 2022 18:56:53 GMT
- Title: Objaverse: A Universe of Annotated 3D Objects
- Authors: Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel,
Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, Ali Farhadi
- Abstract summary: We present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations.
We demonstrate the large potential of Objaverse 3D models via four applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models.
- Score: 53.2537614157313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Massive data corpora like WebText, Wikipedia, Conceptual Captions,
WebImageText, and LAION have propelled recent dramatic progress in AI. Large
neural models trained on such datasets produce impressive results and top many
of today's benchmarks. A notable omission within this family of large-scale
datasets is 3D data. Despite considerable interest and potential applications
in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with
limited diversity of object categories. Addressing this gap, we present
Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models
with descriptive captions, tags, and animations. Objaverse improves upon
present day 3D repositories in terms of scale, number of categories, and in the
visual diversity of instances within a category. We demonstrate the large
potential of Objaverse via four diverse applications: training generative 3D
models, improving tail category segmentation on the LVIS benchmark, training
open-vocabulary object-navigation models for Embodied AI, and creating a new
benchmark for robustness analysis of vision models. Objaverse can open new
directions for research and enable new applications across the field of AI.
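The Objaverse objects are distributed with a small Python loader package; the sketch below shows how a few models and their annotations might be fetched. This is a minimal sketch assuming the `objaverse` package (pip install objaverse) and its documented load_uids/load_annotations/load_objects helpers; treat the exact API surface as an assumption to verify against the installed version.

```python
# Minimal sketch: fetching a handful of Objaverse 1.0 models plus metadata.
# Assumes the `objaverse` helper package (pip install objaverse); function
# names follow its documentation and may differ across versions.
import objaverse

# UIDs for all ~800K objects in the 1.0 release.
uids = objaverse.load_uids()
print(f"{len(uids)} objects available")

sample = uids[:10]

# Per-object metadata: name, tags, categories, license, etc.
annotations = objaverse.load_annotations(sample)

# Download the GLB files; returns a {uid: local_file_path} mapping.
objects = objaverse.load_objects(uids=sample)
for uid, path in objects.items():
    print(uid, "->", path)
```

The LVIS-aligned subset used in the tail-category segmentation experiment is exposed in the same way (e.g. objaverse.load_lvis_annotations(), per the package docs).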
Related papers
- AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring [49.78120051062641]
3D visual grounding aims to correlate a natural language description with the target object within a 3D scene.
Existing approaches commonly encounter a shortage of text-3D pairs available for training.
We propose AugRefer, a novel approach for advancing 3D visual grounding.
arXiv Detail & Related papers (2025-01-16T09:57:40Z) - Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and Annotation Framework [1.1280113914145702]
This research aims to design and develop a comprehensive and efficient framework for 3D segmentation tasks.
The framework integrates Grounding DINO and the Segment Anything Model, augmented by enhanced 2D image rendering from the 3D mesh.
arXiv Detail & Related papers (2024-12-09T07:39:39Z) - Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits as it enables to: (1) learn token locations for transformer models; (2) directly regress 3D cameras poses of 2D images with respect to NeRF models.
This in turn leads to an improved performance in all three task of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to the 3D domain and seek stronger 3D shape generation by improving the capacity and scalability of auto-regressive models simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z) - Objaverse-XL: A Universe of 10M+ 3D Objects [58.02773375519506]
We present Objaverse-XL, a dataset of over 10 million 3D objects.
We show that by training Zero123 on novel view synthesis, utilizing over 100 million multi-view rendered images, we achieve strong zero-shot generalization abilities.
arXiv Detail & Related papers (2023-07-11T17:57:40Z) - Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life
3D Category Reconstruction [7.013794773659423]
Common Objects in 3D is a large-scale dataset with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds.
The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories.
We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several new-view-synthesis and category-centric 3D reconstruction methods.
arXiv Detail & Related papers (2021-09-01T17:59:05Z)