Objaverse-XL: A Universe of 10M+ 3D Objects
- URL: http://arxiv.org/abs/2307.05663v1
- Date: Tue, 11 Jul 2023 17:57:40 GMT
- Title: Objaverse-XL: A Universe of 10M+ 3D Objects
- Authors: Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel,
Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak
Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari,
Kiana Ehsani, Ludwig Schmidt, Ali Farhadi
- Abstract summary: We present Objaverse-XL, a dataset of over 10 million 3D objects.
We show that by training Zero123 on novel view synthesis, utilizing over 100 million multi-view rendered images, we achieve strong zero-shot generalization abilities.
- Score: 58.02773375519506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language processing and 2D vision models have attained remarkable
proficiency on many tasks primarily by escalating the scale of training data.
However, 3D vision tasks have not seen the same progress, in part due to the
challenges of acquiring high-quality 3D data. In this work, we present
Objaverse-XL, a dataset of over 10 million 3D objects. Our dataset comprises
deduplicated 3D objects from a diverse set of sources, including manually
designed objects, photogrammetry scans of landmarks and everyday items, and
professional scans of historic and antique artifacts. Representing the largest
scale and diversity in the realm of 3D datasets, Objaverse-XL enables
significant new possibilities for 3D vision. Our experiments demonstrate the
improvements enabled by the scale of Objaverse-XL. We show that by
training Zero123 on novel view synthesis, utilizing over 100 million multi-view
rendered images, we achieve strong zero-shot generalization abilities. We hope
that releasing Objaverse-XL will enable further innovations in the field of 3D
vision at scale.
Related papers
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits as it enables to: (1) learn token locations for transformer models; (2) directly regress 3D cameras poses of 2D images with respect to NeRF models.
This in turn leads to an improved performance in all three task of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
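
As a concrete picture of benefit (1) above, token locations can be ordinary learnable parameters that are used to sample a feature map differentiably, so gradients update the locations themselves. The PyTorch module below is a minimal sketch of the idea; its name, shapes, and sampling scheme are assumptions, not Implicit-Zoo's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableTokenSampler(nn.Module):
    """Samples image features at learnable 2D locations (illustrative)."""
    def __init__(self, num_tokens: int = 64, dim: int = 256):
        super().__init__()
        # Token locations in [-1, 1]^2, optimized jointly with the network.
        self.locations = nn.Parameter(torch.rand(num_tokens, 2) * 2 - 1)
        self.proj = nn.Linear(dim, dim)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) feature map from a CNN stem or patch embedding.
        B = feat.shape[0]
        grid = self.locations.view(1, 1, -1, 2).expand(B, -1, -1, -1)
        # Bilinear sampling is differentiable w.r.t. the grid, so gradients
        # flow back into self.locations during training.
        tokens = F.grid_sample(feat, grid, align_corners=True)  # (B, C, 1, N)
        tokens = tokens.squeeze(2).transpose(1, 2)               # (B, N, C)
        return self.proj(tokens)
```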
- Uni3D: Exploring Unified 3D Representation at Scale [66.26710717073372]
We present Uni3D, a 3D foundation model that explores unified 3D representation at scale.
Uni3D uses a 2D ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features; the objective is sketched after this entry.
We show that the strong Uni3D representation also enables applications such as 3D painting and retrieval in the wild.
arXiv Detail & Related papers (2023-10-10T16:49:21Z)
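
The alignment above can be pictured as a CLIP-style contrastive objective between point-cloud embeddings and frozen image/text embeddings. A minimal sketch, assuming precomputed (B, D) embeddings; this illustrates the class of objective, not Uni3D's actual code.

```python
import torch
import torch.nn.functional as F

def alignment_loss(pc_emb, img_emb, txt_emb, temperature=0.07):
    """CLIP-style loss aligning point-cloud embeddings with frozen
    image/text embeddings. All inputs: (B, D), row i of each tensor
    describing the same object."""
    pc = F.normalize(pc_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    labels = torch.arange(pc.shape[0], device=pc.device)
    loss = 0.0
    for target in (img, txt):
        logits = pc @ target.t() / temperature  # (B, B) cosine similarities
        # Symmetric InfoNCE: each point cloud matches its own image/text.
        loss += 0.5 * (F.cross_entropy(logits, labels) +
                       F.cross_entropy(logits.t(), labels))
    return loss / 2
```

A common design choice for this kind of alignment, consistent with the summary above, is to freeze the image/text towers and train only the point-cloud encoder.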
- MagicDrive: Street View Generation with Diverse 3D Geometry Control [82.69871576797166]
We introduce MagicDrive, a novel street view generation framework offering diverse 3D geometry controls.
Our design incorporates a cross-view attention module that ensures consistency across multiple camera views (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-10-04T06:14:06Z)
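
A cross-view attention module of the kind described can be sketched as each camera's tokens attending to the tokens of its neighboring views in the surround-camera ring. The module below is an illustrative assumption, not MagicDrive's implementation.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Each view attends to its left/right neighbor views (illustrative)."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, N, D) -- B scenes, V surround cameras, N tokens per view.
        B, V, N, D = x.shape
        out = []
        for v in range(V):
            # Keys/values come from the two adjacent cameras (ring order).
            neighbors = torch.cat([x[:, (v - 1) % V], x[:, (v + 1) % V]], dim=1)
            attended, _ = self.attn(query=x[:, v], key=neighbors, value=neighbors)
            out.append(self.norm(x[:, v] + attended))  # residual + norm
        return torch.stack(out, dim=1)
```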
- 3D Reconstruction of Objects in Hands without Real World 3D Supervision [12.70221786947807]
We propose modules that leverage indirect 3D supervision to scale up the learning of models for reconstructing hand-held objects.
Specifically, we extract multiview 2D mask supervision from videos and 3D shape priors from shape collections.
We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image (the setup is sketched after this entry).
arXiv Detail & Related papers (2023-05-04T17:56:48Z)
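
The mask-supervised occupancy training above can be pictured as an MLP that predicts occupancy for 3D query points conditioned on an image feature, with points that project outside the 2D silhouette pushed toward empty. Names and shapes below are illustrative assumptions, not the paper's architecture, and only the outside-silhouette constraint is shown; a full silhouette loss would also encourage coverage inside the mask.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OccupancyHead(nn.Module):
    """Predicts occupancy logits for 3D query points conditioned on a
    global image feature (illustrative, not the paper's architecture)."""
    def __init__(self, feat_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, img_feat: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, F) image feature; points: (B, P, 3) query points.
        feat = img_feat.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.mlp(torch.cat([feat, points], dim=-1)).squeeze(-1)  # (B, P)

def outside_mask_loss(occ_logits, points_2d, mask):
    """Penalize occupancy for points whose projections fall outside the
    2D object mask. points_2d: (B, P, 2) in [-1, 1]; mask: (B, 1, H, W)."""
    inside = F.grid_sample(mask, points_2d.unsqueeze(1),
                           align_corners=True).squeeze(1).squeeze(1)  # (B, P)
    return (torch.sigmoid(occ_logits) * (1.0 - inside)).mean()
```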
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large-vocabulary 3D object dataset of massive, high-quality, real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- Objaverse: A Universe of Annotated 3D Objects [53.2537614157313]
We present Objaverse 1.0, a large dataset of 800K+ (and growing) 3D models with descriptive tags, captions, and animations.
We demonstrate the large potential of Objaverse 3D models via four diverse applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for embodied AI, and creating a new benchmark for robustness analysis of vision models.
arXiv Detail & Related papers (2022-12-15T18:56:53Z)