UniG3D: A Unified 3D Object Generation Dataset
- URL: http://arxiv.org/abs/2306.10730v1
- Date: Mon, 19 Jun 2023 07:03:45 GMT
- Title: UniG3D: A Unified 3D Object Generation Dataset
- Authors: Qinghong Sun, Yangguang Li, ZeXiang Liu, Xiaoshui Huang, Fenggang Liu,
Xihui Liu, Wanli Ouyang, Jing Shao
- Abstract summary: UniG3D is a unified 3D object generation dataset constructed by employing a universal data transformation pipeline on the Objaverse and ShapeNet datasets.
This pipeline converts each raw 3D model into a comprehensive multi-modal data representation.
The selection of data sources for our dataset is based on their scale and quality.
- Score: 75.49544172927749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of generative AI has a transformative impact on various areas,
including virtual reality, autonomous driving, the metaverse, gaming, and
robotics. Among these applications, 3D object generation is of utmost
importance: it has unlocked fresh avenues for creating, customizing, and
exploring 3D objects. However, the quality and
diversity of existing 3D object generation methods are constrained by the
inadequacies of existing 3D object datasets, including issues related to text
quality, the incompleteness of multi-modal data representation encompassing 2D
rendered images and 3D assets, as well as the size of the dataset. In order to
resolve these issues, we present UniG3D, a unified 3D object generation dataset
constructed by employing a universal data transformation pipeline on Objaverse
and ShapeNet datasets. This pipeline converts each raw 3D model into
comprehensive multi-modal data representation <text, image, point cloud, mesh>
by employing rendering engines and multi-modal models. These modules ensure the
richness of textual information and the comprehensiveness of data
representation. Remarkably, the universality of our pipeline refers to its
ability to be applied to any 3D dataset, as it only requires raw 3D data. The
selection of data sources for our dataset is based on their scale and quality.
Subsequently, we assess the effectiveness of our dataset by employing Point-E
and SDFusion, two widely recognized methods for object generation, tailored to
the prevalent 3D representations of point clouds and signed distance functions.
Our dataset is available at: https://unig3d.github.io.
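To make the transformation pipeline more concrete, here is a minimal, hedged sketch of the kind of per-model conversion the abstract describes: one raw 3D model is turned into a <text, image, point cloud, mesh> record. This is not the authors' released code; it assumes trimesh for the geometry handling, and the rendering and captioning steps are hypothetical stubs standing in for the rendering engine and multi-modal captioning model the paper actually employs.

```python
# Illustrative sketch only -- NOT the authors' released pipeline.
# Assumes trimesh for geometry; render_views and caption_views are
# hypothetical stubs for a rendering engine and a multi-modal captioner.
import numpy as np
import trimesh


def render_views(mesh: trimesh.Trimesh, n_views: int = 4) -> list:
    """Stub: a real pipeline would render RGB images from several camera poses."""
    return [None] * n_views  # placeholder images


def caption_views(views: list) -> str:
    """Stub: a real pipeline would caption the renders with a multi-modal model."""
    return "a 3D object"  # placeholder caption


def convert_raw_model(mesh_path: str, n_points: int = 4096) -> dict:
    """Convert one raw 3D model into a <text, image, point cloud, mesh> record."""
    mesh = trimesh.load(mesh_path, force="mesh")                # mesh modality
    points, _ = trimesh.sample.sample_surface(mesh, n_points)   # point-cloud modality
    views = render_views(mesh)                                  # image modality (stubbed)
    text = caption_views(views)                                 # text modality (stubbed)
    return {
        "text": text,
        "image": views,
        "point_cloud": np.asarray(points, dtype=np.float32),
        "mesh": mesh,
    }
```

In a production pipeline the stubs would be replaced by multi-view rendering (e.g., in a rendering engine such as Blender) and a vision-language captioner, with each modality serialized to disk per object.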
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper builds the largest-ever multi-modal 3D scene dataset and benchmark with hierarchical grounded language annotations, MMScan.
The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- MDT3D: Multi-Dataset Training for LiDAR 3D Object Detection Generalization [3.8243923744440926]
3D object detection models trained on a source dataset with a specific point distribution have shown difficulties in generalizing to unseen datasets.
We leverage the information available from several annotated source datasets with our Multi-Dataset Training for 3D Object Detection (MDT3D) method.
We show how we manage the mix of datasets during training and introduce a new cross-dataset augmentation method: cross-dataset object injection.
arXiv Detail & Related papers (2023-08-02T08:20:00Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans [6.936271803454143]
We present a novel task for cross-dataset visual grounding in 3D scenes (Cross3DVG).
We created RIORefer, a large-scale 3D visual grounding dataset.
It includes more than 63k diverse descriptions of 3D objects within 1,380 indoor RGB-D scans from 3RScan.
arXiv Detail & Related papers (2023-05-23T09:52:49Z)
- Info3D: Representation Learning on 3D Objects using Mutual Information Maximization and Contrastive Learning [8.448611728105513]
We propose to extend the InfoMax and contrastive learning principles on 3D shapes.
We show that we can maximize the mutual information between 3D objects and their "chunks" to improve the representations in aligned datasets (a rough sketch of such an object-chunk contrastive objective appears after this list).
arXiv Detail & Related papers (2020-06-04T00:30:26Z)
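For readers unfamiliar with the object-chunk objective mentioned in the Info3D entry above, the following sketch shows one common way such an InfoMax principle is instantiated: an InfoNCE-style contrastive loss between embeddings of whole objects and embeddings of chunks sampled from them. The encoders, batch construction, and temperature here are illustrative assumptions, not the paper's implementation.

```python
# Hedged illustration of an InfoNCE-style object-vs-chunk contrastive loss,
# loosely in the spirit of Info3D; embedding shapes and temperature are assumptions.
import torch
import torch.nn.functional as F


def object_chunk_infonce(obj_emb: torch.Tensor,
                         chunk_emb: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """obj_emb, chunk_emb: (B, D) embeddings of whole objects and of one chunk
    sampled from each object; matching rows form the positive pairs."""
    obj = F.normalize(obj_emb, dim=-1)
    chunk = F.normalize(chunk_emb, dim=-1)
    logits = obj @ chunk.t() / temperature                      # (B, B) similarities
    targets = torch.arange(obj.size(0), device=obj.device)      # diagonal = positives
    # Symmetric loss: each object matches its own chunk, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```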
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.