Diffusion Models in 3D Vision: A Survey
- URL: http://arxiv.org/abs/2410.04738v2
- Date: Tue, 15 Oct 2024 06:03:52 GMT
- Title: Diffusion Models in 3D Vision: A Survey
- Authors: Zhen Wang, Dongyuan Li, Renhe Jiang,
- Abstract summary: We review the state-of-the-art approaches that leverage diffusion models for 3D visual tasks.
These approaches include 3D object generation, shape completion, point cloud reconstruction, and scene understanding.
We discuss potential solutions, including improving computational efficiency, enhancing multimodal fusion, and exploring the use of large-scale pretraining.
- Score: 11.116658321394755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, 3D vision has become a crucial field within computer vision, powering a wide range of applications such as autonomous driving, robotics, augmented reality (AR), and medical imaging. This field relies on the accurate perception, understanding, and reconstruction of 3D scenes from 2D data sources like images and videos. Diffusion models, originally designed for 2D generative tasks, offer the potential for more flexible, probabilistic approaches that can better capture the variability and uncertainty present in real-world 3D data. However, traditional methods often struggle with efficiency and scalability. In this paper, we review the state-of-the-art approaches that leverage diffusion models for 3D visual tasks, including but not limited to 3D object generation, shape completion, point cloud reconstruction, and scene understanding. We provide an in-depth discussion of the underlying mathematical principles of diffusion models, outlining their forward and reverse processes, as well as the various architectural advancements that enable these models to work with 3D datasets. We also discuss the key challenges in applying diffusion models to 3D vision, such as handling occlusions and varying point densities, and the computational demands of high-dimensional data. Finally, we discuss potential solutions, including improving computational efficiency, enhancing multimodal fusion, and exploring the use of large-scale pretraining for better generalization across 3D tasks. This paper serves as a foundation for future exploration and development in this rapidly evolving field.
Related papers
- Foundational Models for 3D Point Clouds: A Survey and Outlook [50.61473863985571]
3D point cloud representation plays a crucial role in preserving the geometric fidelity of the physical world.
To bridge this gap, it becomes essential to incorporate multiple modalities.
Foundation models (FMs) can seamlessly integrate and reason across these modalities.
arXiv Detail & Related papers (2025-01-30T18:59:43Z) - 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow [69.94527569577295]
3D vision and spatial reasoning have long been recognized as preferable for accurately perceiving our three-dimensional world.
Due to the difficulties in collecting high-quality 3D data, research in this area has only recently gained momentum.
We propose converting existing densely activated LLMs into mixture-of-experts (MoE) models, which have proven effective for multi-modal data processing.
arXiv Detail & Related papers (2025-01-28T04:31:19Z) - 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object in various structure and appearance via diffusion models and chatGPT generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
arXiv Detail & Related papers (2024-08-25T09:31:22Z) - Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce novel framework for retrieval-based quality enhancement in text-to-3D generation.
We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z) - PonderV2: Pave the Way for 3D Foundation Model with A Universal
Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z) - Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators.
Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields.
We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools.
Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools.
arXiv Detail & Related papers (2023-09-15T16:34:51Z) - 3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields.
Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields.
We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
arXiv Detail & Related papers (2022-11-30T01:55:52Z) - Deep Generative Models on 3D Representations: A Survey [81.73385191402419]
Generative models aim to learn the distribution of observed data by generating new instances.
Recently, researchers started to shift focus from 2D to 3D space.
representing 3D data poses significantly greater challenges.
arXiv Detail & Related papers (2022-10-27T17:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.