LDM3D: Latent Diffusion Model for 3D
- URL: http://arxiv.org/abs/2305.10853v2
- Date: Sun, 21 May 2023 20:26:30 GMT
- Title: LDM3D: Latent Diffusion Model for 3D
- Authors: Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will
Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias
Muller, Vasudev Lal
- Abstract summary: This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt.
We also develop an application called DepthFusion, which uses the generated RGB images and depth maps to create immersive and interactive 360-degree-view experiences.
- Score: 5.185393944663932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that
generates both image and depth map data from a given text prompt, allowing
users to generate RGBD images from text prompts. The LDM3D model is fine-tuned
on a dataset of tuples containing an RGB image, depth map and caption, and
validated through extensive experiments. We also develop an application called
DepthFusion, which uses the generated RGB images and depth maps to create
immersive and interactive 360-degree-view experiences using TouchDesigner. This
technology has the potential to transform a wide range of industries, from
entertainment and gaming to architecture and design. Overall, this paper
presents a significant contribution to the field of generative AI and computer
vision, and showcases the potential of LDM3D and DepthFusion to revolutionize
content creation and digital experiences. A short video summarizing the
approach can be found at https://t.ly/tdi2.
Related papers
- MTFusion: Reconstructing Any 3D Object from Single Image Using Multi-word Textual Inversion [10.912989885886617]
We propose MTFusion, which leverages both image data and textual descriptions for high-fidelity 3D reconstruction.
Our approach consists of two stages. First, we adopt a novel multi-word textual inversion technique to extract a detailed text description.
Then, we use this description and the image to generate a 3D model with FlexiCubes.
arXiv Detail & Related papers (2024-11-19T03:29:18Z) - GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z) - 3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation [45.218605449572586]
3D-Adapter is a plug-in module designed to infuse 3D geometry awareness into pretrained image diffusion models.
We show that 3D-Adapter greatly enhances the geometry quality of text-to-multi-view models such as Instant3D and Zero123++.
We also showcase the broad application potential of 3D-Adapter by presenting high quality results in text-to-3D, image-to-3D, text-to-texture, and text-to-avatar tasks.
arXiv Detail & Related papers (2024-10-24T17:59:30Z) - RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [39.03289977892935]
RealmDreamer is a technique for generation of general forward-facing 3D scenes from text descriptions.
Our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles.
arXiv Detail & Related papers (2024-04-10T17:57:41Z) - MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z) - VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
The 3D volumes are then trained on a diffusion model for text-to-3D generation using a 3D U-Net.
arXiv Detail & Related papers (2023-12-18T18:59:05Z) - 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with
2D Diffusion Models [102.75875255071246]
3D content creation via text-driven stylization has played a fundamental challenge to multimedia and graphics community.
We propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.
arXiv Detail & Related papers (2023-11-09T15:51:27Z) - EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z) - MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection [31.58403386994297]
We propose MonoNeRD, a novel detection framework that can infer dense 3D geometry and occupancy.
Specifically, we model scenes with Signed Distance Functions (SDF), facilitating the production of dense 3D representations.
To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and demonstrates the potential of implicit reconstruction for image-based 3D perception.
arXiv Detail & Related papers (2023-08-18T09:39:52Z) - 3D Scene Creation and Rendering via Rough Meshes: A Lighting Transfer Avenue [49.62477229140788]
This paper studies how to flexibly integrate reconstructed 3D models into practical 3D modeling pipelines such as 3D scene creation and rendering.
We propose a lighting transfer network (LighTNet) to bridge NFR and PBR, such that they can benefit from each other.
arXiv Detail & Related papers (2022-11-27T13:31:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.