Consistent Mesh Diffusion
- URL: http://arxiv.org/abs/2312.00971v1
- Date: Fri, 1 Dec 2023 23:25:14 GMT
- Title: Consistent Mesh Diffusion
- Authors: Julian Knodt and Xifeng Gao
- Abstract summary: Given a 3D mesh with a UV parameterization, we introduce a novel approach to generating textures from text prompts.
We demonstrate our approach on a dataset containing 30 meshes, taking approximately 5 minutes per mesh.
- Score: 8.318075237885857
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Given a 3D mesh with a UV parameterization, we introduce a novel approach to
generating textures from text prompts. While prior work uses optimization from
Text-to-Image Diffusion models to generate textures and geometry, this is slow
and requires significant compute resources. Alternatively, projection-based
approaches use the same Text-to-Image models to paint images onto a mesh, but
lack consistency across viewing angles. We propose a method that uses a single
Depth-to-Image diffusion network and generates a single consistent texture when
rendered on the 3D surface, by first unifying multiple 2D images' diffusion
paths and hoisting them to 3D with MultiDiffusion~\cite{multidiffusion}. We
demonstrate our approach on a dataset containing 30 meshes, taking
approximately 5 minutes per mesh. To evaluate the quality of our approach, we
use CLIP-score~\cite{clipscore} and Fréchet Inception Distance
(FID)~\cite{frechet} on the renderings, and show our improvement over prior
work.
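The abstract's core idea, unifying the per-view diffusion paths by averaging them on the shared UV texture, can be illustrated with a MultiDiffusion-style scatter-average. The sketch below is not the authors' code; the per-view predictions, the pixel-to-texel index maps (`uv_indices`), and the resolution are hypothetical placeholders, and the surrounding denoiser loop is omitted.

```python
# Minimal sketch of MultiDiffusion-style aggregation onto a shared UV texture,
# assuming each rendered view comes with a precomputed pixel -> texel index map.
import torch

def aggregate_views_to_texture(view_preds, uv_indices, tex_res, channels=4):
    """Scatter-average per-view denoising predictions into one shared texture.

    view_preds : list of (H*W, C) tensors, one per rendered view
    uv_indices : list of (H*W,) long tensors mapping each pixel to a flat texel id
    """
    tex_sum = torch.zeros(tex_res * tex_res, channels)
    tex_cnt = torch.zeros(tex_res * tex_res, 1)
    for pred, idx in zip(view_preds, uv_indices):
        tex_sum.index_add_(0, idx, pred)                       # accumulate predictions per texel
        tex_cnt.index_add_(0, idx, torch.ones(len(idx), 1))    # count covering pixels per texel
    tex = tex_sum / tex_cnt.clamp(min=1)                       # average where texels are covered
    return tex.reshape(tex_res, tex_res, channels)

def resample_texture_for_view(texture, uv_index, channels=4):
    """Read the averaged texture back into a view's pixel grid for the next step."""
    return texture.reshape(-1, channels)[uv_index]             # (H*W, C) per-pixel values
```

Calling these two functions once per denoising step keeps every view evolving from the same texture state, which is what makes the final texture consistent across viewpoints.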
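For the evaluation protocol mentioned in the abstract (CLIP-score and FID over renderings), one possible realization uses the torchmetrics implementations. This is only an illustrative sketch: the random image tensors, prompt strings, and CLIP backbone are placeholders, and the paper does not state which implementation it used.

```python
# Hedged evaluation sketch: CLIP score and FID on rendered textures (torchmetrics).
import torch
from torchmetrics.multimodal.clip_score import CLIPScore
from torchmetrics.image.fid import FrechetInceptionDistance

# Placeholder stand-ins for renderings of textured meshes (uint8, NCHW);
# a real evaluation would use many renders per mesh, not random noise.
renders = torch.randint(0, 256, (8, 3, 256, 256), dtype=torch.uint8)
references = torch.randint(0, 256, (8, 3, 256, 256), dtype=torch.uint8)
prompts = ["a carved wooden chair"] * 8  # hypothetical text prompts

clip_metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
clip_metric.update(renders, prompts)
print("CLIP score:", clip_metric.compute())

fid = FrechetInceptionDistance(feature=2048)
fid.update(references, real=True)   # reference image set
fid.update(renders, real=False)     # renderings to evaluate
print("FID:", fid.compute())
```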
Related papers
- EASI-Tex: Edge-Aware Mesh Texturing from Single Image [12.942796503696194]
We present a novel approach for single-image mesh texturing, which employs a diffusion model with conditioning to seamlessly transfer an object's texture to a given 3D mesh object.
We do not assume that the two objects belong to the same category, and even if they do, there can be discrepancies in their geometry and part proportions.
arXiv Detail & Related papers (2024-05-27T17:46:22Z) - Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models [102.75875255071246]
3D content creation via text-driven stylization poses a fundamental challenge to the multimedia and graphics community.
We propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.
arXiv Detail & Related papers (2023-11-09T15:51:27Z) - Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z) - TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models [77.85129451435704]
We present a new method to synthesize textures for 3D geometries, using large-scale text-guided image diffusion models.
Specifically, we leverage latent diffusion models, apply the denoiser on a set of 2D renders of the 3D object, and aggregate the denoising predictions on a shared latent texture map.
arXiv Detail & Related papers (2023-10-20T19:15:29Z) - Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models [21.622420436349245]
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.
We leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses.
In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model.
arXiv Detail & Related papers (2023-03-21T16:21:02Z) - $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [97.06927852165464]
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
We propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process.
arXiv Detail & Related papers (2023-02-21T13:37:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.