MD-ProjTex: Texturing 3D Shapes with Multi-Diffusion Projection
- URL: http://arxiv.org/abs/2504.02762v1
- Date: Thu, 03 Apr 2025 16:58:06 GMT
- Title: MD-ProjTex: Texturing 3D Shapes with Multi-Diffusion Projection
- Authors: Ahmet Burak Yildirim, Mustafa Utku Aydogdu, Duygu Ceylan, Aysegul Dundar
- Abstract summary: We introduce MD-ProjTex, a method for fast and consistent text-guided texture generation for 3D shapes using pretrained text-to-image diffusion models. At the core of our approach is a multi-view consistency mechanism in UV space, which ensures coherent textures across different viewpoints.
- Score: 15.115928324960407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce MD-ProjTex, a method for fast and consistent text-guided texture generation for 3D shapes using pretrained text-to-image diffusion models. At the core of our approach is a multi-view consistency mechanism in UV space, which ensures coherent textures across different viewpoints. Specifically, MD-ProjTex fuses noise predictions from multiple views at each diffusion step and jointly updates the per-view denoising directions to maintain 3D consistency. In contrast to existing state-of-the-art methods that rely on optimization or sequential view synthesis, MD-ProjTex is computationally more efficient and achieves better quantitative and qualitative results.
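The fusion step is the core mechanism: at every denoising step, each view's noise prediction is scattered into a shared UV texture, averaged where views overlap, and gathered back so that all views denoise toward the same texture. The abstract comes without code, so the following is only a minimal NumPy sketch under assumed shapes; `fuse_noise_in_uv`, the `uv_maps` pixel-to-UV correspondence (obtainable by rasterizing the mesh), and the texture resolution are all hypothetical, not the paper's actual interface.

```python
import numpy as np

def fuse_noise_in_uv(noise_preds, uv_maps, uv_res=256):
    """Average per-view noise predictions on a shared UV grid, then
    gather the fused values back into each view.

    noise_preds: [V, H, W, C] per-view diffusion noise predictions.
    uv_maps:     [V, H, W, 2] per-pixel UV coordinates in [0, 1],
                 negative where a pixel does not cover the mesh
                 (an assumed layout, not the paper's data format).
    """
    V, H, W, C = noise_preds.shape
    acc = np.zeros((uv_res, uv_res, C))  # summed predictions per texel
    cnt = np.zeros((uv_res, uv_res, 1))  # number of views hitting each texel

    def texel_indices(v):
        valid = uv_maps[v, ..., 0] >= 0
        u = (uv_maps[v, ..., 0][valid] * (uv_res - 1)).astype(int)
        w = (uv_maps[v, ..., 1][valid] * (uv_res - 1)).astype(int)
        return valid, u, w

    # Scatter: accumulate every view's prediction into UV space.
    for v in range(V):
        valid, u, w = texel_indices(v)
        np.add.at(acc, (w, u), noise_preds[v][valid])
        np.add.at(cnt, (w, u), 1.0)

    fused = acc / np.maximum(cnt, 1.0)  # average where views overlap

    # Gather: every view now reads back the same fused prediction.
    out = noise_preds.copy()
    for v in range(V):
        valid, u, w = texel_indices(v)
        out[v][valid] = fused[w, u]
    return out
```

Jointly updating all views with the fused prediction at each step, rather than texturing view by view, is what avoids seams and drift between viewpoints.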
Related papers
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z)
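The blurb above names a score distillation variant without detail; for orientation, here is a minimal PyTorch sketch of the vanilla score distillation sampling (SDS) gradient that balanced variants such as PlacidDreamer's build on. This is the standard baseline, not the paper's algorithm, and `denoiser` is a hypothetical stand-in for a pretrained text-to-image UNet.

```python
import torch

def sds_grad(denoiser, latents, text_emb, uncond_emb, alphas_cumprod,
             guidance=7.5):
    """One vanilla SDS step: noise the current render's latents, ask the
    frozen diffusion prior for its noise estimate, and return the
    difference as a gradient on the latents."""
    t = torch.randint(50, 950, (1,))                   # random timestep
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(latents)
    x_t = a.sqrt() * latents + (1 - a).sqrt() * noise  # forward diffusion

    with torch.no_grad():                              # the prior stays frozen
        e_uncond = denoiser(x_t, t, uncond_emb)
        e_text = denoiser(x_t, t, text_emb)
    e = e_uncond + guidance * (e_text - e_uncond)      # classifier-free guidance

    w = 1.0 - a                                        # a common weighting choice
    return w * (e - noise)                             # gradient w.r.t. latents
```

Over-saturation in SDS is commonly traced to the large guidance weight; "balanced" variants reshape or decompose this gradient rather than change the overall loop.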
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
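The blurb mentions explicit camera-orientation conditioning but not its form; a common way to inject such a signal into a 2D diffusion backbone is to embed the camera angles and add them to the timestep embedding. The PyTorch sketch below illustrates that pattern only; the module, dimensions, and additive injection point are assumptions, not OrientDream's exact design.

```python
import torch
import torch.nn as nn

class CameraConditioner(nn.Module):
    """Hypothetical camera-orientation conditioner: map (azimuth,
    elevation) to an embedding and add it to the diffusion model's
    timestep embedding so generation becomes pose-aware."""

    def __init__(self, dim=320):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, t_emb, azimuth, elevation):
        # sin/cos encoding keeps the embedding continuous across the
        # 0 / 2*pi seam in azimuth
        cam = torch.stack([azimuth.sin(), azimuth.cos(),
                           elevation.sin(), elevation.cos()], dim=-1)
        return t_emb + self.mlp(cam)
```

During pre-training, each training image is paired with its camera angles, so the denoiser learns to associate the embedding with viewpoint.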
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
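Attention refocusing is described above only at a high level; as an illustration of the general idea (in the spirit of Attend-and-Excite-style objectives, not Grounded-Dreamer's exact mechanism), the sketch below penalizes subject tokens whose cross-attention maps never activate strongly, one way to keep every prompted object present in the generated views. The tensor layout is assumed.

```python
import torch

def refocus_loss(attn_maps, subject_token_ids):
    """attn_maps: [B, n_tokens, H, W] cross-attention weights from the
    diffusion UNet (assumed layout). Encourage each subject token to
    have at least one strongly attended spatial region."""
    per_token = []
    for tid in subject_token_ids:
        peak = attn_maps[:, tid].amax(dim=(-2, -1))   # strongest response
        per_token.append(torch.clamp(1.0 - peak, min=0.0))
    # Focus the update on the most neglected token, as Attend-and-Excite does.
    return torch.stack(per_token, dim=0).max(dim=0).values.mean()
```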
- RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [39.03289977892935]
RealmDreamer is a technique for generating forward-facing 3D scenes from text descriptions.
We leverage 2D inpainting diffusion models conditioned on an initial scene estimate to provide low variance supervision for unknown regions during 3D distillation.
Notably, our technique doesn't require video or multi-view data and can synthesize various high-quality 3D scenes in different styles with complex layouts.
arXiv Detail & Related papers (2024-04-10T17:57:41Z)
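A minimal sketch of the supervision split described above, with hypothetical tensor names: pixels visible in the initial scene estimate stay anchored to it, while occluded or unknown pixels are pulled toward a sample from the inpainting diffusion model.

```python
import torch
import torch.nn.functional as F

def masked_distill_loss(render, init_estimate, inpaint_sample, known_mask):
    """All tensors [B, C, H, W]; known_mask is 1 where the initial scene
    estimate is reliable and 0 in unknown regions. Conditioning the
    inpainting model on the known content is what keeps the supervision
    low-variance compared to unconstrained score distillation."""
    loss_known = F.mse_loss(render * known_mask, init_estimate * known_mask)
    loss_unknown = F.mse_loss(render * (1 - known_mask),
                              inpaint_sample * (1 - known_mask))
    return loss_known + loss_unknown
```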
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- AToM: Amortized Text-to-Mesh using 2D Diffusion [107.02696990299032]
Amortized Text-to-Mesh (AToM) is a feed-forward framework optimized across multiple text prompts simultaneously.
AToM directly generates high-quality textured meshes in less than 1 second with around 10 times reduction in the training cost.
AToM significantly outperforms state-of-the-art amortized approaches with over 4 times higher accuracy.
arXiv Detail & Related papers (2024-02-01T18:59:56Z)
- TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models [77.85129451435704]
We present a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models.
Specifically, we leverage latent diffusion models, apply the denoiser on a set of 2D renders of the 3D object, and aggregate the denoising predictions on a shared latent texture map.
arXiv Detail & Related papers (2023-10-20T19:15:29Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance [19.252300247300145]
This work proposes holistic sampling and smoothing approaches to achieve high-quality text-to-3D generation.
We compute denoising scores in the text-to-image diffusion model's latent and image spaces.
To generate high-quality renderings in a single-stage optimization, we propose regularization for the variance of z-coordinates along NeRF rays.
arXiv Detail & Related papers (2023-05-30T05:56:58Z)
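The z-variance idea from the HiFA blurb above can be written compactly: with volume-rendering weights w_i and sample depths z_i along a ray, penalize the weighted variance of depth so that density concentrates on a single surface instead of smearing across the ray. A minimal PyTorch sketch under assumed shapes (not the authors' code):

```python
import torch

def z_variance_loss(weights, z_vals):
    """weights, z_vals: [n_rays, n_samples], with weights coming from
    volume rendering (re-normalized here for safety). Returns the mean
    per-ray variance of depth under the rendering weights."""
    w = weights / (weights.sum(dim=-1, keepdim=True) + 1e-8)
    z_mean = (w * z_vals).sum(dim=-1, keepdim=True)   # expected depth per ray
    var = (w * (z_vals - z_mean) ** 2).sum(dim=-1)    # weighted depth variance
    return var.mean()
```

Adding this term to the rendering loss during single-stage optimization discourages foggy, multi-surface geometry along each ray.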