Related papers: Text-Guided Texturing by Synchronized Multi-View Diffusion

URL: http://arxiv.org/abs/2311.12891v1
Date: Tue, 21 Nov 2023 06:26:28 GMT
Title: Text-Guided Texturing by Synchronized Multi-View Diffusion
Authors: Yuxin Liu, Minshan Xie, Hanyuan Liu, Tien-Tsin Wong
Abstract summary: This paper introduces a novel approach to synthesize texture to dress up a given 3D object, given a text prompt. We propose a synchronized multi-view diffusion approach that allows the diffusion processes from different views to reach a consensus. Our method demonstrates superior performance in generating consistent, seamless, highly detailed textures.
Score: 20.288858368568544
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces a novel approach to synthesize texture to dress up a given 3D object, given a text prompt. Based on the pretrained text-to-image (T2I) diffusion model, existing methods usually employ a project-and-inpaint approach, in which a view of the given object is first generated and warped to another view for inpainting. However, this approach tends to generate inconsistent texture due to the asynchronous diffusion of multiple views. We believe such asynchronous diffusion and insufficient information sharing among views are the root causes of the inconsistency artifacts. In this paper, we propose a synchronized multi-view diffusion approach that allows the diffusion processes from different views to reach a consensus on the generated content early in the process, and hence ensures texture consistency. To synchronize the diffusion, we share the denoised content among different views in each denoising step, specifically blending the latent content in the texture domain from views with overlap. Our method demonstrates superior performance in generating consistent, seamless, highly detailed textures, compared with state-of-the-art methods.
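
Illustrative sketch (not the authors' implementation): the abstract describes sharing denoised content across views at each denoising step by blending latent content in the texture domain where views overlap. The Python below shows one way such a blend could look, under loud assumptions: each view is a hypothetical dict holding scalar stand-ins for latent values, a pixel-to-texel lookup (`texel_ids`), and per-pixel blending weights (`weights`). None of these names or the weighted-average rule come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical view layout (an assumption for illustration only):
#   "latent":    denoised latent value per visible pixel (scalars stand in for vectors)
#   "texel_ids": index of the texture element each pixel maps to
#   "weights":   blending weight per pixel (e.g. higher for front-facing surfaces)
View = Dict[str, list]


def blend_in_texture_domain(
    views: List[View], num_texels: int
) -> Tuple[List[float], List[List[float]]]:
    """Average per-view latents into a shared texture, then read them back.

    Returns the blended texture and, for each view, the synchronized latents.
    """
    accum = [0.0] * num_texels
    weight_sum = [0.0] * num_texels

    # Project every view's latent content into the shared texture domain.
    for view in views:
        for value, texel, w in zip(view["latent"], view["texel_ids"], view["weights"]):
            accum[texel] += w * value
            weight_sum[texel] += w

    # Weighted average; texels seen by no view stay at 0.0.
    texture = [a / s if s > 0 else 0.0 for a, s in zip(accum, weight_sum)]

    # Read the blended texture back so overlapping views share the same content.
    synced = [[texture[t] for t in view["texel_ids"]] for view in views]
    return texture, synced


# Two views overlapping on texels 1 and 2.
view_a = {"latent": [1.0, 2.0, 3.0], "texel_ids": [0, 1, 2], "weights": [1.0, 1.0, 1.0]}
view_b = {"latent": [4.0, 5.0, 6.0], "texel_ids": [1, 2, 3], "weights": [1.0, 3.0, 1.0]}

texture, (synced_a, synced_b) = blend_in_texture_domain([view_a, view_b], num_texels=4)
print(texture)   # blended texture values
print(synced_a)  # view A after synchronization
print(synced_b)  # view B after synchronization
```

In a full pipeline this step would presumably run once per denoising iteration, so that the views agree on overlapping regions before the next step; how the weights are chosen and how the blend interacts with the noise schedule are details this sketch does not attempt to reproduce.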

Related papers

FlexPainter: Flexible and Multi-View Consistent Texture Generation [15.727635740684157]
FlexPainter is a novel texture generation pipeline that enables flexible multi-modal conditional guidance. Our framework significantly outperforms state-of-the-art methods in both flexibility and generation quality.
arXiv Detail & Related papers (2025-06-03T08:36:03Z)
DoubleDiffusion: Combining Heat Diffusion with Denoising Diffusion for Texture Generation on 3D Meshes [67.39455433337316]
We propose a novel approach that directly generates texture on 3D meshes. By integrating this technique into a generative diffusion pipeline, we significantly improve the efficiency of texture generation.
arXiv Detail & Related papers (2025-01-06T21:34:52Z)
GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation [35.04723374116026]
Large-scale text-to-image (T2I) models have shown astonishing results in text-to-image (T2I) generation. Applying these models to synthesize textures for 3D geometries remains challenging due to the domain gap between 2D images and textures on a 3D surface. We propose a novel text-to-texture synthesis framework that leverages pretrained diffusion models.
arXiv Detail & Related papers (2024-09-27T02:32:42Z)
Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models. We incorporate human annotation as external knowledge to confine editing within a "Mask-informed" region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z)
TexPainter: Generative Mesh Texturing with Multi-view Consistency [20.366302413005734]
In this paper, we propose a novel method to enforce multi-view consistency. We use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation. Our method improves consistency and overall quality of the generated textures as compared to competing state-of-the-art methods.
arXiv Detail & Related papers (2024-05-17T18:41:36Z)
Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis [61.189479577198846]
We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model. At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU.
arXiv Detail & Related papers (2024-05-13T21:53:09Z)
GenesisTex: Adapting Image Denoising Diffusion to Texture Space [15.907134430301133]
GenesisTex is a novel method for synthesizing textures for 3D geometries from text descriptions. We maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint. Global consistency is achieved through the integration of style consistency mechanisms within the noise prediction network.
arXiv Detail & Related papers (2024-03-26T15:15:15Z)
Contextualized Diffusion Models for Text-Guided Image and Video Generation [67.69171154637172]
Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. We propose a novel and general contextualized diffusion model (ContextDiff) by incorporating the cross-modal context encompassing interactions and alignments between text condition and visual sample. We generalize our model to both DDPMs and DDIMs with theoretical derivations, and demonstrate the effectiveness of our model in evaluations with two challenging tasks: text-to-image generation, and text-to-video editing.
arXiv Detail & Related papers (2024-02-26T15:01:16Z)
TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models [77.85129451435704]
We present a new method to synthesize textures for 3D geometries, using large-scale text-guided image diffusion models. Specifically, we leverage latent diffusion models, apply the denoiser to a set of 2D renders of the 3D object, and aggregate the denoising predictions on a shared latent texture map.
arXiv Detail & Related papers (2023-10-20T19:15:29Z)
MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask [84.84034179136458]
A crucial factor leading to the text-image mismatch issue is the inadequate cross-modality relation learning. We propose an adaptive mask, which is conditioned on the attention maps and the prompt embeddings, to dynamically adjust the contribution of each text token to the image features. Our method, termed MaskDiffusion, is training-free and hot-pluggable for popular pre-trained diffusion models.
arXiv Detail & Related papers (2023-09-08T15:53:37Z)
Mixture of Diffusers for scene composition and high resolution image generation [0.0]
Mixture of diffusers is an algorithm that builds over existing diffusion models to provide a more detailed control over composition. By harmonizing several diffusion processes acting on different regions of a canvas, it allows generating larger images, where the location of each object and style is controlled by a separate diffusion process.
arXiv Detail & Related papers (2023-02-05T15:49:26Z)
eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. We train an ensemble of text-to-image diffusion models specialized for different synthesis stages. Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
