Text-Guided Texturing by Synchronized Multi-View Diffusion
- URL: http://arxiv.org/abs/2311.12891v1
- Date: Tue, 21 Nov 2023 06:26:28 GMT
- Title: Text-Guided Texturing by Synchronized Multi-View Diffusion
- Authors: Yuxin Liu, Minshan Xie, Hanyuan Liu, Tien-Tsin Wong
- Abstract summary: This paper introduces a novel approach for synthesizing a texture for a given 3D object from a text prompt.
We propose a synchronized multi-view diffusion approach that allows the diffusion processes from different views to reach a consensus.
Our method demonstrates superior performance in generating consistent, seamless, highly detailed textures.
- Score: 20.288858368568544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel approach for synthesizing a texture
that dresses up a given 3D object according to a text prompt. Building on a
pretrained text-to-image (T2I) diffusion model, existing methods usually employ
a project-and-inpaint approach, in which one view of the given object is first
generated and then warped to another view for inpainting. However, this tends
to produce inconsistent textures because the views are diffused
asynchronously. We believe that such asynchronous diffusion and insufficient
information sharing among views are the root causes of the inconsistency
artifacts. In this paper, we propose a synchronized multi-view diffusion
approach that allows the diffusion processes of different views to reach a
consensus on the generated content early in the process, and hence ensures
texture consistency. To synchronize the diffusion, we share the denoised
content among different views in each denoising step, specifically by blending,
in the texture domain, the latent content from overlapping views. Our method
demonstrates superior performance in generating consistent, seamless, highly
detailed textures compared with state-of-the-art methods.
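The abstract describes the synchronization only at a high level: at every denoising step, latent content from overlapping views is blended in the texture domain and shared back with each view. The NumPy sketch below shows one way this could look under stated assumptions. It assumes, hypothetically, that a rasterizer has already produced, for each view, the texel index seen by every latent pixel (`texel_ids`, with -1 for background) and a per-pixel blending weight. It also assumes that blending is applied to each view's predicted clean latent before a deterministic DDIM-style update. `blend_in_texture`, `synchronized_ddim`, and `predict_x0` are illustrative names, not the authors' code, and the nearest-texel scatter stands in for real UV-space rendering.

```python
import numpy as np

# Hypothetical helpers illustrating texture-space blending; not the authors' implementation.


def blend_in_texture(view_latents, texel_ids, weights, num_texels):
    """Weighted average of overlapping view latents in a shared texture domain.

    view_latents: list of (H, W, C) arrays, one per view.
    texel_ids:    list of (H, W) int arrays giving the texel seen by each latent
                  pixel, or -1 where the pixel does not cover the object.
    weights:      list of (H, W) float arrays (e.g. an assumed view-angle weight).
    Returns the (num_texels, C) texture and each view re-read from that texture.
    """
    channels = view_latents[0].shape[-1]
    tex_sum = np.zeros((num_texels, channels))
    tex_w = np.zeros(num_texels)
    for lat, ids, w in zip(view_latents, texel_ids, weights):
        flat_lat = lat.reshape(-1, channels)
        flat_ids = ids.reshape(-1)
        flat_w = w.reshape(-1)
        vis = flat_ids >= 0
        # np.add.at accumulates correctly even when several pixels hit one texel
        np.add.at(tex_sum, flat_ids[vis], flat_w[vis][:, None] * flat_lat[vis])
        np.add.at(tex_w, flat_ids[vis], flat_w[vis])
    covered = tex_w > 0
    texture = np.zeros_like(tex_sum)
    texture[covered] = tex_sum[covered] / tex_w[covered][:, None]

    synced = []
    for lat, ids in zip(view_latents, texel_ids):
        flat = lat.reshape(-1, channels).copy()
        flat_ids = ids.reshape(-1)
        idx = np.nonzero(flat_ids >= 0)[0]
        idx = idx[covered[flat_ids[idx]]]  # skip texels that received zero weight
        flat[idx] = texture[flat_ids[idx]]  # background pixels keep their own value
        synced.append(flat.reshape(lat.shape))
    return texture, synced


def synchronized_ddim(x_T_views, texel_ids, weights, num_texels, predict_x0, alpha_bar):
    """Deterministic DDIM-style loop in which every view's predicted clean latent
    is blended in texture space before the update (an assumed design)."""
    xs = [x.copy() for x in x_T_views]
    for t in reversed(range(len(alpha_bar))):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        # Each view is denoised independently, then the predictions are synchronized.
        x0_views = [predict_x0(x, t) for x in xs]
        _, x0_views = blend_in_texture(x0_views, texel_ids, weights, num_texels)
        new_xs = []
        for x, x0 in zip(xs, x0_views):
            eps = (x - np.sqrt(a_t) * x0) / np.sqrt(1.0 - a_t)
            new_xs.append(np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps)
        xs = new_xs
    return xs
```

With a toy denoiser such as `lambda x, t: 0.5 * x` and cumulative alphas in (0, 1), pixels from different views that map to the same texel end with identical values after the last step. This is the cross-view consistency the synchronization is meant to provide. Because the content is averaged at every step rather than warped once from a finished view, each view sees the others' content throughout the process.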
Related papers
- Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models.
We incorporate human annotation as external knowledge to confine editing within a "Mask-informed" region.
Date: 2024-05-24T07:53:59Z
- TexPainter: Generative Mesh Texturing with Multi-view Consistency [20.366302413005734]
In this paper, we propose a novel method to enforce multi-view consistency.
We use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation.
Our method improves consistency and overall quality of the generated textures as compared to competing state-of-the-arts.
Date: 2024-05-17T18:41:36Z
- Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis [61.189479577198846]
We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt.
Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model.
At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU.
Date: 2024-05-13T21:53:09Z
- GenesisTex: Adapting Image Denoising Diffusion to Texture Space [15.907134430301133]
GenesisTex is a novel method for synthesizing textures for 3D geometries from text descriptions.
We maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint.
Global consistency is achieved through the integration of style consistency mechanisms within the noise prediction network.
Date: 2024-03-26T15:15:15Z
- Contextualized Diffusion Models for Text-Guided Image and Video Generation [67.69171154637172]
Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing.
We propose a novel and general contextualized diffusion model (ContextDiff) by incorporating the cross-modal context encompassing interactions and alignments between text condition and visual sample.
We generalize our model to both DDPMs and DDIMs with theoretical derivations, and demonstrate the effectiveness of our model in evaluations with two challenging tasks: text-to-image generation, and text-to-video editing.
Date: 2024-02-26T15:01:16Z
- TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models [77.85129451435704]
We present a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models.
Specifically, we leverage latent diffusion models, apply the denoising model to a set of 2D renders of the 3D object, and aggregate the denoising predictions on a shared latent texture map.
Date: 2023-10-20T19:15:29Z
- MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask [84.84034179136458]
A crucial factor leading to the text-image mismatch issue is inadequate learning of cross-modal relations.
We propose an adaptive mask, which is conditioned on the attention maps and the prompt embeddings, to dynamically adjust the contribution of each text token to the image features.
Our method, termed MaskDiffusion, is training-free and hot-pluggable for popular pre-trained diffusion models.
Date: 2023-09-08T15:53:37Z
- Text2Tex: Text-driven Texture Synthesis via Diffusion Models [31.773823357617093]
We present Text2Tex, a novel method for generating high-quality textures for 3D meshes from text prompts.
Our method incorporates inpainting into a pre-trained depth-aware image diffusion model to progressively synthesize high-resolution partial textures from multiple viewpoints.
Date: 2023-03-20T19:02:13Z
- Mixture of Diffusers for scene composition and high resolution image generation [0.0]
Mixture of Diffusers is an algorithm that builds on existing diffusion models to provide more detailed control over composition.
By harmonizing several diffusion processes acting on different regions of a canvas, it allows generating larger images, where the location of each object and style is controlled by a separate diffusion process.
Date: 2023-02-05T15:49:26Z
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different synthesis stages.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
Date: 2022-11-02T17:43:04Z
This list is automatically generated from the titles and abstracts of the papers on this site.