GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation
- URL: http://arxiv.org/abs/2409.18401v1
- Date: Fri, 27 Sep 2024 02:32:42 GMT
- Title: GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation
- Authors: Jiawei Lu, Yingpeng Zhang, Zengjun Zhao, He Wang, Kun Zhou, Tianjia Shao
- Abstract summary: Large-scale text-guided image diffusion models have shown astonishing results in text-to-image (T2I) generation.
Applying these models to synthesize textures for 3D geometries remains challenging due to the domain gap between 2D images and textures on a 3D surface.
We propose a novel text-to-texture synthesis framework that leverages pretrained diffusion models.
- Score: 35.04723374116026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale text-guided image diffusion models have shown astonishing results in text-to-image (T2I) generation. However, applying these models to synthesize textures for 3D geometries remains challenging due to the domain gap between 2D images and textures on a 3D surface. Early works that used a projecting-and-inpainting approach managed to preserve generation diversity but often resulted in noticeable artifacts and style inconsistencies. While recent methods have attempted to address these inconsistencies, they often introduce other issues, such as blurring, over-saturation, or over-smoothing. To overcome these challenges, we propose a novel text-to-texture synthesis framework that leverages pretrained diffusion models. We first introduce a local attention reweighing mechanism in the self-attention layers to guide the model in concentrating on spatial-correlated patches across different views, thereby enhancing local details while preserving cross-view consistency. Additionally, we propose a novel latent space merge pipeline, which further ensures consistency across different viewpoints without sacrificing too much diversity. Our method significantly outperforms existing state-of-the-art techniques regarding texture consistency and visual quality, while delivering results much faster than distillation-based methods. Importantly, our framework does not require additional training or fine-tuning, making it highly adaptable to a wide range of models available on public platforms.
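The abstract describes its two mechanisms only in prose; the following is a minimal PyTorch sketch of what reweighting self-attention toward spatially correlated patches could look like. The correspondence mask, tensor shapes, and the `boost` bias are illustrative assumptions, not the authors' implementation.

```python
import torch

def reweighted_self_attention(q, k, v, corr_mask, boost=1.0):
    """Self-attention whose logits are biased toward spatially correlated
    patches across views (a sketch of the general idea, not the paper's code).

    q, k, v   : (batch, tokens, dim) patch tokens from all views, concatenated.
    corr_mask : (tokens, tokens) 1.0 where two patches project to the same
                surface region, 0.0 elsewhere -- assumed to come from the
                mesh's visibility/UV correspondences.
    boost     : additive logit bias for correlated pairs (illustrative value).
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("bqd,bkd->bqk", q, k) * scale
    logits = logits + boost * corr_mask      # concentrate on correlated patches
    attn = logits.softmax(dim=-1)
    return torch.einsum("bqk,bkd->bqd", attn, v)
```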
Related papers
- RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models [3.714901836138171]
We propose a robust text-to-texture method for generating consistent and seamless textures that are well aligned with the mesh.
Our method leverages state-of-the-art 2D diffusion models, including SDXL and multiple ControlNets, to capture structural features and intricate details in the generated textures.
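As a rough illustration of how SDXL plus multiple ControlNets is typically wired with Hugging Face diffusers; the checkpoints, prompt, and conditioning scales below are assumptions, not the paper's configuration.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from PIL import Image

# Illustrative wiring only: checkpoints, prompt, and scales are assumptions,
# not the configuration used by RoCoTex.
controlnets = [
    ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0",
                                    torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0",
                                    torch_dtype=torch.float16),
]
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

# Placeholders: in a texturing pipeline these would be rendered from the mesh.
depth_map = Image.new("RGB", (1024, 1024))
edge_map = Image.new("RGB", (1024, 1024))

image = pipe(
    "a weathered leather sofa, studio lighting",
    image=[depth_map, edge_map],
    controlnet_conditioning_scale=[0.8, 0.5],
).images[0]
```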
arXiv Detail & Related papers (2024-09-30T06:29:50Z)
- TexPainter: Generative Mesh Texturing with Multi-view Consistency [20.366302413005734]
In this paper, we propose a novel method to enforce multi-view consistency.
We use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation.
Our method improves consistency and overall quality of the generated textures as compared to competing state-of-the-arts.
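A minimal sketch of the optimization-based color-fusion idea, assuming differentiable `decode` and `project` operators; this is a plausible form of the technique, not the authors' code.

```python
import torch

# A plausible shape of optimization-based color fusion (not the authors' code):
# decode per-view latents, fuse the colors into one texture, and back-propagate
# a consistency loss to the latents themselves.

def fuse_views(colors, weights):
    # colors:  (views, 3, H, W) per-view colors projected into texture space
    # weights: (views, 1, H, W) visibility/blend weights per view
    return (colors * weights).sum(0) / weights.sum(0).clamp(min=1e-6)

def color_fusion_step(latents, decode, project, weights, lr=0.05):
    # decode/project are assumed differentiable: latents -> images -> UV colors.
    latents = latents.detach().requires_grad_(True)
    colors = project(decode(latents))
    fused = fuse_views(colors, weights)
    loss = ((colors - fused.unsqueeze(0)) ** 2 * weights).mean()
    loss.backward()
    with torch.no_grad():
        latents -= lr * latents.grad   # indirect update via back-propagation
    return latents.detach(), loss.item()
```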
arXiv Detail & Related papers (2024-05-17T18:41:36Z)
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as an intermediate bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
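The summary gives no formula; the sketch below shows one common form of an attention-refocusing loss, which pushes each concept token's cross-attention mass into its target region. It is an assumption about the general technique, not the paper's exact objective.

```python
import torch

def refocus_loss(attn_maps, region_masks, eps=1e-8):
    # attn_maps:    (tokens, H, W) cross-attention maps for concept tokens
    # region_masks: (tokens, H, W) 1.0 inside each token's intended region
    probs = attn_maps / attn_maps.sum(dim=(-2, -1), keepdim=True).clamp(min=eps)
    inside = (probs * region_masks).sum(dim=(-2, -1))
    return (1.0 - inside).mean()   # zero when all attention stays in-region
```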
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- GenesisTex: Adapting Image Denoising Diffusion to Texture Space [15.907134430301133]
GenesisTex is a novel method for synthesizing textures for 3D geometries from text descriptions.
We maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint.
Global consistency is achieved through the integration of style consistency mechanisms within the noise prediction network.
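A minimal sketch of the per-viewpoint update loop implied by the summary, with assumed `render` and `unbake` operators that move latents between texture space and screen space; this is not the released implementation.

```python
import torch

# Assumed structure of the per-viewpoint update (not the released code):
# `render` maps a latent texture to screen-space latents for a view, and
# `unbake` maps denoised screen latents back into texture space.
@torch.no_grad()
def denoise_texture_maps(tex_latents, render, unbake, unet, scheduler, cond):
    # tex_latents: list of (4, Ht, Wt) latent texture maps, one per viewpoint
    for t in scheduler.timesteps:
        for i, tex in enumerate(tex_latents):
            screen = render(tex, view=i)
            noise = unet(screen[None], t, encoder_hidden_states=cond).sample
            screen = scheduler.step(noise[0], t, screen).prev_sample
            tex_latents[i] = unbake(screen, view=i)
    return tex_latents
```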
arXiv Detail & Related papers (2024-03-26T15:15:15Z)
- TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models [77.85129451435704]
We present a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models.
Specifically, we leverage latent diffusion models, apply the denoising model to a set of 2D renders of the 3D object, and aggregate the denoising predictions on a shared latent texture map.
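A sketch of one denoising step under this aggregation scheme, assuming a `render` operator and an `unbake_with_weight` back-projection that also returns visibility weights; the structure is an illustration, not the paper's code.

```python
import torch

@torch.no_grad()
def texfusion_step(tex, t, views, render, unbake_with_weight, unet,
                   scheduler, cond):
    # tex: (4, Ht, Wt) shared latent texture map denoised by all views jointly
    acc = torch.zeros_like(tex)
    wsum = torch.zeros_like(tex[:1])
    for v in views:
        screen = render(tex, v)
        noise = unet(screen[None], t, encoder_hidden_states=cond).sample
        denoised = scheduler.step(noise[0], t, screen).prev_sample
        tex_v, w = unbake_with_weight(denoised, v)  # back-project + visibility
        acc += tex_v * w
        wsum += w
    return acc / wsum.clamp(min=1e-6)   # weighted average across views
```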
arXiv Detail & Related papers (2023-10-20T19:15:29Z)
- Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators.
Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields.
We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools.
Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools.
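The control flow this suggests can be sketched with assumed callables for rendering, 2D repainting, and radiance-field fitting; the interfaces and round count are illustrative.

```python
# Assumed control flow (interfaces and round count are illustrative):
# a pretrained 2D model repaints renders, and a radiance field refit on the
# repainted views enforces 3D consistency between rounds.
def repaint_asset(geometry, views, render, repaint_2d, fit_radiance_field,
                  prompt, rounds=3):
    field = None   # render() is assumed to fall back to the bare mesh at first
    for _ in range(rounds):
        renders = [render(geometry, field, v) for v in views]
        repainted = [repaint_2d(img, prompt) for img in renders]
        field = fit_radiance_field(geometry, views, repainted)
    return field
```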
arXiv Detail & Related papers (2023-09-15T16:34:51Z)
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address common quality and consistency issues in text-to-3D generation.
Our approach uses image-to-image pipelines, powered by latent diffusion models (LDMs), to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
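A sketch of that discriminator objective in the standard non-saturating GAN form; the exact loss used in the paper may differ.

```python
import torch.nn.functional as F

def discriminator_loss(D, synthesized, renders):
    # Synthesized multi-view images play the role of real data; renders of
    # the 3D model being optimized play the role of fake data.
    real_logits = D(synthesized)
    fake_logits = D(renders.detach())
    return F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()

def generator_loss(D, renders):
    return F.softplus(-D(renders)).mean()   # push renders toward "real"
```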
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- Text-guided High-definition Consistency Texture Model [0.0]
We present the High-definition Consistency Texture Model (HCTM), a novel method that can generate high-definition textures for 3D meshes according to the text prompts.
We achieve this by leveraging a pre-trained depth-to-image diffusion model to generate single viewpoint results based on the text prompt and a depth map.
Our proposed approach has demonstrated promising results in generating high-definition and consistent textures for 3D meshes.
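A single-viewpoint step of this kind can be approximated with diffusers' depth-conditioned pipeline; the checkpoint, prompt, and placeholder inputs below are assumptions for illustration, not the paper's setup.

```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16,
).to("cuda")

# Placeholders: in HCTM-style texturing these come from rendering the mesh.
init_view = Image.new("RGB", (512, 512))
depth = torch.rand(1, 384, 384)   # depth map for the same camera (assumed shape)

result = pipe(
    prompt="a carved wooden chest, game asset",
    image=init_view,
    depth_map=depth,
    strength=1.0,   # ignore the placeholder colors, keep only the geometry cue
).images[0]
```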
arXiv Detail & Related papers (2023-05-10T05:09:05Z)
- 3DGen: Triplane Latent Diffusion for Textured Mesh Generation [17.178939191534994]
A triplane VAE learns latent representations of textured meshes and a conditional diffusion model generates the triplane features.
For the first time, this architecture allows conditional and unconditional generation of high-quality textured or untextured 3D meshes.
It substantially outperforms previous work on image-conditioned and unconditional generation, in both mesh quality and texture generation.
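A triplane feature lookup, the standard building block this architecture rests on, can be sketched as follows; the shapes and the summation across planes are illustrative choices.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes, pts):
    # planes: (3, C, R, R) feature planes for the xy, xz, and yz slices
    # pts:    (N, 3) query points in [-1, 1]^3
    coords = torch.stack([pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]])
    grid = coords.unsqueeze(2)                                # (3, N, 1, 2)
    feats = F.grid_sample(planes, grid, align_corners=True)   # (3, C, N, 1)
    return feats.squeeze(-1).sum(0).t()   # (N, C): summed per-plane features
```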
arXiv Detail & Related papers (2023-03-09T16:18:14Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
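A minimal sketch of the SAWN idea, assuming the flow field is given in normalized grid coordinates; this is a plausible rendering of the description, not the authors' code.

```python
import torch
import torch.nn.functional as F

def sawn(x, gamma, beta, flow):
    # x:           (B, C, H, W) features to normalize
    # gamma, beta: (B, C, H, W) modulation maps predicted from the source image
    # flow:        (B, H, W, 2) learned flow field in normalized [-1, 1] coords
    gamma_w = F.grid_sample(gamma, flow, align_corners=True)  # warp scale map
    beta_w = F.grid_sample(beta, flow, align_corners=True)    # warp shift map
    x = F.instance_norm(x)                                    # parameter-free norm
    return x * (1.0 + gamma_w) + beta_w
```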
arXiv Detail & Related papers (2021-05-31T07:07:44Z)