TexPainter: Generative Mesh Texturing with Multi-view Consistency
- URL: http://arxiv.org/abs/2406.18539v1
- Date: Fri, 17 May 2024 18:41:36 GMT
- Title: TexPainter: Generative Mesh Texturing with Multi-view Consistency
- Authors: Hongkun Zhang, Zherong Pan, Congyi Zhang, Lifeng Zhu, Xifeng Gao,
- Abstract summary: In this paper, we propose a novel method to enforce multi-view consistency.
We use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation.
Our method improves the consistency and overall quality of the generated textures compared to competing state-of-the-art methods.
- Score: 20.366302413005734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent success of pre-trained diffusion models unlocks the possibility of the automatic generation of textures for arbitrary 3D meshes in the wild. However, these models are trained in the screen space, while converting them to a multi-view consistent texture image poses a major obstacle to the output quality. In this paper, we propose a novel method to enforce multi-view consistency. Our method is based on the observation that latent space in a pre-trained diffusion model is noised separately for each camera view, making it difficult to achieve multi-view consistency by directly manipulating the latent codes. Based on the celebrated Denoising Diffusion Implicit Models (DDIM) scheme, we propose to use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation. Our method further relaxes the sequential dependency assumption among the camera views. By evaluating on a series of general 3D models, we find our simple approach improves consistency and overall quality of the generated textures as compared to competing state-of-the-arts. Our implementation is available at: https://github.com/Quantuman134/TexPainter
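The abstract outlines the core mechanism: at each DDIM step, the per-view latents are decoded to screen-space colors, fused into one shared texture by optimization, and the remaining inconsistency is pushed back into the latent codes by gradient back-propagation. Below is a minimal PyTorch sketch of that idea under stated assumptions; the helpers `decode` and `render_view`, the loss, and the step sizes are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch only (not the TexPainter code): optimization-based color fusion
# followed by an indirect latent update via back-propagation.
import torch
import torch.nn.functional as F

def color_fusion_step(latents, decode, render_view, texture,
                      fusion_iters=20, fusion_lr=1e-2, latent_lr=0.1):
    # latents     : list of per-view latent codes (leaf tensors, requires_grad=True)
    # decode      : latent code -> RGB image in that camera's screen space (differentiable)
    # render_view : (texture, view index) -> differentiable rendering of the shared texture
    # texture     : shared texture image being optimized (leaf tensor, requires_grad=True)
    targets = [decode(z).detach() for z in latents]

    # 1) Optimization-based color fusion: fit one texture to all decoded views.
    opt = torch.optim.Adam([texture], lr=fusion_lr)
    for _ in range(fusion_iters):
        loss = sum(F.mse_loss(render_view(texture, v), img)
                   for v, img in enumerate(targets))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 2) Indirectly modify the latents: back-propagate the per-view mismatch between
    #    the decoded images and the fused texture renderings into the latent codes.
    mismatch = sum(F.mse_loss(decode(z), render_view(texture, v).detach())
                   for v, z in enumerate(latents))
    grads = torch.autograd.grad(mismatch, latents)
    new_latents = [(z - latent_lr * g).detach().requires_grad_(True)
                   for z, g in zip(latents, grads)]
    return new_latents, texture.detach().requires_grad_(True)
```

In a full pipeline, a step like this would presumably be interleaved with the standard DDIM update of each view, so that the fused texture steers all views toward the same result.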
Related papers
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models [54.35214051961381]
3D meshes are widely used in computer vision and graphics because of their efficiency in animation and minimal memory footprint in movies, games, AR, and VR.
However, creating temporally consistent and realistic textures for mesh sequences remains labor-intensive for professional artists.
We present Tex4D, a zero-shot approach that integrates the inherent geometry of mesh sequences with video diffusion models to produce consistent textures.
arXiv Detail & Related papers (2024-10-14T17:59:59Z) - GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation [35.04723374116026]
Large-scale text-to-image (T2I) models have shown astonishing results in image generation.
Applying these models to synthesize textures for 3D geometries remains challenging due to the domain gap between 2D images and textures on a 3D surface.
We propose a novel text-to-texture synthesis framework that leverages pretrained diffusion models.
arXiv Detail & Related papers (2024-09-27T02:32:42Z) - Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model [15.936267489962122]
We propose a novel method for object insertion in 3D content represented by Gaussian Splatting.
Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained Stable Video Diffusion model.
Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled and more predictable multi-view generation.
arXiv Detail & Related papers (2024-09-25T13:52:50Z) - Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE [22.072200443502457]
We propose Scene123, a 3D scene generation model that ensures realism and diversity through the video generation framework.
Specifically, we warp the input image (or an image generated from text) to simulate adjacent views, filling the invisible areas with the MAE model.
To further enhance the details and texture fidelity of generated views, we employ a GAN-based loss against images derived from the input image through the video generation model.
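The pipeline described above lends itself to a short sketch. The code below is a hedged, high-level illustration (all components are hypothetical stand-ins, not the Scene123 implementation): warp the input image to a nearby viewpoint, fill the disoccluded regions with an MAE-style inpainter, and score the result against a reference derived from the video generation model with a GAN-based loss.

```python
# Hypothetical sketch of a warp -> inpaint -> adversarial-supervision loop.
import torch
import torch.nn.functional as F

def synthesize_adjacent_view(image, depth, pose, warp, mae_inpaint, discriminator, reference):
    # warp          : (image, depth, pose) -> (warped image, boolean visibility mask)
    # mae_inpaint   : fills the masked (invisible) pixels of the warped image
    # discriminator : GAN critic comparing generated views against video-model references
    # reference     : image derived from the input through the video generation model
    warped, visible = warp(image, depth, pose)
    filled = mae_inpaint(warped, mask=~visible)                # complete disoccluded regions
    adv_loss = F.softplus(-discriminator(filled)).mean()       # non-saturating GAN loss
    ref_loss = F.l1_loss(filled, reference)                    # match the video-derived reference
    return filled, adv_loss + ref_loss
```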
arXiv Detail & Related papers (2024-08-10T08:09:57Z) - PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z) - MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z) - VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model [34.35449902855767]
Two fundamental questions are what data to use for training and how to ensure multi-view consistency.
We propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models.
Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T17:48:15Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models [77.85129451435704]
We present a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models.
Specifically, we leverage latent diffusion models, apply the denoising model to a set of 2D rendered views, and aggregate the denoising predictions on a shared latent texture map.
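As a rough illustration of the aggregation idea summarized above (not the TexFusion implementation), the sketch below runs a denoiser on renders of each view and merges the per-view predictions back onto a shared latent texture map, so every view denoises the same underlying texture; `render_latent_view`, `denoise`, and `backproject` are hypothetical stand-ins.

```python
# Hypothetical sketch: per-view denoising aggregated on a shared latent texture map.
import torch

def denoise_on_shared_texture(latent_texture, views, denoise, render_latent_view,
                              backproject, t):
    # latent_texture     : shared latent texture map in UV space
    # denoise(x, t, v)   : one denoising prediction for view v at timestep t
    # render_latent_view : projects the latent texture into a given view
    # backproject(x, v)  : splats a screen-space latent back to UV space, with a weight mask
    accum = torch.zeros_like(latent_texture)
    weight = torch.zeros_like(latent_texture)
    for v in views:
        x_view = render_latent_view(latent_texture, v)   # per-view latent render
        x_denoised = denoise(x_view, t, v)               # per-view denoising prediction
        uv_pred, uv_mask = backproject(x_denoised, v)    # back to texture space
        accum += uv_pred * uv_mask
        weight += uv_mask
    # Aggregate: weighted average of all per-view predictions on the texture map.
    return torch.where(weight > 0, accum / weight.clamp(min=1e-6), latent_texture)
```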
arXiv Detail & Related papers (2023-10-20T19:15:29Z) - Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
An image prior model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z) - Learning 3D Photography Videos via Self-supervised Diffusion on Single Images [105.81348348510551]
3D photography renders a static image into a video with appealing 3D visual effects.
Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints.
We present a novel task: out-animation, which extends the space and time of input objects.
arXiv Detail & Related papers (2023-02-21T16:18:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.