Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control
- URL: http://arxiv.org/abs/2410.06985v1
- Date: Wed, 9 Oct 2024 15:21:46 GMT
- Title: Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control
- Authors: Shimon Vainer, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Slava Elizarov, Simon Donné
- Abstract summary: Collaborative Control directly models PBR image probability distributions, including normal bump maps.
We discuss the design decisions involved in making this model multi-view consistent, and demonstrate the effectiveness of our approach in ablation studies.
- Score: 1.8692054990918074
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-view consistency remains a challenge for image diffusion models. Even within the Text-to-Texture problem, where perfect geometric correspondences are known a priori, many methods fail to yield aligned predictions across views, necessitating non-trivial fusion methods to incorporate the results onto the original mesh. We explore this issue for a Collaborative Control workflow specifically in PBR Text-to-Texture. Collaborative Control directly models PBR image probability distributions, including normal bump maps; to our knowledge, it is the only diffusion model to directly output full PBR stacks. We discuss the design decisions involved in making this model multi-view consistent, and demonstrate the effectiveness of our approach in ablation studies, as well as practical applications.
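For intuition, the sketch below shows one way known geometric correspondences can be used to keep per-view denoising synchronized: scatter every view's prediction into a shared UV texture, average the contributions, and gather back into each view. This is an illustrative stand-in, not the paper's actual mechanism; all names in it are hypothetical.

```python
# Illustrative only: a generic texture-space synchronization step, not the
# paper's exact multi-view consistency mechanism.
import numpy as np

def synchronize_views(view_preds, uv_indices, texel_count):
    """Average per-view predictions in shared UV space, then re-project.

    view_preds : (V, P, C) array of P pixel predictions with C channels per view.
    uv_indices : (V, P) int array mapping each view pixel to a texel id.
    texel_count: number of texels in the shared texture.
    """
    V, P, C = view_preds.shape
    tex_sum = np.zeros((texel_count, C))
    tex_cnt = np.zeros((texel_count, 1))
    for v in range(V):
        np.add.at(tex_sum, uv_indices[v], view_preds[v])  # scatter into UV space
        np.add.at(tex_cnt, uv_indices[v], 1.0)
    texture = tex_sum / np.maximum(tex_cnt, 1.0)  # fused texture estimate
    # Gather the fused values back so the next denoising step starts
    # from mutually consistent per-view inputs.
    return np.stack([texture[uv_indices[v]] for v in range(V)])

# Toy usage: 2 views, 4 pixels each, 3 channels, 6 shared texels.
rng = np.random.default_rng(0)
preds = rng.normal(size=(2, 4, 3))
uvs = np.array([[0, 1, 2, 3], [2, 3, 4, 5]])
print(synchronize_views(preds, uvs, texel_count=6).shape)  # (2, 4, 3)
```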
Related papers
- EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models [42.55874233756394]
We propose a novel approach termed Exemplar-guided Image Translation with Brownian-Bridge Diffusion Models (EBDM).
Our method formulates the task as a Brownian bridge process, a diffusion process whose fixed initial point serves as structure control, and translates it into the corresponding photo-realistic image while conditioning solely on the given exemplar image.
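For intuition, a minimal sketch of sampling from a Brownian bridge pinned at both endpoints follows; the marginal at time t has mean (1 - t/T)·x0 + (t/T)·xT and variance proportional to t(T - t)/T. It illustrates only the stochastic process, not EBDM's full exemplar-conditioned model.

```python
# Illustrative only: the pinned Brownian bridge marginal, not EBDM itself.
import numpy as np

def brownian_bridge_sample(x0, xT, t, T=1.0, sigma=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and xT (t=T)."""
    if rng is None:
        rng = np.random.default_rng()
    mean = (1.0 - t / T) * x0 + (t / T) * xT
    std = sigma * np.sqrt(t * (T - t) / T)
    return mean + std * rng.normal(size=np.shape(x0))

x0 = np.zeros(4)  # fixed initial point (structure control)
xT = np.ones(4)   # target endpoint (photo-realistic image)
print(brownian_bridge_sample(x0, xT, t=0.5))
```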
arXiv Detail & Related papers (2024-10-13T11:10:34Z)
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded on a novel semantic centralization loss and a background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
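A minimal sketch of the latent-optimization pattern, assuming a toy decoder and a toy semantic score in place of the actual latent diffusion components:

```python
# Illustrative only: the decoder and semantic score are toy stand-ins, not
# the latent diffusion components PILOT actually uses.
import torch

torch.manual_seed(0)
decode = torch.nn.Linear(16, 64)        # toy stand-in for the latent decoder
target_bg = torch.randn(64)             # original image (background source)
mask = (torch.arange(64) < 32).float()  # 1 = region to inpaint, 0 = background
prompt_dir = torch.randn(64)            # toy stand-in for a prompt embedding

z = torch.randn(16, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(200):
    img = decode(z)
    # Pull the generated content toward the prompt inside the masked region.
    sem_loss = -torch.cosine_similarity(img * mask, prompt_dir * mask, dim=0)
    # Keep the unmasked background close to the original image.
    bg_loss = ((img - target_bg) * (1 - mask)).pow(2).mean()
    loss = sem_loss + 10.0 * bg_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```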
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
- DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance [9.214785726215942]
We propose a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls.
The key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs.
We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
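As a point of reference, an SVBRDF / PBR texture stack can be sketched as a simple data structure; the exact channel set below is an assumption, since papers differ in which maps they emit.

```python
# Illustrative only: one plausible channel layout for a PBR texture stack.
from dataclasses import dataclass
import numpy as np

@dataclass
class PBRStack:
    albedo: np.ndarray     # (H, W, 3) base color
    normal: np.ndarray     # (H, W, 3) tangent-space normals
    roughness: np.ndarray  # (H, W) microfacet roughness
    metallic: np.ndarray   # (H, W) metalness

    def validate(self):
        h, w = self.roughness.shape
        assert self.albedo.shape == (h, w, 3)
        assert self.normal.shape == (h, w, 3)
        assert self.metallic.shape == (h, w)

stack = PBRStack(
    albedo=np.zeros((4, 4, 3)), normal=np.zeros((4, 4, 3)),
    roughness=np.zeros((4, 4)), metallic=np.zeros((4, 4)))
stack.validate()
```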
arXiv Detail & Related papers (2024-04-23T02:04:53Z)
- ViewFusion: Towards Multi-View Consistency via Interpolated Denoising [48.02829400913904]
We introduce ViewFusion, a training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models.
Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for the next view generation.
Our framework successfully extends single-view conditioned models to work in multiple-view conditional settings without any additional fine-tuning.
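A minimal sketch of interpolated denoising, where denoise_under_condition is a hypothetical stand-in for a pre-trained diffusion model call and the weighting scheme is an assumption for illustration:

```python
# Illustrative only: fuse noise predictions obtained under each previously
# generated view into one update; not ViewFusion's exact algorithm.
import numpy as np

def denoise_under_condition(x_t, cond, rng):
    # Stand-in: a real model would predict noise given x_t and the view cond.
    return 0.1 * (x_t - cond) + 0.01 * rng.normal(size=x_t.shape)

def interpolated_denoise(x_t, prev_views, weights, rng):
    eps = np.stack([denoise_under_condition(x_t, v, rng) for v in prev_views])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, eps, axes=1)  # weighted average of predictions

rng = np.random.default_rng(0)
x_t = rng.normal(size=(8,))
views = [rng.normal(size=(8,)) for _ in range(3)]
eps_hat = interpolated_denoise(x_t, views, weights=[0.5, 0.3, 0.2], rng=rng)
print(eps_hat.shape)  # (8,)
```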
arXiv Detail & Related papers (2024-02-29T04:21:38Z)
- CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models [48.10798436003449]
Images produced by text-to-image diffusion models might not always faithfully represent the semantic intent of the provided text prompt.
Our work introduces a novel perspective by tackling this challenge in a contrastive context.
We conduct extensive experiments across a wide variety of scenarios, each involving unique combinations of objects, attributes, and scenes.
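A minimal sketch of a contrastive objective over cross-attention maps, illustrating the framing rather than the paper's exact loss: maps of tokens that belong together (an attribute and its object) are treated as positives, maps of different objects as negatives.

```python
# Illustrative only: an InfoNCE-style loss over attention maps.
import torch
import torch.nn.functional as F

def attention_contrastive_loss(attn, pos_pairs, neg_pairs, tau=0.1):
    """attn: (T, HW) cross-attention map per token; pairs: lists of (i, j)."""
    maps = F.normalize(attn, dim=-1)
    sims = maps @ maps.T                      # cosine similarity of maps
    loss = 0.0
    for (i, j) in pos_pairs:
        neg = torch.stack([sims[i, k] for (_, k) in neg_pairs])
        logits = torch.cat([sims[i, j].unsqueeze(0), neg]) / tau
        loss = loss - F.log_softmax(logits, dim=0)[0]
    return loss / len(pos_pairs)

attn = torch.rand(4, 64)   # e.g. tokens: "red", "ball", "blue", "cube"
loss = attention_contrastive_loss(attn, pos_pairs=[(0, 1), (2, 3)],
                                  neg_pairs=[(0, 3), (2, 1)])
print(float(loss))
```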
arXiv Detail & Related papers (2023-12-11T01:42:15Z)
- Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using stochastic processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs).
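For intuition, a hedged sketch of the bridge construction: pinning a reference diffusion to an endpoint y via Doob's h-transform adds the drift g(t)^2 · ∇ log p(x_T = y | x_t), which for a standard Brownian reference reduces to (y - x)/(T - t). The discretization below is ours, not taken from the paper.

```python
# Illustrative only: Euler-Maruyama simulation of a Brownian motion pinned
# to an endpoint y (the simplest instance of a diffusion bridge).
import numpy as np

def simulate_bridge(x0, y, T=1.0, steps=1000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    dt = T / steps
    x, t = np.array(x0, dtype=float), 0.0
    for _ in range(steps - 1):          # stop one step early to avoid t == T
        drift = (y - x) / (T - t)       # h-transform drift pins x_T to y
        x = x + drift * dt + np.sqrt(dt) * rng.normal(size=x.shape)
        t += dt
    return x

print(simulate_bridge(np.zeros(3), np.ones(3)))  # ends near y = [1, 1, 1]
```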
arXiv Detail & Related papers (2023-09-29T03:24:24Z)
- Grounded Text-to-Image Synthesis with Attention Refocusing [16.9170825951175]
We reveal potential causes of imprecise spatial grounding in the diffusion model's cross-attention and self-attention layers.
We propose two novel losses to refocus attention maps according to a given spatial layout during sampling.
We show that our proposed attention refocusing effectively improves the controllability of existing approaches.
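A minimal sketch of an attention-refocusing loss for one token and its target layout box; the paper's actual losses differ, this only illustrates the refocusing idea.

```python
# Illustrative only: reward attention mass inside the layout box.
import torch

def refocus_loss(attn_map, box_mask):
    """attn_map: (H, W) cross-attention for one token; box_mask: (H, W) in {0, 1}."""
    inside = (attn_map * box_mask).sum()
    total = attn_map.sum() + 1e-8
    # Minimized when all attention falls inside the layout box.
    return 1.0 - inside / total

attn = torch.rand(16, 16)
mask = torch.zeros(16, 16)
mask[4:12, 4:12] = 1.0
print(float(refocus_loss(attn, mask)))
# During sampling, one would backpropagate this loss to the latent x_t and
# take a gradient step before the next denoising iteration.
```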
arXiv Detail & Related papers (2023-06-08T17:59:59Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frame correspondences are the key sources of temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
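A minimal sketch of the aggregation pattern: for each reference-frame feature, find its top-k nearest features in a neighboring frame and fuse them with similarity weights. MuCAN's actual module is patch-based and hierarchical; this only shows the pattern.

```python
# Illustrative only: top-k correspondence aggregation between two frames.
import numpy as np

def aggregate_correspondences(ref, nbr, k=3):
    """ref: (N, C) reference features; nbr: (M, C) neighbor-frame features."""
    ref_n = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    nbr_n = nbr / np.linalg.norm(nbr, axis=1, keepdims=True)
    sim = ref_n @ nbr_n.T                       # (N, M) cosine similarities
    idx = np.argsort(-sim, axis=1)[:, :k]       # top-k matches per feature
    out = np.zeros_like(ref)
    for i in range(ref.shape[0]):
        w = sim[i, idx[i]]
        w = np.exp(w) / np.exp(w).sum()         # softmax weights over matches
        out[i] = w @ nbr[idx[i]]                # weighted fusion of matches
    return out

rng = np.random.default_rng(0)
print(aggregate_correspondences(rng.normal(size=(5, 8)),
                                rng.normal(size=(7, 8))).shape)  # (5, 8)
```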
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.