ReflectanceFusion: Diffusion-based text to SVBRDF Generation
- URL: http://arxiv.org/abs/2406.14565v1
- Date: Thu, 25 Apr 2024 15:43:33 GMT
- Title: ReflectanceFusion: Diffusion-based text to SVBRDF Generation
- Authors: Bowen Xue, Giuseppe Claudio Guarnera, Shuang Zhao, Zahra Montazeri,
- Abstract summary: We introduce Reflectance Diffusion, a new neural text-to-texture model capable of generating high-fidelity SVBRDF maps from textual descriptions.
Our method leverages a tandem neural approach, consisting of two modules, to accurately model the distribution of spatially varying reflectance.
- Score: 12.5036873986483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Reflectance Diffusion, a new neural text-to-texture model capable of generating high-fidelity SVBRDF maps from textual descriptions. Our method leverages a tandem neural approach, consisting of two modules, to accurately model the distribution of spatially varying reflectance as described by text prompts. Initially, we employ a pre-trained stable diffusion 2 model to generate a latent representation that informs the overall shape of the material and serves as our backbone model. Then, our ReflectanceUNet enables fine-tuning control over the material's physical appearance and generates SVBRDF maps. ReflectanceUNet module is trained on an extensive dataset comprising approximately 200,000 synthetic spatially varying materials. Our generative SVBRDF diffusion model allows for the synthesis of multiple SVBRDF estimates from a single textual input, offering users the possibility to choose the output that best aligns with their requirements. We illustrate our method's versatility by generating SVBRDF maps from a range of textual descriptions, both specific and broad. Our ReflectanceUNet model can integrate optional physical parameters, such as roughness and specularity, enhancing customization. When the backbone module is fixed, the ReflectanceUNet module refines the material, allowing direct edits to its physical attributes. Comparative evaluations demonstrate that ReflectanceFusion achieves better accuracy than existing text-to-material models, such as Text2Mat, while also providing the benefits of editable and relightable SVBRDF maps.
Related papers
- Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models [54.052963634384945]
We introduce the Image Regeneration task to assess text-to-image models.
We use GPT4V to bridge the gap between the reference image and the text input for the T2I model.
We also present ImageRepainter framework to enhance the quality of generated images.
arXiv Detail & Related papers (2024-11-14T13:52:43Z) - LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation [30.897935761304034]
We propose a novel framework called textbfLLM4GEN, which enhances the semantic understanding of text-to-image diffusion models.
A specially designed Cross-Adapter Module (CAM) integrates the original text features of text-to-image models with LLM features.
DensePrompts, which contains $7,000$ dense prompts, provides a comprehensive evaluation for the text-to-image generation task.
arXiv Detail & Related papers (2024-06-30T15:50:32Z) - Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models [42.891427362223176]
Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities.
We propose a novel framework to fully harness the capabilities of LLMs.
We further design an LLM-Infused Diffusion Transformer (LI-DiT) based on the framework.
arXiv Detail & Related papers (2024-06-17T17:59:43Z) - MatFusion: A Generative Diffusion Model for SVBRDF Capture [3.3090362820994526]
We formulate SVBRDF estimation from photographs as a diffusion task.
We first train a novel unconditional SVBRDF diffusion backbone model on a set of 312,165 synthetic spatially varying material exemplars.
We demonstrate the flexibility of our method by refining different SVBRDF diffusion models conditioned on different types of incident lighting.
arXiv Detail & Related papers (2024-04-24T02:07:53Z) - DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance [9.214785726215942]
We propose a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls.
Key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs.
We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
arXiv Detail & Related papers (2024-04-23T02:04:53Z) - Contextualized Diffusion Models for Text-Guided Image and Video Generation [67.69171154637172]
Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing.
We propose a novel and general contextualized diffusion model (ContextDiff) by incorporating the cross-modal context encompassing interactions and alignments between text condition and visual sample.
We generalize our model to both DDPMs and DDIMs with theoretical derivations, and demonstrate the effectiveness of our model in evaluations with two challenging tasks: text-to-image generation, and text-to-video editing.
arXiv Detail & Related papers (2024-02-26T15:01:16Z) - Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [57.01994216693825]
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable.
We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE.
Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material.
arXiv Detail & Related papers (2024-02-06T20:35:28Z) - Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including the few-shot, semi-supervised and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z) - Compositional Visual Generation with Composable Diffusion Models [80.75258849913574]
We propose an alternative structured approach for compositional generation using diffusion models.
An image is generated by composing a set of diffusion models, with each of them modeling a certain component of the image.
The proposed method can generate scenes at test time that are substantially more complex than those seen in training.
arXiv Detail & Related papers (2022-06-03T17:47:04Z) - Neural BRDF Representation and Importance Sampling [79.84316447473873]
We present a compact neural network-based representation of reflectance BRDF data.
We encode BRDFs as lightweight networks, and propose a training scheme with adaptive angular sampling.
We evaluate encoding results on isotropic and anisotropic BRDFs from multiple real-world datasets.
arXiv Detail & Related papers (2021-02-11T12:00:24Z) - MaterialGAN: Reflectance Capture using a Generative SVBRDF Model [33.578080406338266]
We present MaterialGAN, a deep generative convolutional network based on StyleGAN2.
We show that MaterialGAN can be used as a powerful material prior in an inverse rendering framework.
We demonstrate this framework on the task of reconstructing SVBRDFs from images captured under flash illumination using a hand-held mobile phone.
arXiv Detail & Related papers (2020-09-30T21:33:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.