DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance
- URL: http://arxiv.org/abs/2404.14676v2
- Date: Mon, 1 Jul 2024 14:43:15 GMT
- Title: DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance
- Authors: Linxuan Xin, Zheng Zhang, Jinfu Wei, Wei Gao, Duan Gao
- Abstract summary: We propose a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls.
The key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs.
We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
- Score: 9.214785726215942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior material creation methods had limitations in producing diverse results, mainly because reconstruction-based methods relied on real-world measurements while generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls, providing high controllability and diversity in material generation. The key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs, along with material priors derived from hundreds of PBR material samples. We utilize a novel material Latent Diffusion Model (LDM) to establish the mapping between albedo maps and the corresponding latent space. The latent representation is then decoded into full SVBRDF parameter maps using a rendering-aware PBR decoder. Our method supports tileable generation through convolution with circular padding. Furthermore, we introduce a multi-modal guidance module, which includes pixel-aligned guidance, style image guidance, and 3D shape guidance, to enhance the control capabilities of the material LDM. We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
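The abstract mentions tileable generation through convolution with circular padding. The snippet below is a minimal, hypothetical sketch of that idea in PyTorch, not the authors' code: every Conv2d in a decoder is switched to circular padding so its receptive field wraps around the image borders, and the decoded SVBRDF maps then repeat without seams. The toy decoder, channel counts, and the 8-channel map layout are illustrative assumptions.

```python
# Sketch: making a convolutional decoder tileable via circular padding.
# Assumed names and shapes; not the DreamPBR implementation.
import torch
import torch.nn as nn

def make_tileable(module: nn.Module) -> nn.Module:
    """Switch all Conv2d layers to circular ('wrap-around') padding in place."""
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            m.padding_mode = "circular"
    return module

# Toy stand-in for a rendering-aware PBR decoder: latent -> SVBRDF maps
# (here assumed as albedo RGB + normal XYZ + roughness + metallic = 8 channels).
decoder = nn.Sequential(
    nn.Conv2d(4, 64, 3, padding=1),
    nn.SiLU(),
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 8, 3, padding=1),
)
decoder = make_tileable(decoder)

latent = torch.randn(1, 4, 64, 64)   # stand-in for a sampled LDM latent
svbrdf = decoder(latent)             # (1, 8, 128, 128) parameter maps

# Tiling the output 2x2: borders match because every convolution already
# treated the texture as periodic.
tiled = svbrdf.repeat(1, 1, 2, 2)
print(svbrdf.shape, tiled.shape)
```

Because each convolution sees the texture as if it wrapped around its borders, opposite edges of the decoded maps are consistent, which is the property the abstract refers to as tileable generation.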
Related papers
- StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning [2.037819652873519]
We introduce StableMaterials, a novel approach for generating photorealistic physical-based rendering (PBR) materials.
Our method employs adversarial training to distill knowledge from existing large-scale image generation models.
We propose a new tileability technique that removes visual artifacts typically associated with fewer diffusion steps.
arXiv Detail & Related papers (2024-06-13T16:29:46Z) - Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields [50.12118098874321]
We introduce a latent 3D diffusion process for neural voxel fields, enabling generation at significantly higher resolutions.
A part-aware shape decoder is introduced to integrate the part codes into the neural voxel fields, guiding the accurate part decomposition.
The results demonstrate the superior generative capabilities of our proposed method in part-aware shape generation, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2024-05-02T04:31:17Z) - Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z) - ReflectanceFusion: Diffusion-based text to SVBRDF Generation [12.5036873986483]
We introduce Reflectance Diffusion, a new neural text-to-texture model capable of generating high-fidelity SVBRDF maps from textual descriptions.
Our method leverages a tandem neural approach, consisting of two modules, to accurately model the distribution of spatially varying reflectance.
arXiv Detail & Related papers (2024-04-25T15:43:33Z) - MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains [2.7869568828212175]
We propose a novel approach to handle multimodal creative tasks using Quality Diversity evolution.
Our contribution is a variation of the MAP-Elites algorithm, MAP-Elites with Transverse Assessment (MEliTA).
MEliTA decouples the artefacts' modalities and promotes cross-pollination between elites.
arXiv Detail & Related papers (2024-03-11T21:50:22Z) - Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception [63.03288425612792]
We propose AnyRef, a general MLLM that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references.
Our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.
arXiv Detail & Related papers (2024-03-05T13:45:46Z) - UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation [101.2317840114147]
We present UniDream, a text-to-3D generation framework by incorporating unified diffusion priors.
Our approach consists of three main components: (1) a dual-phase training process to obtain albedo-normal aligned multi-view diffusion and reconstruction models, (2) a progressive generation procedure for geometry and albedo textures based on Score Distillation Sampling (SDS) using the trained reconstruction and diffusion models, and (3) an innovative application of SDS for finalizing PBR generation while keeping a fixed albedo, based on the Stable Diffusion model.
arXiv Detail & Related papers (2023-12-14T09:07:37Z) - LLMGA: Multimodal Large Language Model based Generation Assistant [53.150283805515926]
We introduce a Multimodal Large Language Model-based Generation Assistant (LLMGA) to assist users in image generation and editing.
We train the MLLM to grasp the properties of image generation and editing, enabling it to generate detailed prompts.
Extensive results show that LLMGA has promising generation and editing capabilities and can enable more flexible and expansive applications.
arXiv Detail & Related papers (2023-11-27T13:37:26Z) - Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
The proposed model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z) - MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR [29.96046140529936]
We propose Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR (MATLABER).
We train this auto-encoder with large-scale real-world BRDF collections and ensure the smoothness of its latent space; a minimal sketch of one way to impose such smoothness appears after this list.
Our approach demonstrates the superiority over existing ones in generating realistic and coherent object materials.
arXiv Detail & Related papers (2023-08-18T03:40:38Z) - MaterialGAN: Reflectance Capture using a Generative SVBRDF Model [33.578080406338266]
We present MaterialGAN, a deep generative convolutional network based on StyleGAN2.
We show that MaterialGAN can be used as a powerful material prior in an inverse rendering framework.
We demonstrate this framework on the task of reconstructing SVBRDFs from images captured under flash illumination using a hand-held mobile phone.
arXiv Detail & Related papers (2020-09-30T21:33:00Z)
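The MATLABER entry above notes that its latent BRDF auto-encoder is trained to keep a smooth latent space. One common way to encourage such smoothness, assumed here purely for illustration and not taken from the paper, is to train the auto-encoder variationally with a small KL penalty on the latent code. The sketch below applies that idea to per-texel BRDF parameter vectors; the dimensions, loss weight, and network sizes are all assumptions.

```python
# Minimal sketch (assumed design, not MATLABER's released code) of a latent
# BRDF auto-encoder whose latent space is kept smooth with a KL penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

BRDF_DIM = 10    # e.g. albedo(3) + normal(3) + roughness(1) + specular(3); assumed
LATENT_DIM = 4   # assumed latent size

class LatentBRDFAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(BRDF_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, 2 * LATENT_DIM))  # mean and log-variance
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, BRDF_DIM))

    def forward(self, brdf):
        mu, logvar = self.encoder(brdf).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.decoder(z)
        # Reconstruction term plus a small KL term that pulls the latent code
        # toward a unit Gaussian; the KL term is what keeps the space smooth.
        rec_loss = F.mse_loss(recon, brdf)
        kl_loss = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, rec_loss + 1e-4 * kl_loss

model = LatentBRDFAutoEncoder()
fake_batch = torch.rand(8, BRDF_DIM)   # stand-in for real measured BRDF samples
_, loss = model(fake_batch)
loss.backward()
print(float(loss))
```

With the latent regularized this way, nearby latent codes decode to similar BRDFs, which is the practical meaning of a "smooth" latent space for downstream text-to-3D optimization.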
This list is automatically generated from the titles and abstracts of the papers on this site.