DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance
- URL: http://arxiv.org/abs/2404.14676v2
- Date: Mon, 1 Jul 2024 14:43:15 GMT
- Title: DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance
- Authors: Linxuan Xin, Zheng Zhang, Jinfu Wei, Wei Gao, Duan Gao
- Abstract summary: We propose a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls.
Key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs.
We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
- Score: 9.214785726215942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior material creation methods had limitations in producing diverse results mainly because reconstruction-based methods relied on real-world measurements and generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls, providing high controllability and diversity in material generation. Key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs, along with material priors derived from hundreds of PBR material samples. We utilize a novel material Latent Diffusion Model (LDM) to establish the mapping between albedo maps and the corresponding latent space. The latent representation is then decoded into full SVBRDF parameter maps using a rendering-aware PBR decoder. Our method supports tileable generation through convolution with circular padding. Furthermore, we introduce a multi-modal guidance module, which includes pixel-aligned guidance, style image guidance, and 3D shape guidance, to enhance the control capabilities of the material LDM. We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
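The tileable-generation mechanism named in the abstract, convolution with circular padding, can be illustrated in isolation. The following PyTorch sketch is illustrative only, not the authors' code: switching a decoder's convolutions to circular padding makes features wrap around the image borders, so the decoded parameter maps tile seamlessly. The decoder architecture and channel counts here are assumptions.

```python
import torch
import torch.nn as nn

def make_tileable(module: nn.Module) -> nn.Module:
    """Switch every Conv2d to circular padding so that features wrap
    around image borders and outputs tile seamlessly. A generic sketch
    of the circular-padding trick, not DreamPBR's implementation."""
    for child in module.modules():
        if isinstance(child, nn.Conv2d):
            child.padding_mode = "circular"
    return module

# Toy decoder standing in for a PBR decoder; layer sizes are illustrative.
decoder = nn.Sequential(
    nn.Conv2d(4, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 9, kernel_size=3, padding=1),  # stacked SVBRDF channels (illustrative count)
)
decoder = make_tileable(decoder)

latent = torch.randn(1, 4, 64, 64)
maps = decoder(latent)
# Wrap-around check: circularly shifting the input shifts the output
# identically, which is the property that makes decoded maps tileable.
rolled = decoder(torch.roll(latent, shifts=32, dims=-1))
assert torch.allclose(torch.roll(maps, shifts=32, dims=-1), rolled, atol=1e-5)
```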
Related papers
- GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
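As a rough illustration of the ingredients this summary names, here is a toy VAE encoder over posed RGB-D-Normal renderings. The pooled vector latent is a simplification of the paper's point-cloud-structured latent space, and all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class MultiViewRGBDNEncoder(nn.Module):
    """Toy VAE encoder over multi-view RGB-D-Normal renderings.
    Shapes and layers are illustrative, not GaussianAnything's."""
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        # 3 (RGB) + 1 (depth) + 3 (normal) = 7 channels per view
        self.backbone = nn.Sequential(
            nn.Conv2d(7, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, views: torch.Tensor):
        # views: (B, V, 7, H, W); fold views into batch, then mean-pool over V
        b, v, c, h, w = views.shape
        feats = self.backbone(views.reshape(b * v, c, h, w))
        feats = feats.reshape(b, v, -1).mean(dim=1)
        mu, logvar = self.mu(feats), self.logvar(feats)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar
```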
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- DreamPolish: Domain Score Distillation With Progressive Geometry Generation [66.94803919328815]
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process.
In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward the target texture domain.
arXiv Detail & Related papers (2024-11-03T15:15:01Z)
- Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control [1.8692054990918074]
Collaborative Control directly models PBR image probability distributions, including normal bump maps.
We discuss the design decisions involved in making this model multi-view consistent, and demonstrate the effectiveness of our approach in ablation studies.
arXiv Detail & Related papers (2024-10-09T15:21:46Z)
- 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion [86.25111098482537]
We introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome limitations of existing methods.
3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material field into a compact tensorial format.
On top of this novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression and 2) Latent Primitive Diffusion.
We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-quality 3D assets.
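To make the "compact tensorial format" concrete, here is a hypothetical PrimX-style layout. The primitive count, voxel resolution, and channel assignment are assumptions for illustration, not the paper's specification.

```python
import torch

# Hypothetical PrimX-style layout: N primitives, each a position (3),
# a scale (1), and a small voxel payload holding shape occupancy/SDF (1),
# albedo (3), and material, e.g. roughness + metallic (2) => 6 channels.
N, D, C = 2048, 8, 6                      # primitive count, voxel res, channels
positions = torch.rand(N, 3)              # primitive centers in [0, 1]^3
scales = torch.rand(N, 1) * 0.05          # per-primitive spatial extent
payload = torch.randn(N, C, D, D, D)      # per-primitive voxel features

# Flatten into one compact tensor row per primitive, ready for a
# Diffusion-Transformer-style model that treats each row as a token.
primx = torch.cat([positions, scales, payload.flatten(1)], dim=1)
print(primx.shape)  # (2048, 3 + 1 + 6*8**3) = (2048, 3076)
```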
arXiv Detail & Related papers (2024-09-19T17:59:06Z)
- StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning [2.037819652873519]
We introduce StableMaterials, a novel approach for generating photorealistic physically-based rendering (PBR) materials.
Our method employs adversarial training to distill knowledge from existing large-scale image generation models.
We propose a new tileability technique that removes visual artifacts typically associated with fewer diffusion steps.
arXiv Detail & Related papers (2024-06-13T16:29:46Z)
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- ReflectanceFusion: Diffusion-based text to SVBRDF Generation [12.5036873986483]
We introduce Reflectance Diffusion, a new neural text-to-texture model capable of generating high-fidelity SVBRDF maps from textual descriptions.
Our method leverages a tandem neural approach, consisting of two modules, to accurately model the distribution of spatially varying reflectance.
arXiv Detail & Related papers (2024-04-25T15:43:33Z)
- MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains [2.7869568828212175]
We propose a novel approach to handle multimodal creative tasks using Quality Diversity evolution.
Our contribution is a variation of the MAP-Elites algorithm, MAP-Elites with Transverse Assessment (MEliTA).
MEliTA decouples the artefacts' modalities and promotes cross-pollination between elites.
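For reference, a minimal MAP-Elites loop (the base algorithm MEliTA builds on) looks roughly like the following. This is a generic sketch; MEliTA's transverse assessment across modalities is not modelled here, and all function names are placeholders.

```python
import random

def map_elites(init, mutate, fitness, descriptor, cells, iters=10_000):
    """Minimal MAP-Elites: keep the fittest solution ('elite') found in
    each cell of a discretized behaviour-descriptor space."""
    archive = {}  # cell index -> (fitness, solution)
    for _ in range(iters):
        # Mutate a random elite, or sample fresh if the archive is empty
        parent = random.choice(list(archive.values()))[1] if archive else init()
        child = mutate(parent)
        cell = cells(descriptor(child))     # map behaviour to a grid cell
        f = fitness(child)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, child)      # child becomes the cell's elite
    return archive

# Toy usage: 1-D genomes; descriptor = mean gene value, 10 bins over [-1, 1]
archive = map_elites(
    init=lambda: [random.uniform(-1, 1) for _ in range(8)],
    mutate=lambda g: [x + random.gauss(0, 0.1) for x in g],
    fitness=lambda g: -sum(x * x for x in g),
    descriptor=lambda g: sum(g) / len(g),
    cells=lambda d: min(9, max(0, int((d + 1) * 5))),
    iters=2000,
)
print(len(archive), "cells filled")
```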
arXiv Detail & Related papers (2024-03-11T21:50:22Z)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
A diffusion model with a 3D U-Net is then trained on these feature volumes for text-to-3D generation.
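One common way to acquire a feature volume from posed multi-view images is to unproject and average per-view features. The sketch below illustrates that generic idea under assumed camera conventions; it is not the paper's encoder.

```python
import torch
import torch.nn.functional as F

def lift_to_volume(feat_maps, proj_mats, res=32):
    """Average multi-view image features into a feature volume (toy sketch).
    feat_maps: (V, C, H, W) per-view features; proj_mats: (V, 3, 4) camera
    matrices mapping homogeneous world points to pixels. Returns (C, res^3)."""
    device = feat_maps.device
    V, C, H, W = feat_maps.shape
    # Regular grid of world points in [-1, 1]^3, homogeneous coordinates
    axis = torch.linspace(-1, 1, res, device=device)
    zs, ys, xs = torch.meshgrid(axis, axis, axis, indexing="ij")
    pts = torch.stack([xs, ys, zs, torch.ones_like(xs)], -1).reshape(-1, 4)
    vol = torch.zeros(C, res ** 3, device=device)
    for v in range(V):
        uvw = pts @ proj_mats[v].T                    # (P, 3) homogeneous pixels
        uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)  # perspective divide
        # Normalize pixel coordinates to [-1, 1] for grid_sample
        uv = 2 * uv / torch.tensor([W - 1, H - 1], device=device) - 1
        sampled = F.grid_sample(feat_maps[v : v + 1], uv.view(1, 1, -1, 2),
                                align_corners=True)   # (1, C, 1, P)
        vol += sampled[0, :, 0, :]
    return (vol / V).reshape(C, res, res, res)
```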
arXiv Detail & Related papers (2023-12-18T18:59:05Z)
- UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation [101.2317840114147]
We present UniDream, a text-to-3D generation framework that incorporates unified diffusion priors.
Our approach consists of three main components: (1) a dual-phase training process to obtain albedo-normal aligned multi-view diffusion and reconstruction models, (2) a progressive generation procedure for geometry and albedo textures based on Score Distillation Sampling (SDS) using the trained reconstruction and diffusion models, and (3) an innovative application of SDS for finalizing PBR generation while keeping the albedo fixed, based on the Stable Diffusion model.
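The SDS objective referenced in component (2) can be sketched generically. The function below shows one standard formulation; the names and signatures are assumptions, not UniDream's implementation.

```python
import torch

def sds_loss(rendered, denoiser, text_emb, alphas_cumprod):
    """One generic Score Distillation Sampling step. `rendered` is a
    differentiable rendering of the 3D representation; `denoiser(x_t, t, y)`
    predicts the noise eps of a frozen text-conditioned diffusion model."""
    t = torch.randint(20, 980, (1,), device=rendered.device)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(rendered)
    x_t = a_t.sqrt() * rendered + (1 - a_t).sqrt() * eps  # forward diffusion
    with torch.no_grad():
        eps_pred = denoiser(x_t, t, text_emb)             # frozen score network
    w = 1.0 - a_t                                         # common weighting choice
    grad = w * (eps_pred - eps)
    # Reparameterized loss whose gradient w.r.t. `rendered` equals `grad`
    return (grad.detach() * rendered).sum()
```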
arXiv Detail & Related papers (2023-12-14T09:07:37Z)
- MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR [29.96046140529936]
We propose Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR (MATLABER).
We train this auto-encoder with large-scale real-world BRDF collections and ensure the smoothness of its latent space.
Our approach demonstrates superiority over existing ones in generating realistic and coherent object materials.
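A toy version of a latent BRDF auto-encoder, with a crude latent-smoothness surrogate, might look as follows. The BRDF parameterization and regularizer here are assumptions for illustration, not MATLABER's design.

```python
import torch
import torch.nn as nn

class LatentBRDFAutoEncoder(nn.Module):
    """Toy auto-encoder over per-texel BRDF parameter vectors (e.g. albedo 3,
    normal 3, roughness 1, specular 1 => 8 dims)."""
    def __init__(self, brdf_dim: int = 8, latent_dim: int = 4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(brdf_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, brdf_dim))

    def forward(self, brdf):
        z = self.enc(brdf)
        return self.dec(z), z

def training_loss(model, brdf, sigma=0.05):
    recon, z = model(brdf)
    # Smoothness surrogate: nearby latents should decode to nearby BRDFs,
    # encouraged here by decoding perturbed latents toward the same target.
    recon_jit = model.dec(z + sigma * torch.randn_like(z))
    return ((recon - brdf) ** 2).mean() + ((recon_jit - brdf) ** 2).mean()
```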
arXiv Detail & Related papers (2023-08-18T03:40:38Z)