DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance
- URL: http://arxiv.org/abs/2404.14676v2
- Date: Mon, 1 Jul 2024 14:43:15 GMT
- Title: DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance
- Authors: Linxuan Xin, Zheng Zhang, Jinfu Wei, Wei Gao, Duan Gao
- Abstract summary: We propose a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls.
The key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs.
We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
- Score: 9.214785726215942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior material creation methods had limitations in producing diverse results, mainly because reconstruction-based methods relied on real-world measurements while generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls, providing high controllability and diversity in material generation. The key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs, along with material priors derived from hundreds of PBR material samples. We utilize a novel material Latent Diffusion Model (LDM) to establish the mapping between albedo maps and the corresponding latent space. The latent representation is then decoded into full SVBRDF parameter maps using a rendering-aware PBR decoder. Our method supports tileable generation through convolution with circular padding. Furthermore, we introduce a multi-modal guidance module, which includes pixel-aligned guidance, style image guidance, and 3D shape guidance, to enhance the control capabilities of the material LDM. We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
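The abstract mentions tileable generation through convolution with circular padding. The snippet below is a minimal, hypothetical sketch of that idea in PyTorch, not the authors' code: every Conv2d in a decoder is switched to circular padding so its receptive field wraps around the image borders, and the decoded SVBRDF maps then repeat without seams. The toy decoder, channel counts, and the 8-channel map layout are illustrative assumptions.

```python
# Sketch: making a convolutional decoder tileable via circular padding.
# Assumed names and shapes; not the DreamPBR implementation.
import torch
import torch.nn as nn

def make_tileable(module: nn.Module) -> nn.Module:
    """Switch all Conv2d layers to circular ('wrap-around') padding in place."""
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            m.padding_mode = "circular"
    return module

# Toy stand-in for a rendering-aware PBR decoder: latent -> SVBRDF maps
# (here assumed as albedo RGB + normal XYZ + roughness + metallic = 8 channels).
decoder = nn.Sequential(
    nn.Conv2d(4, 64, 3, padding=1),
    nn.SiLU(),
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 8, 3, padding=1),
)
decoder = make_tileable(decoder)

latent = torch.randn(1, 4, 64, 64)   # stand-in for a sampled LDM latent
svbrdf = decoder(latent)             # (1, 8, 128, 128) parameter maps

# Tiling the output 2x2: borders match because every convolution already
# treated the texture as periodic.
tiled = svbrdf.repeat(1, 1, 2, 2)
print(svbrdf.shape, tiled.shape)
```

Because each convolution sees the texture as if it wrapped around its borders, opposite edges of the decoded maps are consistent, which is the property the abstract refers to as tileable generation.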
Related papers
- StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning [2.037819652873519]
We introduce StableMaterials, a novel approach for generating photorealistic physical-based rendering (PBR) materials.
Our method employs adversarial training to distill knowledge from existing large-scale image generation models.
We propose a new tileability technique that removes visual artifacts typically associated with fewer diffusion steps.
arXiv Detail & Related papers (2024-06-13T16:29:46Z) - Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields [50.12118098874321]
We introduce a latent 3D diffusion process for neural voxel fields, enabling generation at significantly higher resolutions.
A part-aware shape decoder is introduced to integrate the part codes into the neural voxel fields, guiding the accurate part decomposition.
The results demonstrate the superior generative capabilities of our proposed method in part-aware shape generation, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2024-05-02T04:31:17Z) - Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z) - ReflectanceFusion: Diffusion-based text to SVBRDF Generation [12.5036873986483]
We introduce Reflectance Diffusion, a new neural text-to-texture model capable of generating high-fidelity SVBRDF maps from textual descriptions.
Our method leverages a tandem neural approach, consisting of two modules, to accurately model the distribution of spatially varying reflectance.
arXiv Detail & Related papers (2024-04-25T15:43:33Z) - MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains [2.7869568828212175]
We propose a novel approach to handle multimodal creative tasks using Quality Diversity evolution.
Our contribution is a variation of the MAP-Elites algorithm, MAP-Elites with Transverse Assessment (MEliTA).
MEliTA decouples the artefacts' modalities and promotes cross-pollination between elites.
arXiv Detail & Related papers (2024-03-11T21:50:22Z) - Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception [63.03288425612792]
We propose AnyRef, a general MLLM that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references.
Our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.
arXiv Detail & Related papers (2024-03-05T13:45:46Z) - UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation [101.2317840114147]
We present UniDream, a text-to-3D generation framework by incorporating unified diffusion priors.
Our approach consists of three main components: (1) a dual-phase training process to obtain albedo-normal aligned multi-view diffusion and reconstruction models, (2) a progressive generation procedure for geometry and albedo textures based on Score Distillation Sampling (SDS) using the trained reconstruction and diffusion models, and (3) an innovative application of SDS for finalizing PBR generation while keeping a fixed albedo, based on the Stable Diffusion model.
arXiv Detail & Related papers (2023-12-14T09:07:37Z) - LLMGA: Multimodal Large Language Model based Generation Assistant [53.150283805515926]
We introduce a Multimodal Large Language Model-based Generation Assistant (LLMGA) to assist users in image generation and editing.
We train the MLLM to grasp the properties of image generation and editing, enabling it to generate detailed prompts.
Extensive results show that LLMGA has promising generation and editing capabilities and can enable more flexible and expansive applications.
arXiv Detail & Related papers (2023-11-27T13:37:26Z) - Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
The proposed model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z) - MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR [29.96046140529936]
We propose Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR (MATLABER).
We train this auto-encoder with large-scale real-world BRDF collections and ensure the smoothness of its latent space; a minimal sketch of one way to impose such smoothness appears after this list.
Our approach demonstrates the superiority over existing ones in generating realistic and coherent object materials.
arXiv Detail & Related papers (2023-08-18T03:40:38Z) - MaterialGAN: Reflectance Capture using a Generative SVBRDF Model [33.578080406338266]
We present MaterialGAN, a deep generative convolutional network based on StyleGAN2.
We show that MaterialGAN can be used as a powerful material prior in an inverse rendering framework.
We demonstrate this framework on the task of reconstructing SVBRDFs from images captured under flash illumination using a hand-held mobile phone.
arXiv Detail & Related papers (2020-09-30T21:33:00Z)
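The MATLABER entry above notes that its latent BRDF auto-encoder is trained to keep a smooth latent space. One common way to encourage such smoothness, assumed here purely for illustration and not taken from the paper, is to train the auto-encoder variationally with a small KL penalty on the latent code. The sketch below applies that idea to per-texel BRDF parameter vectors; the dimensions, loss weight, and network sizes are all assumptions.

```python
# Minimal sketch (assumed design, not MATLABER's released code) of a latent
# BRDF auto-encoder whose latent space is kept smooth with a KL penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

BRDF_DIM = 10    # e.g. albedo(3) + normal(3) + roughness(1) + specular(3); assumed
LATENT_DIM = 4   # assumed latent size

class LatentBRDFAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(BRDF_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, 2 * LATENT_DIM))  # mean and log-variance
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, BRDF_DIM))

    def forward(self, brdf):
        mu, logvar = self.encoder(brdf).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.decoder(z)
        # Reconstruction term plus a small KL term that pulls the latent code
        # toward a unit Gaussian; the KL term is what keeps the space smooth.
        rec_loss = F.mse_loss(recon, brdf)
        kl_loss = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, rec_loss + 1e-4 * kl_loss

model = LatentBRDFAutoEncoder()
fake_batch = torch.rand(8, BRDF_DIM)   # stand-in for real measured BRDF samples
_, loss = model(fake_batch)
loss.backward()
print(float(loss))
```

With the latent regularized this way, nearby latent codes decode to similar BRDFs, which is the practical meaning of a "smooth" latent space for downstream text-to-3D optimization.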
This list is automatically generated from the titles and abstracts of the papers on this site.