MaterialPicker: Multi-Modal Material Generation with Diffusion Transformers
- URL: http://arxiv.org/abs/2412.03225v2
- Date: Fri, 06 Dec 2024 05:24:39 GMT
- Title: MaterialPicker: Multi-Modal Material Generation with Diffusion Transformers
- Authors: Xiaohe Ma, Valentin Deschaintre, Miloš Hašan, Fujun Luan, Kun Zhou, Hongzhi Wu, Yiwei Hu
- Abstract summary: We propose a multi-modal material generator leveraging a Diffusion Transformer (DiT) architecture.
Our method can generate a material based on an image crop of a material sample, even if the captured surface is distorted.
We show that it enables more diverse material generation and better distortion correction than previous work.
- Score: 27.007661861644376
- Abstract: High-quality material generation is key for virtual environment authoring and inverse rendering. We propose MaterialPicker, a multi-modal material generator leveraging a Diffusion Transformer (DiT) architecture, improving and simplifying the creation of high-quality materials from text prompts and/or photographs. Our method can generate a material based on an image crop of a material sample, even if the captured surface is distorted, viewed at an angle or partially occluded, as is often the case in photographs of natural scenes. We further allow the user to specify a text prompt to provide additional guidance for the generation. We finetune a pre-trained DiT-based video generator into a material generator, where each material map is treated as a frame in a video sequence. We evaluate our approach both quantitatively and qualitatively and show that it enables more diverse material generation and better distortion correction than previous work.
Related papers
- Materialist: Physically Based Editing Using Single-Image Inverse Rendering [50.39048790589746]
We present a method combining a learning-based approach with progressive differentiable rendering.
Our method achieves more realistic light-material interactions, accurate shadows, and global illumination.
We also propose a method for material transparency editing that operates effectively without requiring full scene geometry.
arXiv Detail & Related papers (2025-01-07T11:52:01Z)
- TexPro: Text-guided PBR Texturing with Procedural Material Modeling [23.8905505397344]
TexPro is a novel method for high-fidelity material generation for input 3D meshes given text prompts.
We first generate multi-view reference images given the input textual prompt by employing the latest text-to-image model.
We derive texture maps through a rendering-based optimization with recent differentiable procedural materials.
arXiv Detail & Related papers (2024-10-21T11:10:07Z)
- MaPa: Text-driven Photorealistic Material Painting for 3D Shapes [80.66880375862628]
This paper aims to generate materials for 3D meshes from text descriptions.
Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs.
Our framework supports high-quality rendering and provides substantial flexibility in editing.
arXiv Detail & Related papers (2024-04-26T17:54:38Z)
- MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation [54.64194935409982]
We introduce MuLAn: a novel dataset comprising over 44K MUlti-Layer-wise RGBA decompositions.
MuLAn is the first photorealistic resource providing instance decomposition and spatial information for high quality images.
We aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions.
arXiv Detail & Related papers (2024-04-03T14:58:00Z)
- GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
We propose to render a video by warping one static image with a generative deformation field (GenDeF).
Such a pipeline enjoys three appealing advantages.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
- Alchemist: Parametric Control of Material Properties with Diffusion Models [51.63031820280475]
Our method capitalizes on the generative prior of text-to-image models known for photorealism.
We show the potential application of our model to material-edited NeRFs.
arXiv Detail & Related papers (2023-12-05T18:58:26Z)
- MatFuse: Controllable Material Generation with Diffusion Models [10.993516790237503]
MatFuse is a unified approach that harnesses the generative power of diffusion models for creation and editing of 3D materials.
Our method integrates multiple sources of conditioning, including color palettes, sketches, text, and pictures, enhancing creative possibilities.
We demonstrate the effectiveness of MatFuse under multiple conditioning settings and explore the potential of material editing.
arXiv Detail & Related papers (2023-08-22T12:54:48Z)
- PhotoMat: A Material Generator Learned from Single Flash Photos [37.42765147463852]
Previous generative models for materials have been trained exclusively on synthetic data.
We propose PhotoMat: the first material generator trained exclusively on real photos of material samples captured using a cell phone camera with flash.
We show that our generated materials have better visual quality than previous material generators trained on synthetic data.
arXiv Detail & Related papers (2023-05-20T22:27:41Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- One-shot recognition of any material anywhere using contrastive learning with physics-based rendering [0.0]
We present MatSim: a synthetic dataset, a benchmark, and a method for computer-vision-based recognition of similarities and transitions between materials and textures.
The visual recognition of materials is essential to everything from examining food while cooking to inspecting agriculture, chemistry, and industrial products.
arXiv Detail & Related papers (2022-12-01T16:49:53Z)
- MaterialGAN: Reflectance Capture using a Generative SVBRDF Model [33.578080406338266]
We present MaterialGAN, a deep generative convolutional network based on StyleGAN2.
We show that MaterialGAN can be used as a powerful material prior in an inverse rendering framework.
We demonstrate this framework on the task of reconstructing SVBRDFs from images captured under flash illumination using a hand-held mobile phone.
arXiv Detail & Related papers (2020-09-30T21:33:00Z)