VideoMat: Extracting PBR Materials from Video Diffusion Models
- URL: http://arxiv.org/abs/2506.09665v2
- Date: Mon, 16 Jun 2025 12:02:05 GMT
- Title: VideoMat: Extracting PBR Materials from Video Diffusion Models
- Authors: Jacob Munkberg, Zian Wang, Ruofan Liang, Tianchang Shen, Jon Hasselgren,
- Abstract summary: We leverage finetuned video diffusion models, intrinsic decomposition of videos, and physically-based differentiable rendering to generate high quality materials for 3D models given a text prompt or a single image.
- Score: 11.48114859355725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We leverage finetuned video diffusion models, intrinsic decomposition of videos, and physically-based differentiable rendering to generate high quality materials for 3D models given a text prompt or a single image. First, we condition a video diffusion model to respect the input geometry and lighting conditions. This model produces multiple views of a given 3D model with coherent material properties. Second, we use a recent model to extract intrinsics (base color, roughness, metallic) from the generated video. Finally, we use the intrinsics alongside the generated video in a differentiable path tracer to robustly extract PBR materials directly compatible with common content creation tools.
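The abstract does not give the exact objective used in the final step. As a minimal sketch under stated assumptions, one way to use the intrinsics alongside the generated video is to fit material parameters to the frames while penalizing deviation from the predicted intrinsics. Here a toy Lambertian shader stands in for the differentiable path tracer, per-frame irradiance and the frame-to-texel correspondence are assumed known, only base color is fitted, and the quadratic prior with weight `lam` is an assumption. The names `render_lambertian` and `fit_base_color` are hypothetical and do not come from the paper.

```python
import numpy as np


def render_lambertian(albedo, irradiance):
    """Toy forward model (assumption): Lambertian radiance = albedo * irradiance / pi.

    albedo:     (N, 3) base color per texel, shared across all frames
    irradiance: (F, N, 1) incident irradiance per frame and texel, assumed known
    returns:    (F, N, 3) predicted pixel values
    """
    return albedo[None, :, :] * irradiance / np.pi


def fit_base_color(video, irradiance, intrinsic_albedo, lam=0.1, steps=500):
    """Projected gradient descent on the hypothetical loss
        sum_f ||render(a, E_f) - I_f||^2 + lam * ||a - a0||^2,
    where a0 is the base color predicted by an intrinsic-decomposition model.
    """
    albedo = intrinsic_albedo.copy()
    # Per-texel curvature of the quadratic loss; a step of 1 / max curvature
    # is stable for every texel (it is below 2 / curvature everywhere).
    curvature = 2.0 * ((irradiance / np.pi) ** 2).sum(axis=0) + 2.0 * lam  # (N, 1)
    lr = 1.0 / curvature.max()
    for _ in range(steps):
        residual = render_lambertian(albedo, irradiance) - video        # (F, N, 3)
        grad = 2.0 * (residual * irradiance / np.pi).sum(axis=0)        # data term, (N, 3)
        grad += 2.0 * lam * (albedo - intrinsic_albedo)                # intrinsic prior
        albedo = np.clip(albedo - lr * grad, 0.0, 1.0)  # keep base color in [0, 1]
    return albedo


# Synthetic usage: noise-free frames rendered from a known albedo,
# plus a noisy stand-in for the intrinsic-decomposition output.
rng = np.random.default_rng(0)
F, N = 8, 64
true_albedo = rng.uniform(0.1, 0.9, size=(N, 3))
irradiance = rng.uniform(1.0, 4.0, size=(F, N, 1))
video = render_lambertian(true_albedo, irradiance)
prior = np.clip(true_albedo + rng.normal(0.0, 0.05, size=(N, 3)), 0.0, 1.0)
fitted = fit_base_color(video, irradiance, prior, lam=0.1)
```

Because this toy loss is quadratic, the result can be checked against the closed-form minimizer (sum_f E_f I_f / pi + lam * a0) / (sum_f E_f^2 / pi^2 + lam). With a specular BRDF and global illumination the problem becomes non-linear, which is why the sketch uses gradient steps rather than the closed form.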
Related papers
- VideoNeuMat: Neural Material Extraction from Generative Video Models [8.300347514555337]
We present VideoNeuMat, a two-stage pipeline that extracts reusable neural material assets from video diffusion models.
First, we finetune a large video model to generate material sample videos under controlled camera and lighting trajectories.
Second, we reconstruct compact neural materials from these videos through a Large Reconstruction Model (LRM) finetuned from a smaller Wan 1.3B video backbone.
Date: 2026-02-06T23:49:10Z
- View-Consistent Diffusion Representations for 3D-Consistent Video Generation [60.68052293389281]
Videos produced by current generative models still contain visual artifacts arising from 3D inconsistencies.
We propose ViCoDR, a new approach for improving the 3D consistency of video models by learning multi-view consistent diffusion representations.
Date: 2025-11-24T11:16:55Z
- SViM3D: Stable Video Material Diffusion for Single Image 3D Generation [48.986972061812004]
Video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently.
We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view based on explicit camera control.
This unique setup allows for relighting and generating a 3D asset using our model as neural prior.
Date: 2025-10-09T14:29:47Z
- Large Material Gaussian Model for Relightable 3D Generation [54.10879517395551]
We introduce a novel framework designed to generate high-quality 3D content with Physically Based Rendering (PBR) materials.
Our method not only exhibits greater visual appeal than baseline methods but also improves material modeling, thereby enabling practical downstream rendering applications.
Date: 2025-09-26T09:35:12Z
- MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing [35.58100830471395]
Current methods for 3D generation still fall short in producing physically based rendering (PBR) materials across multiple channels.
We propose MuMA, a method for 3D PBR texturing through Multi-channel Multi-view generation and Agentic post-processing.
Date: 2025-03-24T09:06:33Z
- MaterialMVP: Illumination-Invariant Material Generation via Multi-view PBR Diffusion [37.596740171045845]
Physically-based rendering (PBR) has become a cornerstone in modern computer graphics, enabling realistic material representation and lighting interactions in 3D scenes.
We present a novel end-to-end model for generating PBR textures from 3D meshes and image prompts, addressing key challenges in multi-view material synthesis.
Date: 2025-03-13T11:57:30Z
- MatCLIP: Light- and Shape-Insensitive Assignment of PBR Material Models [42.42328559042189]
MatCLIP is a novel method that extracts shape- and lighting-insensitive descriptors of PBR materials to assign plausible textures to 3D objects based on images.
By extending an Alpha-CLIP-based model on material renderings across diverse shapes and lighting, our approach generates descriptors that bridge the domains of PBR representations with photographs or renderings.
MatCLIP achieves a top-1 classification accuracy of 76.6%, outperforming state-of-the-art methods such as PhotoShape and MatAtlas.
Date: 2025-01-27T12:08:52Z
- TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting [48.97819552366636]
This paper presents TexGaussian, a novel method that uses octant-aligned 3D Gaussian Splatting for rapid PBR material generation.
Our method synthesizes more visually pleasing PBR materials and runs faster than previous methods in both unconditional and text-conditional scenarios.
Date: 2024-11-29T12:19:39Z
- Edify 3D: Scalable High-Quality 3D Asset Generation [53.86838858460809]
Edify 3D is an advanced solution designed for high-quality 3D asset generation.
Our method can generate high-quality 3D assets with detailed geometry, clean shape topologies, high-resolution textures, and materials within 2 minutes of runtime.
Date: 2024-11-11T17:07:43Z
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models [54.35214051961381]
3D meshes are widely used in computer vision and graphics for their efficiency in animation and minimal memory use in movies, games, AR, and VR.
However, creating temporally consistent and realistic textures for meshes remains labor-intensive for professional artists.
We present Tex4D, a zero-shot approach that integrates inherent geometry from mesh sequences with video diffusion models to produce consistent textures.
Date: 2024-10-14T17:59:59Z
- MaPa: Text-driven Photorealistic Material Painting for 3D Shapes [79.13775179541311]
This paper aims to generate materials for 3D meshes from text descriptions.
Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs.
Our framework supports high-quality rendering and provides substantial flexibility in editing.
Date: 2024-04-26T17:54:38Z
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
Date: 2023-11-25T22:28:38Z
- VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
Date: 2022-12-01T02:58:46Z
- Latent Video Diffusion Models for High-Fidelity Long Video Generation [58.346702410885236]
We introduce lightweight video diffusion models using a low-dimensional 3D latent space.
We also propose hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced.
Our framework generates more realistic and longer videos than previous strong baselines.
Date: 2022-11-23T18:58:39Z
- Imagen Video: High Definition Video Generation with Diffusion Models [64.06483414521222]
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models.
We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge.
Date: 2022-10-05T14:41:38Z
This list is automatically generated from the titles and abstracts of the papers in this site.