VideoNeuMat: Neural Material Extraction from Generative Video Models
- URL: http://arxiv.org/abs/2602.07272v1
- Date: Fri, 06 Feb 2026 23:49:10 GMT
- Title: VideoNeuMat: Neural Material Extraction from Generative Video Models
- Authors: Bowen Xue, Saeed Hadadan, Zheng Zeng, Fabrice Rousselle, Zahra Montazeri, Milos Hasan,
- Abstract summary: We present VideoNeuMat, a two-stage pipeline that extracts reusable neural material assets from video diffusion models. First, we finetune a large video model to generate material sample videos under controlled camera and lighting trajectories. Second, we reconstruct compact neural materials from these videos through a Large Reconstruction Model (LRM) finetuned from a smaller Wan 1.3B video backbone.
- Score: 8.300347514555337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently limited by the lack of high-quality training data. While recent video generative models effortlessly produce realistic material appearances, this knowledge remains entangled with geometry and lighting. We present VideoNeuMat, a two-stage pipeline that extracts reusable neural material assets from video diffusion models. First, we finetune a large video model (Wan 2.1 14B) to generate material sample videos under controlled camera and lighting trajectories, effectively creating a "virtual gonioreflectometer" that preserves the model's material realism while learning a structured measurement pattern. Second, we reconstruct compact neural materials from these videos through a Large Reconstruction Model (LRM) finetuned from a smaller Wan 1.3B video backbone. From 17 generated video frames, our LRM performs single-pass inference to predict neural material parameters that generalize to novel viewing and lighting conditions. The resulting materials exhibit realism and diversity far exceeding the limited synthetic training data, demonstrating that material knowledge can be successfully transferred from internet-scale video models into standalone, reusable neural 3D assets.
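To make the two-stage flow concrete, below is a minimal runnable sketch of how such a pipeline could be wired together. All names (gonioreflectometer_trajectory, finetuned_video_model, material_lrm, render_neural_material) are hypothetical stand-ins invented for illustration, not the authors' API, and both models are replaced by random stubs; only the data flow (controlled trajectories -> 17 generated frames -> single-pass LRM -> neural material queried at a novel view/light) follows the abstract.

```python
import numpy as np

N_FRAMES = 17  # the paper reconstructs from 17 generated frames

def gonioreflectometer_trajectory(n_frames):
    """Controlled camera/light directions, analogous to a gonioreflectometer sweep."""
    angles = np.linspace(0.0, np.pi / 3, n_frames)
    cams = np.stack([np.sin(angles), np.zeros_like(angles), np.cos(angles)], axis=-1)
    lights = np.stack([-np.sin(angles), np.zeros_like(angles), np.cos(angles)], axis=-1)
    return cams, lights

def finetuned_video_model(prompt, cams, lights):
    """Stand-in for the finetuned video generator (Wan 2.1 14B in the paper)."""
    rng = np.random.default_rng(0)
    return rng.random((len(cams), 64, 64, 3))  # placeholder RGB frames

def material_lrm(frames, cams, lights):
    """Stand-in for the LRM (finetuned Wan 1.3B): one forward pass -> material params."""
    rng = np.random.default_rng(1)
    return rng.standard_normal(256)  # compact neural material parameter vector

def render_neural_material(params, view_dir, light_dir, uv):
    """Evaluate the neural material at a novel view/light; a toy MLP-like decoder."""
    x = np.concatenate([view_dir, light_dir, uv])          # 8-dim query
    hidden = np.tanh(params[:64] * np.resize(x, 64))
    return 1.0 / (1.0 + np.exp(-(hidden @ np.resize(params[64:], (64, 3)))))

cams, lights = gonioreflectometer_trajectory(N_FRAMES)
frames = finetuned_video_model("brushed copper with green patina", cams, lights)
params = material_lrm(frames, cams, lights)                # single-pass inference
rgb = render_neural_material(params, cams[0], lights[-1], np.array([0.3, 0.7]))
print("relit sample:", rgb)
```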
Related papers
- SViM3D: Stable Video Material Diffusion for Single Image 3D Generation [48.986972061812004]
Video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view based on explicit camera control. This unique setup allows for relighting and generating a 3D asset using our model as a neural prior.
arXiv Detail & Related papers (2025-10-09T14:29:47Z)
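The key property SViM3D claims is that emitting PBR parameters and normals per generated view makes relighting possible. The summary does not specify the renderer, so here is a generic per-pixel relighting step under an assumed Lambert-plus-Blinn-Phong shading model, purely to illustrate why per-view PBR maps enable relighting:

```python
import numpy as np

def relight(albedo, roughness, normal, light_dir, view_dir, light_rgb):
    """Relight one view from its PBR maps: albedo/normal are (H, W, 3),
    roughness is (H, W, 1). Assumed shading model, not SViM3D's renderer."""
    n = normal / np.linalg.norm(normal, axis=-1, keepdims=True)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)                       # half vector
    ndl = np.clip((n * l).sum(-1, keepdims=True), 0.0, None)  # diffuse term
    ndh = np.clip((n * h).sum(-1, keepdims=True), 0.0, None)
    shininess = 2.0 / np.clip(roughness, 1e-3, 1.0) ** 2      # rough -> dull highlight
    spec = ndh ** shininess
    return (albedo * ndl + spec) * light_rgb

H = W = 4
out = relight(albedo=np.full((H, W, 3), 0.6),
              roughness=np.full((H, W, 1), 0.4),
              normal=np.broadcast_to([0.0, 0.0, 1.0], (H, W, 3)),
              light_dir=np.array([0.3, 0.3, 0.9]),
              view_dir=np.array([0.0, 0.0, 1.0]),
              light_rgb=np.array([1.0, 0.95, 0.9]))
```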
- Large Material Gaussian Model for Relightable 3D Generation [54.10879517395551]
We introduce a novel framework designed to generate high-quality 3D content with Physically Based Rendering (PBR) materials. Our method not only exhibits greater visual appeal compared to baseline methods but also enhances material modeling, thereby enabling practical downstream rendering applications.
arXiv Detail & Related papers (2025-09-26T09:35:12Z)
- RealMat: Realistic Materials with Diffusion and Reinforcement Learning [15.780720815063262]
We propose RealMat, a diffusion-based material generator that leverages realistic priors. We first finetune a pretrained Stable Diffusion XL (SDXL) with synthetic material maps arranged in $2 \times 2$ grids. We propose to further finetune our model through reinforcement learning (RL), encouraging the generation of realistic materials.
arXiv Detail & Related papers (2025-09-01T05:04:51Z)
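The $2 \times 2$ grid trick is easy to picture: four material maps are packed into a single image so a standard image diffusion model (SDXL) can be finetuned on them without architectural changes. A minimal sketch, assuming one possible map ordering (the summary does not specify which map goes in which cell):

```python
import numpy as np

def to_grid(albedo, normal, roughness, metallic):
    """Pack four HxWx3 material maps into one 2Hx2Wx3 training image.
    The cell assignment here is an assumption for illustration."""
    top = np.concatenate([albedo, normal], axis=1)
    bottom = np.concatenate([roughness, metallic], axis=1)
    return np.concatenate([top, bottom], axis=0)

H = W = 8
maps = [np.random.rand(H, W, 3) for _ in range(4)]
grid = to_grid(*maps)
assert grid.shape == (2 * H, 2 * W, 3)
```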
- VideoMat: Extracting PBR Materials from Video Diffusion Models [11.48114859355725]
We leverage finetuned video diffusion models, intrinsic decomposition of videos, and physically-based differentiable rendering to generate high-quality materials for 3D models given a text prompt or a single image.
arXiv Detail & Related papers (2025-06-11T12:36:49Z)
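VideoMat's final stage, physically-based differentiable rendering, amounts to gradient-based inverse rendering: optimize material parameters so that rendered frames match the decomposed video. A toy scalar-albedo version showing only the optimization structure (a real system would fit full SVBRDF maps through a differentiable renderer):

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 17
light_cos = rng.uniform(0.2, 1.0, n_frames)   # per-frame Lambertian n.l terms
true_albedo = 0.73
target = true_albedo * light_cos               # stand-in "video" observations

albedo, lr = 0.5, 0.1
for step in range(200):
    pred = albedo * light_cos                            # differentiable render
    grad = 2.0 * ((pred - target) * light_cos).mean()    # d(L2 loss)/d(albedo)
    albedo -= lr * grad                                  # gradient descent step
print(f"recovered albedo: {albedo:.4f} (true {true_albedo})")
```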
- UVRM: A Scalable 3D Reconstruction Model from Unposed Videos [68.34221167200259]
Training 3D reconstruction models with 2D visual data traditionally requires prior knowledge of camera poses for the training samples. We introduce UVRM, a novel 3D reconstruction model capable of being trained and evaluated on monocular videos without requiring any pose information.
arXiv Detail & Related papers (2025-01-16T08:00:17Z)
- GenLit: Reformulating Single-Image Relighting as Video Generation [42.0880277180892]
We introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video-generation model. We find that a model fine-tuned on only a small synthetic dataset generalizes to real-world scenes.
arXiv Detail & Related papers (2024-12-15T15:40:40Z)
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models [53.89348957053395]
We introduce a novel pipeline designed for text-to-4D scene generation.
Our method begins by generating a reference video using the video generation model.
We then learn the canonical 3D representation of the video using a freeze-time video.
arXiv Detail & Related papers (2024-06-11T17:19:26Z)
- VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities.
We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models.
Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z)
- LRM: Large Reconstruction Model for Single Image to 3D [61.47357798633123]
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image.
We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects.
arXiv Detail & Related papers (2023-11-08T00:03:52Z)
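In outline, LRM is a feed-forward map from image tokens to a NeRF. Below is a numpy toy with a random linear map standing in for the 500-million-parameter transformer, using a triplane as the NeRF parameterization (the LRM paper predicts triplane features; the tiny dimensions here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
img_tokens = rng.standard_normal((256, 32))         # patch tokens from the input image
W_dec = rng.standard_normal((32, 3 * 16 * 16 * 8))  # stand-in for the transformer
triplane = (img_tokens.mean(0) @ W_dec).reshape(3, 16, 16, 8)  # XY, XZ, YZ planes

def query(p):
    """Sample triplane features at a point p in [0,1]^3, decode density + RGB."""
    ij = lambda a, b: (int(a * 15), int(b * 15))    # nearest-cell lookup
    f = (triplane[0][ij(p[0], p[1])] + triplane[1][ij(p[0], p[2])]
         + triplane[2][ij(p[1], p[2])])             # sum of plane features
    return np.exp(f[0]), 1 / (1 + np.exp(-f[1:4]))  # (density, rgb)

density, rgb = query(np.array([0.5, 0.5, 0.5]))
print(density, rgb)
```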
- Latent Video Diffusion Models for High-Fidelity Long Video Generation [58.346702410885236]
We introduce lightweight video diffusion models using a low-dimensional 3D latent space.
We also propose hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced.
Our framework generates more realistic and longer videos than previous strong baselines.
arXiv Detail & Related papers (2022-11-23T18:58:39Z)
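The hierarchical scheme can be pictured as coarse-to-fine in time: a first diffusion pass produces sparse keyframe latents, and a second pass fills in the frames between neighboring keyframes. A stub sketch of that structure (the denoisers are replaced by a random sampler and a noisy interpolator, and the exact conditioning is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT = 16  # low-dimensional per-frame latent (the paper uses a 3D latent space)

def denoise_keyframes(n):
    """Level 1 stand-in: sample n sparse keyframe latents."""
    return rng.standard_normal((n, LATENT))

def denoise_between(a, b, n):
    """Level 2 stand-in: infill n latents conditioned on keyframes (a, b)."""
    t = np.linspace(0, 1, n + 2)[1:-1, None]
    return (1 - t) * a + t * b + 0.1 * rng.standard_normal((n, LATENT))

keys = denoise_keyframes(9)  # e.g. one keyframe every 128 frames
clips = [np.vstack([keys[i], denoise_between(keys[i], keys[i + 1], 127)])
         for i in range(len(keys) - 1)]
video_latents = np.vstack(clips + [keys[-1:]])
print(video_latents.shape)   # (1025, 16): over one thousand frames, as claimed
```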