MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds
- URL: http://arxiv.org/abs/2512.21003v2
- Date: Sun, 28 Dec 2025 15:36:11 GMT
- Title: MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds
- Authors: Xiangzuo Wu, Chengwei Ren, Jun Zhou, Xiu Li, Yuan Liu,
- Abstract summary: Multi-view inverse rendering aims to recover geometry, materials, and illumination consistently across multiple viewpoints. We introduce a feed-forward multi-view inverse rendering framework that directly predicts spatially varying albedo, metallic, roughness, diffuse shading, and surface normals from sequences of RGB images. Our method achieves state-of-the-art performance in terms of multi-view consistency, material and normal estimation quality, and generalization to real-world imagery.
- Score: 19.94963757122156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-view inverse rendering aims to recover geometry, materials, and illumination consistently across multiple viewpoints. When applied to multi-view images, existing single-view approaches often ignore cross-view relationships, leading to inconsistent results. In contrast, multi-view optimization methods rely on slow differentiable rendering and per-scene refinement, making them computationally expensive and hard to scale. To address these limitations, we introduce a feed-forward multi-view inverse rendering framework that directly predicts spatially varying albedo, metallic, roughness, diffuse shading, and surface normals from sequences of RGB images. By alternating attention across views, our model captures both intra-view long-range lighting interactions and inter-view material consistency, enabling coherent scene-level reasoning within a single forward pass. Due to the scarcity of real-world training data, models trained on existing synthetic datasets often struggle to generalize to real-world scenes. To overcome this limitation, we propose a consistency-based finetuning strategy that leverages unlabeled real-world videos to enhance both multi-view coherence and robustness under in-the-wild conditions. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance in terms of multi-view consistency, material and normal estimation quality, and generalization to real-world imagery. Project page: https://maddog241.github.io/mvinverse-page/
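The abstract describes alternating attention across views (intra-view long-range lighting interactions, inter-view material consistency) but does not spell out the layer layout here. Below is a minimal PyTorch sketch of that alternating pattern only; the module names, dimensions, and the assumption of a ViT-style token grid per view are hypothetical, not the authors' implementation.

```python
# Minimal sketch of alternating intra-/inter-view attention (hypothetical
# names and sizes; illustrates the alternating pattern, not the paper's model).
import torch
import torch.nn as nn


class AlternatingViewAttention(nn.Module):
    """One block: intra-view attention followed by inter-view attention."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, V, N, C) = batch, views, tokens per view, channels
        B, V, N, C = tokens.shape

        # Intra-view attention: each view attends over its own tokens,
        # capturing long-range lighting interactions within the image.
        x = tokens.reshape(B * V, N, C)
        q = self.norm1(x)
        x = x + self.intra(q, q, q)[0]

        # Inter-view attention: tokens at the same spatial index attend
        # across views, encouraging cross-view material consistency.
        x = x.reshape(B, V, N, C).permute(0, 2, 1, 3).reshape(B * N, V, C)
        q = self.norm2(x)
        x = x + self.inter(q, q, q)[0]

        return x.reshape(B, N, V, C).permute(0, 2, 1, 3)


# Example: 2 scenes, 4 views, 196 tokens each; stacked blocks of this kind
# could feed per-pixel heads for albedo, metallic, roughness, shading, normals.
feats = torch.randn(2, 4, 196, 256)
out = AlternatingViewAttention()(feats)  # (2, 4, 196, 256)
```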
Related papers
- FROMAT: Multiview Material Appearance Transfer via Few-Shot Self-Attention Adaptation [49.74776147964999]
We present a lightweight adaptation technique for appearance transfer in multiview diffusion models. Our method learns to combine object identity from an input image with appearance cues rendered in a separate reference image, producing multi-view-consistent output.
arXiv Detail & Related papers (2025-12-10T13:06:40Z)
- MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference [83.38607296779423]
We show that multi-view consistent material inference with more physically-based environment modeling is key to learning accurate reflections with Gaussian Splatting. Our method faithfully recovers both illumination and geometry, achieving state-of-the-art rendering quality in novel view synthesis.
arXiv Detail & Related papers (2025-10-13T13:29:20Z)
- CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction [30.088316989385106]
Camera pipelines apply extensive on-device processing, such as exposure adjustment, white balance, and color correction. The resulting appearance variations violate multi-view consistency and degrade novel view synthesis. We propose a generalizable, feed-forward approach that predicts spatially adaptive bilateral grids to correct photometric variations in a multi-view-consistent manner.
arXiv Detail & Related papers (2025-07-21T16:03:58Z)
- Auto-Regressively Generating Multi-View Consistent Images [10.513203377236744]
We propose the Multi-View Auto-Regressive (MV-AR) method to generate consistent multi-view images from arbitrary prompts. When generating widely separated views, MV-AR can utilize all its preceding views to extract effective reference information. Experiments demonstrate the performance and versatility of MV-AR, which reliably generates consistent multi-view images.
arXiv Detail & Related papers (2025-06-23T11:28:37Z)
- MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation [19.46962637673285]
MV-CoLight is a framework for illumination-consistent object compositing in 2D and 3D scenes. We employ a Hilbert curve-based mapping to align 2D image inputs with 3D Gaussian scene representations seamlessly. Experiments demonstrate state-of-the-art harmonized results across standard benchmarks and our dataset.
arXiv Detail & Related papers (2025-05-27T17:53:02Z)
- Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting [4.89907242398523]
We propose renderability field-guided Gaussian splatting (RF-GS) for scene view synthesis. RF-GS quantifies input inhomogeneity through a renderability field, guiding pseudo-view sampling to enhance visual consistency. Our experiments on simulated and real-world data show that our method outperforms existing approaches in rendering stability.
arXiv Detail & Related papers (2025-04-27T14:41:01Z)
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs. We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation [17.133440382384578]
We propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. A novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced to improve the quality of scene-level inverse rendering.
arXiv Detail & Related papers (2024-08-13T08:04:23Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- GenS: Generalizable Neural Surface Reconstruction from Multi-View Images [20.184657468900852]
GenS is an end-to-end generalizable neural surface reconstruction model.
Our representation is more powerful: it can recover high-frequency details while maintaining global smoothness.
Experiments on popular benchmarks show that our model can generalize well to new scenes.
arXiv Detail & Related papers (2024-06-04T17:13:10Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning-based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)