MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation
- URL: http://arxiv.org/abs/2408.06707v1
- Date: Tue, 13 Aug 2024 08:04:23 GMT
- Title: MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation
- Authors: JunYong Choi, SeokYeong Lee, Haesol Park, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho
- Abstract summary: We propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting.
A novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced to improve the quality of scene-level inverse rendering.
- Score: 17.133440382384578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. While multi-view images have been widely used for object-level inverse rendering, scene-level inverse rendering has primarily been studied using single-view images due to the lack of a dataset containing high dynamic range multi-view images with ground-truth geometry, material, and spatially-varying lighting. To improve the quality of scene-level inverse rendering, a novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced. MAIR performs scene-level multi-view inverse rendering by expanding the OpenRooms dataset, designing efficient pipelines to handle multi-view images, and splitting spatially-varying lighting. Although MAIR showed impressive results, its lighting representation is fixed to spherical Gaussians, which limits its ability to render images realistically. Consequently, MAIR cannot be directly used in applications such as material editing. Moreover, its multi-view aggregation networks have difficulties extracting rich features because they only focus on the mean and variance between multi-view features. In this paper, we propose its extended version, called MAIR++. MAIR++ addresses the aforementioned limitations by introducing an implicit lighting representation that accurately captures the lighting conditions of an image while facilitating realistic rendering. Furthermore, we design a directional attention-based multi-view aggregation network to infer more intricate relationships between views. Experimental results show that MAIR++ not only achieves better performance than MAIR and single-view-based methods, but also displays robust performance on unseen real-world scenes.
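The abstract's central architectural point, replacing mean/variance pooling of multi-view features with a directional attention-based aggregation, can be made concrete with a small sketch. The snippet below is a minimal illustration only, not the authors' implementation; the module names, tensor shapes, and the use of a view-direction embedding added to the attention tokens are assumptions made for exposition.

```python
import torch
import torch.nn as nn


class MeanVarAggregation(nn.Module):
    """Pools per-view features using only their mean and variance (the limitation noted for MAIR)."""
    def forward(self, feats):                  # feats: (N, V, C) = pixels, views, channels
        mean = feats.mean(dim=1)
        var = feats.var(dim=1, unbiased=False)
        return torch.cat([mean, var], dim=-1)  # (N, 2C); per-view identity and inter-view relations are lost


class DirectionalAttentionAggregation(nn.Module):
    """Attends over views so each view's contribution can depend on its viewing direction (assumed design)."""
    def __init__(self, feat_dim=64, num_heads=4):
        super().__init__()
        self.dir_embed = nn.Linear(3, feat_dim)  # embeds per-view unit directions; a hypothetical choice
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, feats, view_dirs):
        # feats: (N, V, C) per-pixel multi-view features; view_dirs: (N, V, 3) unit view directions
        tokens = feats + self.dir_embed(view_dirs)   # direction-aware view tokens
        query = tokens.mean(dim=1, keepdim=True)     # (N, 1, C) aggregate query
        fused, _ = self.attn(query, tokens, tokens)  # cross-view attention
        return fused.squeeze(1)                      # (N, C) fused feature per pixel


if __name__ == "__main__":
    feats = torch.randn(1024, 5, 64)      # 1024 pixels, 5 views, 64-dim features
    dirs = torch.randn(1024, 5, 3)
    dirs = dirs / dirs.norm(dim=-1, keepdim=True)
    print(MeanVarAggregation()(feats).shape)                    # torch.Size([1024, 128])
    print(DirectionalAttentionAggregation()(feats, dirs).shape) # torch.Size([1024, 64])
```

The point of the contrast is that mean/variance pooling discards which view contributed what, whereas an attention-based aggregator can weight views differently per pixel according to their viewing directions, which is the richer inter-view reasoning the abstract attributes to MAIR++.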
Related papers
- View-consistent Object Removal in Radiance Fields [14.195400035176815]
Radiance Fields (RFs) have emerged as a crucial technology for 3D scene representation.
Current methods rely on per-frame 2D image inpainting, which often fails to maintain consistency across views.
We introduce a novel RF editing pipeline that significantly enhances consistency by requiring the inpainting of only a single reference image.
arXiv Detail & Related papers (2024-08-04T17:57:23Z) - MuRF: Multi-Baseline Radiance Fields [117.55811938988256]
We present Multi-Baseline Radiance Fields (MuRF), a feed-forward approach to solving sparse view synthesis.
MuRF achieves state-of-the-art performance across multiple different baseline settings.
We also show promising zero-shot generalization abilities on the Mip-NeRF 360 dataset.
arXiv Detail & Related papers (2023-12-07T18:59:56Z) - Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering [84.37776381343662]
Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information.
We propose mip voxel grids (Mip-VoG), an explicit multiscale representation for real-time anti-aliasing rendering.
Our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously.
arXiv Detail & Related papers (2023-04-20T04:05:22Z) - MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation [13.325800282424598]
We propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting.
Our experiments show that the proposed method not only achieves better performance than single-view-based methods, but also performs robustly on unseen real-world scenes.
arXiv Detail & Related papers (2023-03-22T08:07:28Z) - Learning-based Inverse Rendering of Complex Indoor Scenes with Differentiable Monte Carlo Raytracing [27.96634370355241]
This work presents an end-to-end, learning-based inverse rendering framework incorporating differentiable Monte Carlo raytracing with importance sampling.
The framework takes a single image as input to jointly recover the underlying geometry, spatially-varying lighting, and photorealistic materials.
arXiv Detail & Related papers (2022-11-06T03:34:26Z) - IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes [99.76677232870192]
We show how a dense vision transformer, IRISformer, excels at both single-task and multi-task reasoning required for inverse rendering.
Specifically, we propose a transformer architecture to simultaneously estimate depths, normals, spatially-varying albedo, roughness and lighting from a single image of an indoor scene.
Our evaluations on benchmark datasets demonstrate state-of-the-art results on each of the above tasks, enabling applications like object insertion and material editing in a single unconstrained real image.
arXiv Detail & Related papers (2022-06-16T19:50:55Z) - Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z) - DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiable renderers.
In this work, we propose DIBR++, a hybrid differentiable renderer that supports advanced lighting effects by combining rasterization and ray-tracing.
Compared to more advanced physics-based differentiable renderers, DIBR++ is highly performant due to its compact and expressive shading model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z) - IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)