IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering
in Indoor Scenes
- URL: http://arxiv.org/abs/2206.08423v1
- Date: Thu, 16 Jun 2022 19:50:55 GMT
- Title: IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering
in Indoor Scenes
- Authors: Rui Zhu, Zhengqin Li, Janarbek Matai, Fatih Porikli, Manmohan
Chandraker
- Abstract summary: We show how a dense vision transformer, IRISformer, excels at both single-task and multi-task reasoning required for inverse rendering.
Specifically, we propose a transformer architecture to simultaneously estimate depths, normals, spatially-varying albedo, roughness and lighting from a single image of an indoor scene.
Our evaluations on benchmark datasets demonstrate state-of-the-art results on each of the above tasks, enabling applications like object insertion and material editing in a single unconstrained real image.
- Score: 99.76677232870192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Indoor scenes exhibit significant appearance variations due to myriad
interactions between arbitrarily diverse object shapes, spatially-changing
materials, and complex lighting. Shadows, highlights, and inter-reflections
caused by visible and invisible light sources require reasoning about
long-range interactions for inverse rendering, which seeks to recover the
components of image formation, namely, shape, material, and lighting. In this
work, our intuition is that the long-range attention learned by transformer
architectures is ideally suited to solve longstanding challenges in
single-image inverse rendering. We demonstrate this with a specific instantiation
of a dense vision transformer, IRISformer, which excels at both single-task and
multi-task reasoning required for inverse rendering. Specifically, we propose a
transformer architecture to simultaneously estimate depths, normals,
spatially-varying albedo, roughness and lighting from a single image of an
indoor scene. Our extensive evaluations on benchmark datasets demonstrate
state-of-the-art results on each of the above tasks, enabling applications like
object insertion and material editing in a single unconstrained real image,
with greater photorealism than prior works. Code and data are publicly released
at https://github.com/ViLab-UCSD/IRISformer.
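To make the multi-task architecture described in the abstract concrete, the sketch below shows one way a shared dense vision transformer encoder can feed separate per-pixel decoder heads for depth, normals, albedo, and roughness, plus a lighting branch. It is a minimal illustration only: the module names, layer sizes, and the coarse per-image lighting code are assumptions for this sketch, not the released IRISformer code (see the repository linked above for the authors' implementation).

```python
# A minimal, hypothetical sketch (NOT the released IRISformer code): a shared
# dense vision transformer encoder with one dense decoder head per estimated
# quantity. All module names, sizes, and channel counts are assumptions.
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split the image into non-overlapping patches and project them to tokens."""
    def __init__(self, img_size=224, patch=16, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                          # x: (B, 3, H, W)
        tokens = self.proj(x)                      # (B, dim, H/patch, W/patch)
        return tokens.flatten(2).transpose(1, 2)   # (B, N, dim)


class DenseHead(nn.Module):
    """Upsample token features back to a per-pixel map for one output modality."""
    def __init__(self, dim, out_ch, grid, patch=16):
        super().__init__()
        self.grid = grid
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim // 2, 3, padding=1), nn.GELU(),
            nn.Upsample(scale_factor=patch, mode="bilinear", align_corners=False),
            nn.Conv2d(dim // 2, out_ch, 3, padding=1),
        )

    def forward(self, tokens):                     # tokens: (B, N, dim)
        b, n, d = tokens.shape
        fmap = tokens.transpose(1, 2).reshape(b, d, self.grid, self.grid)
        return self.net(fmap)                      # (B, out_ch, H, W)


class MultiTaskDenseViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=6, heads=8):
        super().__init__()
        self.embed = PatchEmbed(img_size, patch, dim)
        grid = img_size // patch
        self.pos = nn.Parameter(torch.zeros(1, grid * grid, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # One dense head per estimated quantity (channel counts are assumptions).
        self.heads = nn.ModuleDict({
            "depth":     DenseHead(dim, 1, grid, patch),
            "normals":   DenseHead(dim, 3, grid, patch),
            "albedo":    DenseHead(dim, 3, grid, patch),
            "roughness": DenseHead(dim, 1, grid, patch),
        })
        # Lighting here is a coarse per-image code; the paper estimates
        # spatially-varying lighting, which this sketch does not model.
        self.lighting = nn.Linear(dim, 128)

    def forward(self, img):
        tokens = self.encoder(self.embed(img) + self.pos)
        out = {name: head(tokens) for name, head in self.heads.items()}
        out["lighting"] = self.lighting(tokens.mean(dim=1))
        return out


if __name__ == "__main__":
    model = MultiTaskDenseViT()
    preds = model(torch.randn(1, 3, 224, 224))
    print({k: tuple(v.shape) for k, v in preds.items()})
```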
Related papers
- Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering [56.68286440268329]
The correct insertion of virtual objects into images of real-world scenes requires a deep understanding of the scene's lighting, geometry, and materials.
We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process.
Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes.
arXiv Detail & Related papers (2024-08-19T05:15:45Z)
- MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation [17.133440382384578]
A novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced to improve the quality of scene-level inverse rendering.
We propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting.
arXiv Detail & Related papers (2024-08-13T08:04:23Z)
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild [76.21063993398451]
Inverse rendering of an object based on unconstrained image collections is a long-standing challenge in computer vision and graphics.
We show that an implicit shape representation based on a multi-resolution hash encoding enables faster and more robust shape reconstruction.
Our method is class-agnostic and works on in-the-wild image collections of objects to produce relightable 3D assets.
arXiv Detail & Related papers (2024-01-18T18:01:19Z)
- Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes [62.769186261245416]
We present a novel inverse rendering framework for large urban scenes capable of jointly reconstructing the scene geometry, spatially-varying materials, and HDR lighting from a set of posed RGB images with optional depth.
Specifically, we use a neural field to account for the primary rays, and use an explicit mesh (reconstructed from the underlying neural field) for modeling secondary rays that produce higher-order lighting effects such as cast shadows.
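As a hedged illustration of the hybrid strategy summarized above (not that paper's implementation), the toy sketch below queries a small MLP "neural field" along a primary camera ray and then tests a secondary shadow ray against an explicit triangle mesh with a brute-force Moller-Trumbore intersection; the network size, single point light, and hard-coded occluder triangle are assumptions made for the demo.

```python
# A toy, hypothetical sketch (not that paper's code): a small MLP plays the
# role of the neural field for primary rays, and a secondary shadow ray is
# tested against an explicit triangle mesh for cast shadows.
import torch


def ray_hits_mesh(origin, direction, tris, eps=1e-8):
    """Brute-force Moller-Trumbore: does the ray hit any triangle in tris (T, 3, 3)?"""
    v0, v1, v2 = tris[:, 0], tris[:, 1], tris[:, 2]
    e1, e2 = v1 - v0, v2 - v0
    d = direction.expand_as(e1)
    h = torch.cross(d, e2, dim=1)
    a = (e1 * h).sum(dim=1)
    mask = a.abs() > eps                       # skip triangles parallel to the ray
    f = torch.where(mask, 1.0 / a, torch.zeros_like(a))
    s = origin - v0
    u = f * (s * h).sum(dim=1)
    q = torch.cross(s, e1, dim=1)
    v = f * (d * q).sum(dim=1)
    t = f * (e2 * q).sum(dim=1)
    hit = mask & (u >= 0) & (v >= 0) & (u + v <= 1) & (t > eps)
    return bool(hit.any())


# Toy "neural field": maps a 3D point to (r, g, b, density).
field = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4))


def shade_primary_ray(origin, direction, mesh_tris, light_pos, n_samples=32):
    # Primary ray: march through the neural field (very crude volume rendering).
    ts = torch.linspace(0.1, 4.0, n_samples).unsqueeze(1)   # (S, 1)
    pts = origin + ts * direction                           # (S, 3)
    out = field(pts)
    rgb, sigma = torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])
    weights = sigma / (sigma.sum() + 1e-8)                  # simplified compositing
    color = (weights.unsqueeze(1) * rgb).sum(dim=0)         # (3,)
    surface = (weights.unsqueeze(1) * pts).sum(dim=0)       # expected surface point
    # Secondary ray: check visibility of the light against the explicit mesh.
    to_light = torch.nn.functional.normalize(light_pos - surface, dim=0)
    shadowed = ray_hits_mesh(surface + 1e-3 * to_light, to_light, mesh_tris)
    return color * (0.2 if shadowed else 1.0)               # darken if occluded


if __name__ == "__main__":
    # One occluding triangle at z = 2 between the camera ray and the light.
    occluder = torch.tensor([[[-1.0, -1.0, 2.0], [1.0, -1.0, 2.0], [0.0, 1.0, 2.0]]])
    color = shade_primary_ray(origin=torch.zeros(3),
                              direction=torch.tensor([0.0, 0.0, 1.0]),
                              mesh_tris=occluder,
                              light_pos=torch.tensor([0.0, 0.0, 5.0]))
    print("shaded color:", color)
```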
arXiv Detail & Related papers (2023-04-06T17:51:54Z)
- NDJIR: Neural Direct and Joint Inverse Rendering for Geometry, Lights, and Materials of Real Object [5.665283675533071]
We propose neural direct and joint inverse rendering, NDJIR.
Our proposed method achieves a semantically meaningful decomposition for real objects in a photogrammetric setting.
arXiv Detail & Related papers (2023-02-02T13:21:03Z)
- Learning-based Inverse Rendering of Complex Indoor Scenes with Differentiable Monte Carlo Raytracing [27.96634370355241]
This work presents an end-to-end, learning-based inverse rendering framework incorporating differentiable Monte Carlo raytracing with importance sampling.
The framework takes a single image as input to jointly recover the underlying geometry, spatially-varying lighting, and photorealistic materials.
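As a hedged, minimal illustration of differentiable Monte Carlo shading with importance sampling (not that framework's actual implementation), the sketch below estimates the outgoing radiance of a Lambertian surface under a toy environment light using cosine-weighted sampling and backpropagates the result to the albedo; the Lambertian BRDF, the environment model, and the sample count are assumptions.

```python
# A minimal, hypothetical sketch (not that framework's implementation):
# cosine-weighted importance sampling of a Lambertian surface under a toy
# environment light, with gradients flowing back to the albedo.
import math
import torch


def cosine_sample_hemisphere(n_samples):
    """Cosine-weighted directions about the +z normal; pdf = cos(theta) / pi."""
    u1, u2 = torch.rand(n_samples), torch.rand(n_samples)
    r, phi = torch.sqrt(u1), 2 * math.pi * u2
    x, y = r * torch.cos(phi), r * torch.sin(phi)
    z = torch.sqrt(torch.clamp(1 - u1, min=0.0))            # cos(theta)
    return torch.stack([x, y, z], dim=1), z / math.pi       # dirs (N, 3), pdf (N,)


def environment_radiance(dirs):
    """Toy environment light, brighter toward +z (an assumption for the demo)."""
    return 0.5 + 0.5 * dirs[:, 2:3]                         # (N, 1)


def shade(albedo, n_samples=4096):
    dirs, pdf = cosine_sample_hemisphere(n_samples)
    brdf = albedo / math.pi                                 # Lambertian BRDF per channel
    li = environment_radiance(dirs)                         # incident radiance (N, 1)
    cos_theta = dirs[:, 2:3]
    # Monte Carlo estimator: mean of f * Li * cos(theta) / pdf over the samples.
    return (brdf * li * cos_theta / pdf.unsqueeze(1)).mean(dim=0)


albedo = torch.tensor([0.6, 0.4, 0.3], requires_grad=True)  # per-channel diffuse albedo
radiance = shade(albedo)
loss = (radiance - torch.tensor([0.5, 0.5, 0.5])).pow(2).sum()
loss.backward()                                             # differentiable end to end
print("radiance:", radiance.detach(), "d loss / d albedo:", albedo.grad)
```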
arXiv Detail & Related papers (2022-11-06T03:34:26Z)
- DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting a differentiable renderer.
In this work, we propose DIB-R++, a hybrid differentiable renderer which supports photorealistic lighting effects by combining rasterization and ray-tracing.
Compared to more advanced physics-based differentiable renderers, DIB-R++ is highly performant due to its compact and expressive model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z)
- Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting [149.1673041605155]
We address the problem of jointly estimating albedo, normals, depth and 3D spatially-varying lighting from a single image.
Most existing methods formulate the task as image-to-image translation, ignoring the 3D properties of the scene.
We propose a unified, learning-based inverse rendering framework that formulates 3D spatially-varying lighting.
arXiv Detail & Related papers (2021-09-13T15:29:03Z)