Related papers: Multi-scale Attention-Guided Intrinsic Decomposition and Rendering Pass Prediction for Facial Images

URL: http://arxiv.org/abs/2512.16511v1
Date: Thu, 18 Dec 2025 13:23:49 GMT
Title: Multi-scale Attention-Guided Intrinsic Decomposition and Rendering Pass Prediction for Facial Images
Author: Hossein Javidnia
Abstract summary: This paper introduces MAGINet, a Multi-scale Attention-Guided Intrinsics Network that predicts a light-normalized diffuse albedo map from a single RGB portrait. The pipeline achieves state-of-the-art performance for diffuse albedo estimation and demonstrates significantly improved fidelity for the complete rendering stack compared to prior methods.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate intrinsic decomposition of face images under unconstrained lighting is a prerequisite for photorealistic relighting, high-fidelity digital doubles, and augmented-reality effects. This paper introduces MAGINet, a Multi-scale Attention-Guided Intrinsics Network that predicts a $512\times512$ light-normalized diffuse albedo map from a single RGB portrait. MAGINet employs hierarchical residual encoding, spatial-and-channel attention in a bottleneck, and adaptive multi-scale feature fusion in the decoder, yielding sharper albedo boundaries and stronger lighting invariance than prior U-Net variants. The initial albedo prediction is upsampled to $1024\times1024$ and refined by a lightweight three-layer CNN (RefinementNet). Conditioned on this refined albedo, a Pix2PixHD-based translator then predicts a comprehensive set of five additional physically based rendering passes: ambient occlusion, surface normal, specular reflectance, translucency, and raw diffuse colour (with residual lighting). Together with the refined albedo, these six passes form the complete intrinsic decomposition. Trained with a combination of masked-MSE, VGG, edge, and patch-LPIPS losses on the FFHQ-UV-Intrinsics dataset, the full pipeline achieves state-of-the-art performance for diffuse albedo estimation and demonstrates significantly improved fidelity for the complete rendering stack compared to prior methods. The resulting passes enable high-quality relighting and material editing of real faces.
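The abstract names the training losses but does not define them. A minimal sketch of two of them follows. It assumes single-channel images stored as nested Python lists and a binary face mask of the same size. The function names, the use of a Sobel operator for the edge term, and the weights in `combined_loss` are illustrative assumptions, not the paper's definitions. The VGG and patch-LPIPS terms are left out because they depend on pretrained networks.

```python
"""Minimal sketch of a masked-MSE loss and an edge loss.

Assumptions (not taken from the paper): images are single-channel 2-D
lists of floats, the mask is a same-sized 2-D list of 0/1 values marking
face pixels, and the edge loss compares Sobel gradient magnitudes.
All names and weights here are hypothetical.
"""
import math
from typing import List

Image = List[List[float]]


def masked_mse(pred: Image, target: Image, mask: Image, eps: float = 1e-8) -> float:
    """Mean squared error averaged only over pixels where the mask is set."""
    num = 0.0
    den = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            num += m * (p - t) ** 2
            den += m
    return num / (den + eps)  # eps avoids division by zero for an empty mask


def sobel_magnitude(img: Image) -> Image:
    """Sobel gradient magnitude on interior pixels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_loss(pred: Image, target: Image, mask: Image) -> float:
    """Masked MSE between the Sobel edge maps of prediction and target."""
    return masked_mse(sobel_magnitude(pred), sobel_magnitude(target), mask)


def combined_loss(pred: Image, target: Image, mask: Image,
                  w_mse: float = 1.0, w_edge: float = 0.1) -> float:
    # Hypothetical weights; VGG and patch-LPIPS terms are omitted because
    # they need pretrained networks.
    return w_mse * masked_mse(pred, target, mask) + w_edge * edge_loss(pred, target, mask)


if __name__ == "__main__":
    target = [[0.5] * 4 for _ in range(4)]
    pred = [row[:] for row in target]
    pred[0][0] = 1.5  # error at a pixel outside the face mask
    mask = [[1.0] * 4 for _ in range(4)]
    mask[0][0] = 0.0
    print(masked_mse(pred, target, mask))  # 0.0: the only error is masked out
```

Dividing by the number of masked pixels, rather than by the full image size, keeps the loss scale independent of how much of the frame the face covers. The edge term is not strictly confined to the mask: an error at a masked-out pixel still changes the 3×3 Sobel response of its unmasked neighbours.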

Related papers

PhaSR: Generalized Image Shadow Removal with Physically Aligned Priors
We propose PhaSR (Physically Aligned Shadow Removal), which approaches generalized shadow removal through dual-level prior alignment. Experiments show competitive shadow-removal performance with lower complexity, as well as generalization to ambient lighting.
arXiv (2026-01-24T14:15:41Z)
UnReflectAnything: RGB-Only Highlight Removal by Rendering Synthetic Specular Supervision
We present UnReflectAnything, an RGB-only framework that removes highlights from a single image. It predicts a highlight map together with a reflection-free diffuse reconstruction. It generalizes across natural and surgical domains where non-Lambertian surfaces and non-uniform lighting create severe highlights.
arXiv (2025-12-10T12:22:37Z)
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
SHINE is a training-free framework for Seamless, High-fidelity Insertion with Neutralized Errors. We also introduce ComplexCompo, featuring diverse resolutions and challenging conditions such as low lighting, strong illumination, intricate shadows, and reflective surfaces.
arXiv (2025-09-25T15:01:49Z)
Neural Spline Fields for Burst Image Fusion and Layer Separation
We propose a versatile intermediate representation: a two-layer alpha-composited image plus flow model constructed with neural spline fields. Our method is able to jointly fuse a burst image capture into one high-resolution reconstruction and decompose it into transmission and obstruction layers. We find that, with no post-processing steps or learned priors, our generalizable model is able to outperform existing dedicated single-image and multi-view obstruction removal approaches.
arXiv (2023-12-21T18:54:19Z)
Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model
Illumination degradation image restoration (IDIR) techniques aim to improve the visibility of degraded images and mitigate the adverse effects of deteriorated illumination. Among these algorithms, diffusion model (DM)-based methods have shown promising performance but are often burdened by heavy computational demands and pixel misalignment issues when predicting the image-level distribution. We propose to leverage DM within a compact latent space to generate concise guidance priors and introduce a novel solution called Reti-Diff for the IDIR task. Reti-Diff comprises two key components: the Retinex-based latent DM (RLDM) and the Retinex-guided transformer (RG
arXiv (2023-11-20T09:55:06Z)
Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network
This paper presents the deep compensation unfolding network (DCUNet) for restoring light field (LF) images captured under low-light conditions. The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result. To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv (2023-08-10T07:53:06Z)
Spatiotemporally Consistent HDR Indoor Lighting Estimation
We propose a physically-motivated deep learning framework to solve the indoor lighting estimation problem. Given a single LDR image with a depth map, our method predicts spatially consistent lighting at any given image position. Our framework achieves photorealistic lighting prediction with higher quality compared to state-of-the-art single-image or video-based methods.
arXiv (2023-05-07T20:36:29Z)
Progressively-connected Light Field Network for Efficient View Synthesis
We present a Progressively-connected Light Field network (ProLiF) for the novel view synthesis of complex forward-facing scenes. ProLiF encodes a 4D light field, which allows rendering a large batch of rays in one training step for image- or patch-level losses.
arXiv (2022-07-10T13:47:20Z)
DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiable renderers. In this work, we propose DIBR++, a hybrid differentiable renderer that supports non-Lambertian, specular effects by combining rasterization and ray-tracing. Compared to more advanced physics-based differentiable renderers, DIBR++ is highly performant due to its compact and expressive model.
arXiv (2021-10-30T01:59:39Z)
Intrinsic Image Transfer for Illumination Manipulation
This paper presents a novel intrinsic image transfer (IIT) algorithm for illumination manipulation. It creates a local image translation between two illumination surfaces. We illustrate that all losses can be reduced without the necessity of taking an intrinsic image decomposition.
arXiv (2021-07-01T19:12:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
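Reti-Diff is explicitly Retinex-based, and DCUNet likewise estimates an illumination map. Both rest on the Retinex model I = R ⊙ L (image equals reflectance times illumination), the same multiplicative intuition behind separating albedo from shading in intrinsic decomposition. The toy sketch below is not either paper's method. It assumes an RGB image given as nested lists of tuples in [0, 1], estimates illumination as the per-pixel maximum over colour channels (a simple initial estimate used in LIME-style low-light enhancement), and recovers reflectance by division. The function names are hypothetical.

```python
"""Toy Retinex decomposition I = R * L on a small RGB image.

Assumptions (illustrative only): the image is a 2-D list of (r, g, b)
tuples in [0, 1]; illumination is the per-pixel channel maximum; no
smoothing or refinement of L is performed. Names are hypothetical.
"""
from typing import List, Tuple

RGB = Tuple[float, float, float]
RGBImage = List[List[RGB]]
GrayImage = List[List[float]]


def estimate_illumination(img: RGBImage) -> GrayImage:
    """Initial illumination map: maximum over the three colour channels."""
    return [[max(px) for px in row] for row in img]


def reflectance(img: RGBImage, illum: GrayImage, eps: float = 1e-6) -> RGBImage:
    """Recover R = I / L channel-wise, guarding against division by zero."""
    return [
        [tuple(c / max(l, eps) for c in px) for px, l in zip(row, lrow)]
        for row, lrow in zip(img, illum)
    ]


def relight(refl: RGBImage, illum: GrayImage, gain: float) -> RGBImage:
    """Recompose I' = R * (gain * L), clipping each channel to at most 1."""
    return [
        [tuple(min(1.0, c * gain * l) for c in px) for px, l in zip(row, lrow)]
        for row, lrow in zip(refl, illum)
    ]


if __name__ == "__main__":
    dark = [[(0.1, 0.05, 0.02), (0.2, 0.2, 0.1)]]
    L = estimate_illumination(dark)
    R = reflectance(dark, L)
    print(L)  # [[0.1, 0.2]]
    print(relight(R, L, 2.0))  # same reflectance under doubled illumination
```

Because L is the channel maximum, every non-black reflectance pixel has its largest channel equal to 1, which is one reason practical methods refine or learn the illumination map instead of using this raw estimate.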