Multi-scale Attention-Guided Intrinsic Decomposition and Rendering Pass Prediction for Facial Images
- URL: http://arxiv.org/abs/2512.16511v1
- Date: Thu, 18 Dec 2025 13:23:49 GMT
- Title: Multi-scale Attention-Guided Intrinsic Decomposition and Rendering Pass Prediction for Facial Images
- Authors: Hossein Javidnia,
- Abstract summary: This paper introduces MAGINet, a Multi-scale Attention-Guided Intrinsics Network that predicts a light-normalized diffuse albedo map from a single RGB portrait.<n>The pipeline achieves state-of-the-art performance for diffuse albedo estimation and demonstrates significantly improved fidelity for the complete rendering stack compared to prior methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate intrinsic decomposition of face images under unconstrained lighting is a prerequisite for photorealistic relighting, high-fidelity digital doubles, and augmented-reality effects. This paper introduces MAGINet, a Multi-scale Attention-Guided Intrinsics Network that predicts a $512\times512$ light-normalized diffuse albedo map from a single RGB portrait. MAGINet employs hierarchical residual encoding, spatial-and-channel attention in a bottleneck, and adaptive multi-scale feature fusion in the decoder, yielding sharper albedo boundaries and stronger lighting invariance than prior U-Net variants. The initial albedo prediction is upsampled to $1024\times1024$ and refined by a lightweight three-layer CNN (RefinementNet). Conditioned on this refined albedo, a Pix2PixHD-based translator then predicts a comprehensive set of five additional physically based rendering passes: ambient occlusion, surface normal, specular reflectance, translucency, and raw diffuse colour (with residual lighting). Together with the refined albedo, these six passes form the complete intrinsic decomposition. Trained with a combination of masked-MSE, VGG, edge, and patch-LPIPS losses on the FFHQ-UV-Intrinsics dataset, the full pipeline achieves state-of-the-art performance for diffuse albedo estimation and demonstrates significantly improved fidelity for the complete rendering stack compared to prior methods. The resulting passes enable high-quality relighting and material editing of real faces.
Related papers
- PhaSR: Generalized Image Shadow Removal with Physically Aligned Priors [13.290464696196366]
We propose PhaSR (Physically Aligned Shadow Removal), addressing this through dual-level prior alignment.<n>Experiments show competitive performance in shadow removal with lower complexity and generalization to ambient lighting.
arXiv Detail & Related papers (2026-01-24T14:15:41Z) - UnReflectAnything: RGB-Only Highlight Removal by Rendering Synthetic Specular Supervision [51.72020507506023]
We present UnReflectAnything, an RGB-only framework that removes highlights from a single image.<n>It predicts a highlight map together with a reflection-free diffuse reconstruction.<n>It generalizes across natural and surgical domains where non-Lambertian surfaces and non-uniform lighting create severe highlights.
arXiv Detail & Related papers (2025-12-10T12:22:37Z) - Does FLUX Already Know How to Perform Physically Plausible Image Composition? [26.848563827256914]
SHINE is a training-free framework for Seamless, High-fidelity Insertion with Neutralized Errors.<n>We introduce ComplexCompo, featuring diverse resolutions and challenging conditions such as low lighting, strong illumination, intricate shadows, and reflective surfaces.
arXiv Detail & Related papers (2025-09-25T15:01:49Z) - Neural Spline Fields for Burst Image Fusion and Layer Separation [40.9442467471977]
We propose a versatile intermediate representation: a two-layer alpha-composited image plus flow model constructed with neural spline fields.
Our method is able to jointly fuse a burst image capture into one high-resolution reconstruction and decompose it into transmission and obstruction layers.
We find that, with no post-processing steps or learned priors, our generalizable model is able to outperform existing dedicated single-image and multi-view obstruction removal approaches.
arXiv Detail & Related papers (2023-12-21T18:54:19Z) - Reti-Diff: Illumination Degradation Image Restoration with Retinex-based
Latent Diffusion Model [59.08821399652483]
Illumination degradation image restoration (IDIR) techniques aim to improve the visibility of degraded images and mitigate the adverse effects of deteriorated illumination.
Among these algorithms, diffusion model (DM)-based methods have shown promising performance but are often burdened by heavy computational demands and pixel misalignment issues when predicting the image-level distribution.
We propose to leverage DM within a compact latent space to generate concise guidance priors and introduce a novel solution called Reti-Diff for the IDIR task.
Reti-Diff comprises two key components: the Retinex-based latent DM (RLDM) and the Retinex-guided transformer (RG
arXiv Detail & Related papers (2023-11-20T09:55:06Z) - Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation network unfolding (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z) - Spatiotemporally Consistent HDR Indoor Lighting Estimation [66.26786775252592]
We propose a physically-motivated deep learning framework to solve the indoor lighting estimation problem.
Given a single LDR image with a depth map, our method predicts spatially consistent lighting at any given image position.
Our framework achieves photorealistic lighting prediction with higher quality compared to state-of-the-art single-image or video-based methods.
arXiv Detail & Related papers (2023-05-07T20:36:29Z) - Progressively-connected Light Field Network for Efficient View Synthesis [69.29043048775802]
We present a Progressively-connected Light Field network (ProLiF) for the novel view synthesis of complex forward-facing scenes.
ProLiF encodes a 4D light field, which allows rendering a large batch of rays in one training step for image- or patch-level losses.
arXiv Detail & Related papers (2022-07-10T13:47:20Z) - DIB-R++: Learning to Predict Lighting and Material with a Hybrid
Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiables.
In this work, we propose DIBR++, a hybrid differentiable which supports these effects by combining specularization and ray-tracing.
Compared to more advanced physics-based differentiables, DIBR++ is highly performant due to its compact and expressive model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z) - Intrinsic Image Transfer for Illumination Manipulation [1.2387676601792899]
This paper presents a novel intrinsic image transfer (IIT) algorithm for illumination manipulation.
It creates a local image translation between two illumination surfaces.
We illustrate that all losses can be reduced without the necessity of taking an intrinsic image decomposition.
arXiv Detail & Related papers (2021-07-01T19:12:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.