UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
- URL: http://arxiv.org/abs/2511.01678v1
- Date: Mon, 03 Nov 2025 15:41:41 GMT
- Title: UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
- Authors: Ropeway Liu, Hangjie Yuan, Bo Dong, Jiazheng Xing, Jinwang Wang, Rui Zhao, Yan Xing, Weihua Chen, Fan Wang
- Abstract summary: We present UniLumos, a unified relighting framework for both images and videos. We explicitly align lighting effects with the scene structure, enhancing physical plausibility. Experiments demonstrate that UniLumos achieves state-of-the-art relighting with significantly improved physical consistency.
- Score: 31.03901228901908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relighting is a crucial task with both practical demand and artistic value, and recent diffusion models have shown strong potential by enabling rich and controllable lighting effects. However, as they are typically optimized in semantic latent space, where proximity does not guarantee physical correctness in visual space, they often produce unrealistic results, such as overexposed highlights, misaligned shadows, and incorrect occlusions. We address this with UniLumos, a unified relighting framework for both images and videos that brings RGB-space geometry feedback into a flow matching backbone. By supervising the model with depth and normal maps extracted from its outputs, we explicitly align lighting effects with the scene structure, enhancing physical plausibility. Nevertheless, this feedback requires high-quality outputs for supervision in visual space, making standard multi-step denoising computationally expensive. To mitigate this, we employ path consistency learning, allowing supervision to remain effective even under few-step training regimes. To enable fine-grained relighting control and supervision, we design a structured six-dimensional annotation protocol capturing core illumination attributes. Building upon this, we propose LumosBench, a disentangled attribute-level benchmark that evaluates lighting controllability via large vision-language models, enabling automatic and interpretable assessment of relighting precision across individual dimensions. Extensive experiments demonstrate that UniLumos achieves state-of-the-art relighting quality with significantly improved physical consistency, while delivering a 20x speedup for both image and video relighting. Code is available at https://github.com/alibaba-damo-academy/Lumos-Custom.
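The abstract's core idea — supervising the relighting model with depth and normal maps extracted from its own outputs so that lighting stays consistent with scene geometry — can be illustrated with a toy loss function. This is a minimal numpy sketch of such a geometry-feedback objective, not the paper's actual implementation; the function name, weights, and per-pixel formulation are illustrative assumptions (in practice the maps would come from pretrained depth/normal estimators and the loss would backpropagate through a diffusion/flow-matching model).

```python
import numpy as np

def geometry_feedback_loss(pred_depth, ref_depth, pred_normal, ref_normal,
                           w_depth=1.0, w_normal=1.0):
    """Toy RGB-space geometry feedback: penalize depth and normal maps
    extracted from a relit output when they drift from the reference
    scene structure.

    pred_depth, ref_depth:   (H, W) depth maps
    pred_normal, ref_normal: (H, W, 3) unit surface normals
    """
    # Depth term: mean absolute deviation from the reference depth.
    depth_term = np.abs(pred_depth - ref_depth).mean()
    # Normal term: 1 - cosine similarity per pixel, averaged over the image.
    dot = np.clip((pred_normal * ref_normal).sum(axis=-1), -1.0, 1.0)
    normal_term = (1.0 - dot).mean()
    return w_depth * depth_term + w_normal * normal_term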
Related papers
- Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse Views [82.15089065452081]
We present Relightable Holoported Characters (RHC), a person-specific method for free-view rendering and relighting of full-body and highly dynamic humans. Our transformer-based RelightNet predicts relit appearance within a single network pass, avoiding costly OLAT-basis capture and generation. Experiments demonstrate our method's superior visual fidelity and lighting reproduction compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-11-29T00:17:34Z) - Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting [12.481640901722786]
We introduce GS-Light, a pipeline for text-guided relighting of 3D scenes represented via Gaussian Splatting (3DGS). GS-Light implements a training-free extension of a single-input diffusion model to handle multi-view inputs. We evaluate GS-Light on both indoor and outdoor scenes, comparing it to state-of-the-art baselines.
arXiv Detail & Related papers (2025-11-17T18:37:41Z) - RelightMaster: Precise Video Relighting with Multi-plane Light Images [59.56389629981934]
RelightMaster is a novel framework for accurate and controllable video relighting. It generates physically plausible lighting and shadows and preserves original scene content.
arXiv Detail & Related papers (2025-11-09T08:12:09Z) - TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer [47.22201704648345]
Illumination and texture editing are critical dimensions for world-to-world transfer. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. We propose TC-Light, a novel generative approach to overcome these problems.
arXiv Detail & Related papers (2025-06-23T17:59:58Z) - LumiSculpt: Enabling Consistent Portrait Lighting in Video Generation [87.95655555555264]
Lighting plays a pivotal role in ensuring the naturalness and aesthetic quality of video generation. LumiSculpt enables precise and consistent lighting control in T2V generation models. LumiHuman is a new dataset for portrait lighting of images and videos.
arXiv Detail & Related papers (2024-10-30T12:44:08Z) - Zero-Reference Low-Light Enhancement via Physical Quadruple Priors [58.77377454210244]
We propose a new zero-reference low-light enhancement framework trainable solely with normal light images.
This framework is able to restore our illumination-invariant prior back to images, automatically achieving low-light enhancement.
arXiv Detail & Related papers (2024-03-19T17:36:28Z) - Relightable Neural Actor with Intrinsic Decomposition and Pose Control [80.06094206522668]
We propose Relightable Neural Actor, a new video-based method for learning a pose-driven neural human model that can be relighted.
For training, our method solely requires a multi-view recording of the human under a known, but static lighting condition.
To evaluate our approach in real-world scenarios, we collect a new dataset with four identities recorded under different light conditions, indoors and outdoors.
arXiv Detail & Related papers (2023-12-18T14:30:13Z) - Personalized Video Relighting With an At-Home Light Stage [0.0]
We develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit videos in real-time.
We show that by just capturing recordings of a user watching YouTube videos on a monitor we can train a personalized algorithm capable of performing high-quality relighting under any condition.
arXiv Detail & Related papers (2023-11-15T10:33:20Z) - RelightableHands: Efficient Neural Relighting of Articulated Hand Models [46.60594572471557]
We present the first neural relighting approach for rendering high-fidelity personalized hands that can be animated in real-time under novel illumination.
Our approach adopts a teacher-student framework, where the teacher learns appearance under a single point light from images captured in a light-stage.
Using images rendered by the teacher model as training data, an efficient student model directly predicts appearance under natural illuminations in real-time.
arXiv Detail & Related papers (2023-02-09T18:59:48Z) - Self-Aligned Concave Curve: Illumination Enhancement for Unsupervised Adaptation [36.050270650417325]
We propose a learnable illumination enhancement model for high-level vision.
Inspired by real camera response functions, we assume that the illumination enhancement function should be a concave curve.
Our model architecture and training designs mutually benefit each other, forming a powerful unsupervised normal-to-low light adaptation framework.
arXiv Detail & Related papers (2022-10-07T19:32:55Z) - Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening images in real-world low-light scenarios.
Considering the computational burden of the cascaded pattern, we construct the self-calibrated module which realizes the convergence between results of each stage.
We make comprehensive explorations to SCI's inherent properties including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z) - Relighting Images in the Wild with a Self-Supervised Siamese Auto-Encoder [62.580345486483886]
We propose a self-supervised method for image relighting of single view images in the wild.
The method is based on an auto-encoder which deconstructs an image into two separate encodings.
We train our model on large-scale datasets such as YouTube-8M and CelebA.
arXiv Detail & Related papers (2020-12-11T16:08:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.