Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models
- URL: http://arxiv.org/abs/2508.12945v1
- Date: Mon, 18 Aug 2025 14:21:22 GMT
- Title: Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models
- Authors: Jianshu Zeng, Yuxuan Liu, Yutong Feng, Chenxuan Miao, Zixiang Gao, Jiwang Qu, Jianzhang Zhang, Bin Wang, Kun Yuan,
- Abstract summary: We propose Lumen, an end-to-end video relighting framework developed on large-scale video generative models. For the synthetic domain, we leverage an advanced 3D rendering engine to curate video pairs in diverse environments. For the realistic domain, we adapt an HDR-based lighting simulation to compensate for the lack of paired in-the-wild videos.
- Score: 18.008901495139717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video relighting is a challenging yet valuable task, aiming to replace the background in videos while correspondingly adjusting the lighting in the foreground with harmonious blending. During translation, it is essential to preserve the original properties of the foreground, e.g., albedo, and to propagate consistent relighting across temporal frames. In this paper, we propose Lumen, an end-to-end video relighting framework developed on large-scale video generative models, accepting flexible textual descriptions for controlling the lighting and background. Considering the scarcity of high-quality paired videos with the same foreground under various lighting conditions, we construct a large-scale dataset with a mixture of realistic and synthetic videos. For the synthetic domain, benefiting from the abundant 3D assets in the community, we leverage an advanced 3D rendering engine to curate video pairs in diverse environments. For the realistic domain, we adapt an HDR-based lighting simulation to compensate for the lack of paired in-the-wild videos. Powered by the aforementioned dataset, we design a joint training curriculum to effectively unleash the strengths of each domain, i.e., the physical consistency in synthetic videos and the generalized domain distribution in realistic videos. To implement this, we inject a domain-aware adapter into the model to decouple the learning of relighting from that of domain appearance distribution. We construct a comprehensive benchmark to evaluate Lumen together with existing methods, from the perspectives of foreground preservation and video consistency assessment. Experimental results demonstrate that Lumen effectively edits the input into cinematic relit videos with consistent lighting and strict foreground preservation. Our project page: https://lumen-relight.github.io/
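The abstract describes the domain-aware adapter only at a high level. The sketch below (plain PyTorch, not the authors' released code) illustrates one plausible reading: a lightweight bottleneck adapter whose input is shifted by a learned per-domain embedding, so a shared backbone can learn relighting from both data domains while domain-specific appearance is absorbed by the adapter. The class name, the bottleneck width, and the synthetic/realistic domain labels are all hypothetical.

```python
# Minimal sketch of a domain-aware adapter, assuming a transformer backbone
# with hidden states of shape (batch, tokens, hidden_dim). Hypothetical names.
import torch
import torch.nn as nn

class DomainAwareAdapter(nn.Module):
    def __init__(self, hidden_dim: int, num_domains: int = 2, bottleneck: int = 64):
        super().__init__()
        # One learned embedding per data domain (0 = synthetic, 1 = realistic).
        self.domain_embed = nn.Embedding(num_domains, hidden_dim)
        # Standard bottleneck adapter: down-project, nonlinearity, up-project.
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_dim)
        # Zero-init the up-projection so the adapter starts as an identity
        # and the pretrained backbone is initially unperturbed.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor, domain_id: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, tokens, hidden_dim); domain_id: (batch,)
        shift = self.domain_embed(domain_id).unsqueeze(1)  # (batch, 1, hidden_dim)
        residual = self.up(self.act(self.down(hidden + shift)))
        return hidden + residual  # residual connection preserves the backbone path

# Usage: at inference on real footage one would fix domain_id to the realistic
# domain, so synthetic-specific appearance is not transferred to the output.
x = torch.randn(2, 16, 1024)
adapter = DomainAwareAdapter(hidden_dim=1024)
out = adapter(x, domain_id=torch.tensor([0, 1]))
print(out.shape)  # torch.Size([2, 16, 1024])
```

Zero-initializing the up-projection is a common trick for this kind of injected module: training begins from the unmodified backbone, and the per-domain behavior is learned gradually.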
Related papers
- Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising [83.09163795450407]
We propose an approach to enhancing synthetic video realism, which can re-render synthetic videos from a simulator in a photorealistic fashion. Our framework focuses on preserving the multi-level structures of the synthetic videos in the enhanced output, in both the spatial and temporal domains.
arXiv Detail & Related papers (2025-11-18T18:06:29Z) - RelightMaster: Precise Video Relighting with Multi-plane Light Images [59.56389629981934]
RelightMaster is a novel framework for accurate and controllable video relighting. It generates physically plausible lighting and shadows and preserves original scene content.
arXiv Detail & Related papers (2025-11-09T08:12:09Z) - UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback [31.03901228901908]
We present UniLumos, a unified relighting framework for both images and videos. We explicitly align lighting effects with the scene structure, enhancing physical plausibility. Experiments demonstrate that UniLumos achieves state-of-the-art relighting with significantly improved physical consistency.
arXiv Detail & Related papers (2025-11-03T15:41:41Z) - ReLumix: Extending Image Relighting to Video via Video Diffusion Models [5.890782804843724]
Controlling illumination during video post-production is a crucial yet elusive goal in computational photography. This paper introduces ReLumix, a novel framework that decouples relighting from temporal synthesis. Although trained on synthetic data, ReLumix shows competitive generalization to real-world videos.
arXiv Detail & Related papers (2025-09-28T09:35:33Z) - TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer [47.22201704648345]
Illumination and texture editing are critical dimensions for world-to-world transfer. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. We propose TC-Light, a novel generative rendering approach to overcome these problems.
arXiv Detail & Related papers (2025-06-23T17:59:58Z) - UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting [85.27994475113056]
We introduce a general-purpose approach that jointly estimates albedo and synthesizes relit outputs in a single pass. Our model demonstrates strong generalization across diverse domains and surpasses previous methods in both visual fidelity and temporal consistency.
arXiv Detail & Related papers (2025-06-18T17:56:45Z) - IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation [79.1960960864242]
IllumiCraft is an end-to-end diffusion framework accepting three complementary inputs. It generates temporally coherent videos aligned with user-defined prompts.
arXiv Detail & Related papers (2025-06-03T17:59:52Z) - Light-A-Video: Training-free Video Relighting via Progressive Light Fusion [52.420894727186216]
Light-A-Video is a training-free approach for achieving temporally smooth video relighting. Adapted from image relighting models, Light-A-Video introduces two key techniques to enhance lighting consistency.
arXiv Detail & Related papers (2025-02-12T17:24:19Z) - Real-time 3D-aware Portrait Video Relighting [89.41078798641732]
We present the first real-time 3D-aware method for relighting in-the-wild videos of talking faces based on Neural Radiance Fields (NeRF).
We infer an albedo tri-plane, as well as a shading tri-plane based on a desired lighting condition for each video frame with fast dual-encoders.
Our method runs at 32.98 fps on consumer-level hardware and achieves state-of-the-art results in terms of reconstruction quality, lighting error, lighting instability, temporal consistency and inference speed.
arXiv Detail & Related papers (2024-10-24T01:34:11Z) - Spatiotemporally Consistent HDR Indoor Lighting Estimation [66.26786775252592]
We propose a physically-motivated deep learning framework to solve the indoor lighting estimation problem.
Given a single LDR image with a depth map, our method predicts spatially consistent lighting at any given image position.
Our framework achieves photorealistic lighting prediction with higher quality compared to state-of-the-art single-image or video-based methods.
arXiv Detail & Related papers (2023-05-07T20:36:29Z)