RelightVid: Temporal-Consistent Diffusion Model for Video Relighting
- URL: http://arxiv.org/abs/2501.16330v1
- Date: Mon, 27 Jan 2025 18:59:57 GMT
- Title: RelightVid: Temporal-Consistent Diffusion Model for Video Relighting
- Authors: Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein, Dahua Lin
- Abstract summary: RelightVid is a flexible framework for video relighting.
It can accept background video, text prompts, or environment maps as relighting conditions.
It achieves arbitrary video relighting with high temporal consistency without intrinsic decomposition.
- Score: 95.10341081549129
- Abstract: Diffusion models have demonstrated remarkable success in image generation and editing, with recent advancements enabling albedo-preserving image relighting. However, applying these models to video relighting remains challenging due to the lack of paired video relighting datasets and the high demands for output fidelity and temporal consistency, further complicated by the inherent randomness of diffusion models. To address these challenges, we introduce RelightVid, a flexible framework for video relighting that can accept background video, text prompts, or environment maps as relighting conditions. Trained on in-the-wild videos with carefully designed illumination augmentations and on rendered videos under extreme dynamic lighting, RelightVid achieves arbitrary video relighting with high temporal consistency, without intrinsic decomposition, while preserving the illumination priors of its image backbone.
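The abstract describes a single framework that accepts a background video, a text prompt, or an environment map as the relighting condition. Below is a minimal PyTorch sketch of what such a multi-condition interface could look like; every module name and dimension is a hypothetical stand-in, not RelightVid's actual architecture.

```python
import torch
import torch.nn as nn

class ConditionEncoder(nn.Module):
    """Hypothetical encoder mapping any of three relighting conditions
    (background video, text embedding, environment map) into one shared
    conditioning vector for a video diffusion denoiser."""
    def __init__(self, dim=64):
        super().__init__()
        self.video_enc = nn.Conv3d(3, dim, kernel_size=3, padding=1)  # background video
        self.text_enc = nn.Linear(512, dim)                           # pooled text embedding
        self.env_enc = nn.Conv2d(3, dim, kernel_size=3, padding=1)    # environment map

    def forward(self, bg_video=None, text_emb=None, env_map=None):
        feats = []
        if bg_video is not None:   # (B, 3, T, H, W)
            feats.append(self.video_enc(bg_video).mean(dim=(2, 3, 4)))
        if text_emb is not None:   # (B, 512)
            feats.append(self.text_enc(text_emb))
        if env_map is not None:    # (B, 3, H, W)
            feats.append(self.env_enc(env_map).mean(dim=(2, 3)))
        assert feats, "at least one relighting condition is required"
        return torch.stack(feats).sum(dim=0)  # (B, dim), fed to the denoiser

enc = ConditionEncoder()
cond = enc(text_emb=torch.randn(2, 512))  # text-only conditioning
print(cond.shape)  # torch.Size([2, 64])
```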
Related papers
- Light-A-Video: Training-free Video Relighting via Progressive Light Fusion [52.420894727186216]
Light-A-Video is a training-free approach to achieve temporally smooth video relighting.
Adapted from image relighting models, Light-A-Video introduces two key techniques to enhance lighting consistency.
arXiv Detail & Related papers (2025-02-12T17:24:19Z)
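The Light-A-Video entry above mentions progressive light fusion as a training-free route to temporally smooth relighting. The sketch below illustrates one plausible reading of that idea: a relit estimate is blended into the source frames with a weight that ramps up over iterations. The schedule and the relight_fn callable are assumptions, not the authors' implementation.

```python
import numpy as np

def progressive_light_fusion(frames, relight_fn, num_steps=50):
    """frames: (T, H, W, 3) floats in [0, 1]; relight_fn returns a relit
    estimate of its input (standing in for one denoising step's prediction)."""
    fused = frames.copy()
    for i in range(num_steps):
        w = (i + 1) / num_steps                 # fusion weight ramps toward 1
        relit = relight_fn(fused)               # relit prediction at this step
        fused = (1.0 - w) * frames + w * relit  # ease from source toward relit
    return fused

video = np.random.rand(8, 64, 64, 3).astype(np.float32)
warmer = progressive_light_fusion(video, lambda f: np.clip(f * [1.1, 1.0, 0.9], 0, 1))
```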
- Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT [98.56372305225271]
Lumina-Next achieves exceptional performance in the generation of images with Next-DiT.
Lumina-Video incorporates a Multi-scale Next-DiT architecture, which jointly learns multiple patchifications.
We propose Lumina-V2A, a video-to-audio model based on Next-DiT, to create synchronized sounds for generated videos.
arXiv Detail & Related papers (2025-02-10T18:58:11Z)
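The Lumina-Video summary mentions jointly learning multiple patchifications. The following sketch shows what embedding the same frame at several patch sizes into a shared token dimension could look like; the module is a hypothetical illustration, not the Next-DiT architecture itself.

```python
import torch
import torch.nn as nn

class MultiScalePatchify(nn.Module):
    def __init__(self, dim=256, patch_sizes=(2, 4, 8)):
        super().__init__()
        # One projection per patch size, all mapping into the same token dim.
        self.projs = nn.ModuleList(
            nn.Conv2d(3, dim, kernel_size=p, stride=p) for p in patch_sizes
        )
        self.patch_sizes = patch_sizes

    def forward(self, frame, scale_idx):
        # frame: (B, 3, H, W); larger patches give fewer, coarser tokens.
        tokens = self.projs[scale_idx](frame)     # (B, dim, H/p, W/p)
        return tokens.flatten(2).transpose(1, 2)  # (B, N, dim)

m = MultiScalePatchify()
x = torch.randn(1, 3, 64, 64)
for i, p in enumerate(m.patch_sizes):
    print(p, m(x, i).shape)  # (1, (64/p)**2, 256)
```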
- LumiSculpt: A Consistency Lighting Control Network for Video Generation [67.48791242688493]
Lighting plays a pivotal role in ensuring the naturalness of video generation.
It remains challenging to disentangle and model independent and coherent lighting attributes.
LumiSculpt enables precise and consistent lighting control in T2V generation models.
arXiv Detail & Related papers (2024-10-30T12:44:08Z)
- BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering [13.476629715971221]
We introduce the histogram-assisted solution, BlazeBVD, for high-fidelity and rapid blind video deflickering.
BlazeBVD uses smoothed illumination histograms within STE filtering to ease the challenge of learning temporal data.
It achieves inference speeds up to 10x faster than state-of-the-art methods.
arXiv Detail & Related papers (2024-03-10T15:56:55Z)
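The BlazeBVD summary attributes its deflickering to smoothed illumination histograms. The sketch below shows the basic mechanism using plain histogram matching toward a temporally averaged histogram; BlazeBVD itself couples this statistic with learned STE filtering, so treat this as an illustration of why smoothed histograms remove flicker, not as the method.

```python
import numpy as np

def deflicker(frames, window=5, bins=256):
    """frames: (T, H, W) grayscale floats in [0, 1]."""
    T = len(frames)
    hists = np.stack([np.histogram(f, bins=bins, range=(0, 1))[0] for f in frames])
    out = []
    for t in range(T):
        lo, hi = max(0, t - window // 2), min(T, t + window // 2 + 1)
        target = hists[lo:hi].mean(axis=0)             # temporally smoothed histogram
        # Histogram matching: map this frame's CDF onto the smoothed CDF.
        src_cdf = np.cumsum(hists[t]) / hists[t].sum()
        tgt_cdf = np.cumsum(target) / target.sum()
        levels = (np.arange(bins) + 0.5) / bins
        mapping = np.interp(src_cdf, tgt_cdf, levels)  # new value per source bin
        idx = np.clip((frames[t] * bins).astype(int), 0, bins - 1)
        out.append(mapping[idx])
    return np.stack(out)

video = np.random.rand(10, 32, 32) * np.linspace(0.5, 1.0, 10)[:, None, None]
stable = deflicker(video)  # brightness drift is pulled toward the local average
```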
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into the U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
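The Upscale-A-Video summary credits temporal layers inside the U-Net and VAE-Decoder for short-range consistency. A common realization of such a layer is self-attention along the frame axis at each spatial location; the module below is a generic sketch of that pattern, not the paper's code.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, T, C, H, W) video features from a spatial U-Net block.
        b, t, c, h, w = x.shape
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)  # tokens = frames
        seq = self.norm(seq)
        out, _ = self.attn(seq, seq, seq)                        # attend across time
        out = out.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return x + out                                 # residual keeps spatial prior

layer = TemporalAttention()
feats = torch.randn(2, 8, 64, 16, 16)  # two 8-frame feature clips
print(layer(feats).shape)              # torch.Size([2, 8, 64, 16, 16])
```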
- Personalized Video Relighting With an At-Home Light Stage [0.0]
We develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit videos in real-time.
We show that by simply capturing recordings of a user watching YouTube videos on a monitor, we can train a personalized algorithm capable of performing high-quality relighting under any condition.
arXiv Detail & Related papers (2023-11-15T10:33:20Z)
- Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition [78.50328335703914]
Diffusion in the Dark (DiD) is a diffusion model that reconstructs low-light images for text recognition.
We demonstrate that DiD, without any task-specific optimization, can outperform SOTA low-light methods in low-light text recognition on real images.
arXiv Detail & Related papers (2023-03-07T23:52:51Z)
- Neural Video Portrait Relighting in Real-time via Consistency Modeling [41.04622998356025]
We propose a neural approach for real-time, high-quality and coherent video portrait relighting.
We propose a hybrid structure and lighting disentanglement in an encoder-decoder architecture.
We also propose a lighting sampling strategy to model illumination consistency and variation for natural portrait light manipulation in real-world settings.
arXiv Detail & Related papers (2021-04-01T14:13:28Z)
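The last entry proposes a lighting sampling strategy that models both illumination consistency and variation. One plausible reading, sketched below with spherical-harmonics lighting coefficients, is to sample clips whose target lighting is either held fixed or interpolated smoothly between two conditions; the representation and probabilities here are assumptions, not the paper's actual strategy.

```python
import numpy as np

def sample_clip_lighting(num_frames, sh_dim=9, mutate_prob=0.3, rng=np.random):
    """Returns per-frame spherical-harmonics lighting coefficients (T, sh_dim)."""
    start = rng.randn(sh_dim)
    if rng.rand() < mutate_prob:
        end = rng.randn(sh_dim)                  # lighting changes within the clip
        ts = np.linspace(0.0, 1.0, num_frames)[:, None]
        return (1 - ts) * start + ts * end       # smooth variation, no flicker
    return np.tile(start, (num_frames, 1))       # consistent lighting

lights = sample_clip_lighting(16)
print(lights.shape)  # (16, 9)
```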
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.