RelightVid: Temporal-Consistent Diffusion Model for Video Relighting
- URL: http://arxiv.org/abs/2501.16330v1
- Date: Mon, 27 Jan 2025 18:59:57 GMT
- Title: RelightVid: Temporal-Consistent Diffusion Model for Video Relighting
- Authors: Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein, Dahua Lin
- Abstract summary: RelightVid is a flexible framework for video relighting.
It can accept background video, text prompts, or environment maps as relighting conditions.
It achieves arbitrary video relighting with high temporal consistency without intrinsic decomposition.
- Score: 95.10341081549129
- Abstract: Diffusion models have demonstrated remarkable success in image generation and editing, with recent advancements enabling albedo-preserving image relighting. However, applying these models to video relighting remains challenging due to the lack of paired video relighting datasets and the high demands for output fidelity and temporal consistency, further complicated by the inherent randomness of diffusion models. To address these challenges, we introduce RelightVid, a flexible framework for video relighting that can accept background video, text prompts, or environment maps as relighting conditions. Trained on in-the-wild videos with carefully designed illumination augmentations and on rendered videos under extreme dynamic lighting, RelightVid achieves arbitrary video relighting with high temporal consistency, without intrinsic decomposition, while preserving the illumination priors of its image backbone.
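The abstract describes a single framework that accepts a background video, a text prompt, or an environment map as the relighting condition. Below is a minimal PyTorch sketch of what such a multi-condition interface could look like; every module name and dimension is a hypothetical stand-in, not RelightVid's actual architecture.

```python
import torch
import torch.nn as nn

class ConditionEncoder(nn.Module):
    """Hypothetical encoder mapping any of three relighting conditions
    (background video, text embedding, environment map) into one shared
    conditioning vector for a video diffusion denoiser."""
    def __init__(self, dim=64):
        super().__init__()
        self.video_enc = nn.Conv3d(3, dim, kernel_size=3, padding=1)  # background video
        self.text_enc = nn.Linear(512, dim)                           # pooled text embedding
        self.env_enc = nn.Conv2d(3, dim, kernel_size=3, padding=1)    # environment map

    def forward(self, bg_video=None, text_emb=None, env_map=None):
        feats = []
        if bg_video is not None:   # (B, 3, T, H, W)
            feats.append(self.video_enc(bg_video).mean(dim=(2, 3, 4)))
        if text_emb is not None:   # (B, 512)
            feats.append(self.text_enc(text_emb))
        if env_map is not None:    # (B, 3, H, W)
            feats.append(self.env_enc(env_map).mean(dim=(2, 3)))
        assert feats, "at least one relighting condition is required"
        return torch.stack(feats).sum(dim=0)  # (B, dim), fed to the denoiser

enc = ConditionEncoder()
cond = enc(text_emb=torch.randn(2, 512))  # text-only conditioning
print(cond.shape)  # torch.Size([2, 64])
```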
Related papers
- Light-A-Video: Training-free Video Relighting via Progressive Light Fusion [52.420894727186216]
Light-A-Video is a training-free approach to achieve temporally smooth video relighting.
Adapted from image relighting models, Light-A-Video introduces two key techniques to enhance lighting consistency.
arXiv Detail & Related papers (2025-02-12T17:24:19Z)
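The Light-A-Video entry above mentions progressive light fusion as a training-free route to temporally smooth relighting. The sketch below illustrates one plausible reading of that idea: a relit estimate is blended into the source frames with a weight that ramps up over iterations. The schedule and the relight_fn callable are assumptions, not the authors' implementation.

```python
import numpy as np

def progressive_light_fusion(frames, relight_fn, num_steps=50):
    """frames: (T, H, W, 3) floats in [0, 1]; relight_fn returns a relit
    estimate of its input (standing in for one denoising step's prediction)."""
    fused = frames.copy()
    for i in range(num_steps):
        w = (i + 1) / num_steps                 # fusion weight ramps toward 1
        relit = relight_fn(fused)               # relit prediction at this step
        fused = (1.0 - w) * frames + w * relit  # ease from source toward relit
    return fused

video = np.random.rand(8, 64, 64, 3).astype(np.float32)
warmer = progressive_light_fusion(video, lambda f: np.clip(f * [1.1, 1.0, 0.9], 0, 1))
```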
- Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT [98.56372305225271]
Lumina-Next achieves exceptional performance in the generation of images with Next-DiT.
Lumina-Video incorporates a Multi-scale Next-DiT architecture, which jointly learns multiple patchifications.
We propose Lumina-V2A, a video-to-audio model based on Next-DiT, to create synchronized sounds for generated videos.
arXiv Detail & Related papers (2025-02-10T18:58:11Z)
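The Lumina-Video summary mentions jointly learning multiple patchifications. The following sketch shows what embedding the same frame at several patch sizes into a shared token dimension could look like; the module is a hypothetical illustration, not the Next-DiT architecture itself.

```python
import torch
import torch.nn as nn

class MultiScalePatchify(nn.Module):
    def __init__(self, dim=256, patch_sizes=(2, 4, 8)):
        super().__init__()
        # One projection per patch size, all mapping into the same token dim.
        self.projs = nn.ModuleList(
            nn.Conv2d(3, dim, kernel_size=p, stride=p) for p in patch_sizes
        )
        self.patch_sizes = patch_sizes

    def forward(self, frame, scale_idx):
        # frame: (B, 3, H, W); larger patches give fewer, coarser tokens.
        tokens = self.projs[scale_idx](frame)     # (B, dim, H/p, W/p)
        return tokens.flatten(2).transpose(1, 2)  # (B, N, dim)

m = MultiScalePatchify()
x = torch.randn(1, 3, 64, 64)
for i, p in enumerate(m.patch_sizes):
    print(p, m(x, i).shape)  # (1, (64/p)**2, 256)
```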
- LumiSculpt: A Consistency Lighting Control Network for Video Generation [67.48791242688493]
Lighting plays a pivotal role in ensuring the naturalness of video generation.
It remains challenging to disentangle and model independent and coherent lighting attributes.
LumiSculpt enables precise and consistent lighting control in T2V generation models.
arXiv Detail & Related papers (2024-10-30T12:44:08Z)
- BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering [13.476629715971221]
We introduce the histogram-assisted solution, BlazeBVD, for high-fidelity and rapid blind video deflickering.
BlazeBVD uses smoothed illumination histograms within STE filtering to ease the challenge of learning temporal data.
It achieves inference speeds up to 10x faster than state-of-the-art methods.
arXiv Detail & Related papers (2024-03-10T15:56:55Z)
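The BlazeBVD summary attributes its deflickering to smoothed illumination histograms. The sketch below shows the basic mechanism using plain histogram matching toward a temporally averaged histogram; BlazeBVD itself couples this statistic with learned STE filtering, so treat this as an illustration of why smoothed histograms remove flicker, not as the method.

```python
import numpy as np

def deflicker(frames, window=5, bins=256):
    """frames: (T, H, W) grayscale floats in [0, 1]."""
    T = len(frames)
    hists = np.stack([np.histogram(f, bins=bins, range=(0, 1))[0] for f in frames])
    out = []
    for t in range(T):
        lo, hi = max(0, t - window // 2), min(T, t + window // 2 + 1)
        target = hists[lo:hi].mean(axis=0)             # temporally smoothed histogram
        # Histogram matching: map this frame's CDF onto the smoothed CDF.
        src_cdf = np.cumsum(hists[t]) / hists[t].sum()
        tgt_cdf = np.cumsum(target) / target.sum()
        levels = (np.arange(bins) + 0.5) / bins
        mapping = np.interp(src_cdf, tgt_cdf, levels)  # new value per source bin
        idx = np.clip((frames[t] * bins).astype(int), 0, bins - 1)
        out.append(mapping[idx])
    return np.stack(out)

video = np.random.rand(10, 32, 32) * np.linspace(0.5, 1.0, 10)[:, None, None]
stable = deflicker(video)  # brightness drift is pulled toward the local average
```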
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into the U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
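The Upscale-A-Video summary credits temporal layers inside the U-Net and VAE-Decoder for short-range consistency. A common realization of such a layer is self-attention along the frame axis at each spatial location; the module below is a generic sketch of that pattern, not the paper's code.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, T, C, H, W) video features from a spatial U-Net block.
        b, t, c, h, w = x.shape
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)  # tokens = frames
        seq = self.norm(seq)
        out, _ = self.attn(seq, seq, seq)                        # attend across time
        out = out.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return x + out                                 # residual keeps spatial prior

layer = TemporalAttention()
feats = torch.randn(2, 8, 64, 16, 16)  # two 8-frame feature clips
print(layer(feats).shape)              # torch.Size([2, 8, 64, 16, 16])
```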
- Personalized Video Relighting With an At-Home Light Stage [0.0]
We develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit videos in real-time.
We show that by simply capturing recordings of a user watching YouTube videos on a monitor, we can train a personalized algorithm capable of performing high-quality relighting under any condition.
arXiv Detail & Related papers (2023-11-15T10:33:20Z)
- Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition [78.50328335703914]
Diffusion in the Dark (DiD) is a diffusion model that reconstructs low-light images for text recognition.
We demonstrate that DiD, without any task-specific optimization, can outperform SOTA low-light methods in low-light text recognition on real images.
arXiv Detail & Related papers (2023-03-07T23:52:51Z)
- Neural Video Portrait Relighting in Real-time via Consistency Modeling [41.04622998356025]
We propose a neural approach for real-time, high-quality and coherent video portrait relighting.
We propose a hybrid structure and lighting disentanglement in an encoder-decoder architecture.
We also propose a lighting sampling strategy to model illumination consistency and variation for natural portrait light manipulation in real-world settings.
arXiv Detail & Related papers (2021-04-01T14:13:28Z)
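The last entry proposes a lighting sampling strategy that models both illumination consistency and variation. One plausible reading, sketched below with spherical-harmonics lighting coefficients, is to sample clips whose target lighting is either held fixed or interpolated smoothly between two conditions; the representation and probabilities here are assumptions, not the paper's actual strategy.

```python
import numpy as np

def sample_clip_lighting(num_frames, sh_dim=9, mutate_prob=0.3, rng=np.random):
    """Returns per-frame spherical-harmonics lighting coefficients (T, sh_dim)."""
    start = rng.randn(sh_dim)
    if rng.rand() < mutate_prob:
        end = rng.randn(sh_dim)                  # lighting changes within the clip
        ts = np.linspace(0.0, 1.0, num_frames)[:, None]
        return (1 - ts) * start + ts * end       # smooth variation, no flicker
    return np.tile(start, (num_frames, 1))       # consistent lighting

lights = sample_clip_lighting(16)
print(lights.shape)  # (16, 9)
```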
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.