LumiSculpt: A Consistency Lighting Control Network for Video Generation
- URL: http://arxiv.org/abs/2410.22979v1
- Date: Wed, 30 Oct 2024 12:44:08 GMT
- Title: LumiSculpt: A Consistency Lighting Control Network for Video Generation
- Authors: Yuxin Zhang, Dandan Zheng, Biao Gong, Jingdong Chen, Ming Yang, Weiming Dong, Changsheng Xu
- Abstract summary: Lighting plays a pivotal role in ensuring the naturalness of video generation.
It remains challenging to disentangle and model independent and coherent lighting attributes.
LumiSculpt enables precise and consistent lighting control in T2V generation models.
- Score: 67.48791242688493
- License:
- Abstract: Lighting plays a pivotal role in ensuring the naturalness of video generation, significantly influencing the aesthetic quality of the generated content. However, due to the deep coupling between lighting and the temporal features of videos, it remains challenging to disentangle and model independent and coherent lighting attributes, limiting the ability to control lighting in video generation. In this paper, inspired by established controllable T2I models, we propose LumiSculpt, which, for the first time, enables precise and consistent lighting control in T2V generation models. LumiSculpt equips video generation with strong interactive capabilities, allowing the input of custom lighting reference image sequences. Furthermore, the core learnable plug-and-play module of LumiSculpt facilitates remarkable control over lighting intensity, position, and trajectory in latent video diffusion models based on the advanced DiT backbone. Additionally, to effectively train LumiSculpt and address the issue of insufficient lighting data, we construct LumiHuman, a new lightweight and flexible dataset for portrait lighting of images and videos. Experimental results demonstrate that LumiSculpt achieves precise and high-quality lighting control in video generation.
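The abstract describes a learnable, plug-and-play module that injects lighting-reference features into a DiT-based latent video diffusion model. The sketch below is only a hypothetical illustration of the common pattern such modules follow: a zero-initialized cross-attention side branch that adds reference features to the frozen backbone's hidden states. All names and dimensions (`LightingAdapter`, `dim=512`) are assumptions, not LumiSculpt's published code.
```python
# Hypothetical sketch of a plug-and-play lighting-control adapter for a
# DiT-style video diffusion backbone. This is NOT LumiSculpt's released
# code; it only illustrates the zero-initialized side-branch pattern.
import torch
import torch.nn as nn

class LightingAdapter(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        # Encode lighting-reference tokens into conditioning features.
        self.ref_encoder = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Zero-initialized output projection: the frozen backbone is
        # unaffected at the start of training (ControlNet-style trick).
        self.out_proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.out_proj.weight)
        nn.init.zeros_(self.out_proj.bias)

    def forward(self, hidden: torch.Tensor, light_ref: torch.Tensor) -> torch.Tensor:
        # hidden:    (batch, video_tokens, dim) from a frozen DiT block
        # light_ref: (batch, ref_tokens, dim) lighting reference features
        ref = self.ref_encoder(light_ref)
        attn_out, _ = self.cross_attn(hidden, ref, ref)
        return hidden + self.out_proj(attn_out)  # residual injection

x = torch.randn(2, 64, 512)    # video latent tokens
ref = torch.randn(2, 16, 512)  # tokens from a lighting reference sequence
print(LightingAdapter()(x, ref).shape)  # torch.Size([2, 64, 512])
```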
Related papers
- DifFRelight: Diffusion-Based Facial Performance Relighting [12.909429637057343]
We present a novel framework for free-viewpoint facial performance relighting using diffusion-based image-to-image translation.
We train a diffusion model for precise lighting control, enabling high-fidelity relit facial images from flat-lit inputs.
The model accurately reproduces complex lighting effects like eye reflections, subsurface scattering, self-shadowing, and translucency.
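As a rough illustration of how diffusion-based image-to-image relighting models of this kind are typically trained (not DifFRelight's actual implementation), the sketch below denoises the relit target while conditioning on the channel-concatenated flat-lit input and a lighting embedding; the `denoiser` callable, shapes, and noise schedule are all assumptions.
```python
# Generic (hypothetical) training step for diffusion-based image-to-image
# relighting: denoise the relit target conditioned on the flat-lit input.
import torch
import torch.nn.functional as F

def relight_training_step(denoiser, flat_lit, relit, light_code, n_steps=1000):
    # flat_lit, relit: (B, 3, H, W); light_code: (B, D) lighting embedding
    b = relit.shape[0]
    t = torch.randint(0, n_steps, (b,), device=relit.device)
    noise = torch.randn_like(relit)
    a = (torch.cos(t.float() / n_steps * torch.pi / 2) ** 2).view(b, 1, 1, 1)
    noisy = a.sqrt() * relit + (1 - a).sqrt() * noise  # cosine-ish schedule
    # Condition by channel-concatenating the flat-lit input (a common choice).
    pred = denoiser(torch.cat([noisy, flat_lit], dim=1), t, light_code)
    return F.mse_loss(pred, noise)  # standard epsilon-prediction loss

# Smoke test with a dummy denoiser that ignores t and the lighting code.
dummy = lambda x, t, c: x[:, :3]
print(relight_training_step(dummy, torch.randn(2, 3, 32, 32),
                            torch.randn(2, 3, 32, 32), torch.randn(2, 8)).item())
```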
arXiv Detail & Related papers (2024-10-10T17:56:44Z)
- BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE), and a comprehensive evaluation shows that models trained on our dataset outperform those trained on existing datasets.
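One generic way to perform this kind of pixel-wise cross-exposure registration (not necessarily the authors' refinement method) is intensity-normalized ECC alignment with OpenCV:
```python
# Illustrative cross-exposure frame registration with OpenCV's ECC method.
# A generic technique for pixel-wise alignment across light levels, not
# necessarily the BVI-RLV authors' refinement pipeline.
import cv2
import numpy as np

def align_to_reference(low_light: np.ndarray, normal: np.ndarray) -> np.ndarray:
    """Warp a low-light frame onto its normal-light counterpart."""
    # Equalize brightness so ECC compares structure rather than exposure.
    g_ref = cv2.equalizeHist(cv2.cvtColor(normal, cv2.COLOR_BGR2GRAY))
    g_in = cv2.equalizeHist(cv2.cvtColor(low_light, cv2.COLOR_BGR2GRAY))
    warp = np.eye(2, 3, dtype=np.float32)  # affine warp, identity init
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    _, warp = cv2.findTransformECC(g_ref, g_in, warp, cv2.MOTION_AFFINE,
                                   criteria, None, 5)
    h, w = normal.shape[:2]
    return cv2.warpAffine(low_light, warp, (w, h),
                          flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```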
arXiv Detail & Related papers (2024-07-03T22:41:49Z)
- Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers [69.96398489841116]
We introduce the Lumina-T2X family of Flow-based Large Diffusion Transformers (Flag-DiT).
Flag-DiT is a unified framework designed to transform noise into images, videos, multi-view 3D objects, and audio clips conditioned on text instructions.
This is particularly beneficial for creating ultra-high-definition images with our Lumina-T2I model and long 720p videos with our Lumina-T2V model.
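Flow-based diffusion transformers in this family are typically trained with a rectified-flow / flow-matching objective: regress the constant velocity between a noise sample and a data sample along a straight interpolation path. A minimal sketch of that generic objective, not Lumina-T2X's exact loss:
```python
# Minimal rectified-flow (flow-matching) training objective, the family
# used by flow-based DiTs. Illustrative only; Lumina-T2X's exact loss and
# conditioning may differ.
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1, cond):
    # x1: clean latents (B, ...); cond: text conditioning (opaque here)
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform time in [0, 1]
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))
    xt = (1 - t_) * x0 + t_ * x1                   # straight-line interpolation
    v_target = x1 - x0                             # constant target velocity
    return F.mse_loss(model(xt, t, cond), v_target)

# Smoke test with a dummy velocity model.
dummy = lambda x, t, c: torch.zeros_like(x)
print(flow_matching_loss(dummy, torch.randn(4, 8, 16, 16), None).item())
```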
arXiv Detail & Related papers (2024-05-09T17:35:16Z)
- LightIt: Illumination Modeling and Control for Diffusion Models [61.80461416451116]
We introduce LightIt, a method for explicit illumination control for image generation.
Recent generative methods lack lighting control, which is crucial to numerous artistic aspects of image generation.
Our method is the first that enables the generation of images with controllable, consistent lighting.
arXiv Detail & Related papers (2024-03-15T18:26:33Z)
- Relightable Neural Actor with Intrinsic Decomposition and Pose Control [80.06094206522668]
We propose Relightable Neural Actor, a new video-based method for learning a pose-driven neural human model that can be relighted.
For training, our method requires only a multi-view recording of the human under a known but static lighting condition.
To evaluate our approach in real-world scenarios, we collect a new dataset with four identities recorded under different light conditions, indoors and outdoors.
arXiv Detail & Related papers (2023-12-18T14:30:13Z)
- Personalized Video Relighting With an At-Home Light Stage [0.0]
We develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit videos in real-time.
We show that by simply capturing recordings of a user watching YouTube videos on a monitor, we can train a personalized algorithm capable of high-quality relighting under any condition.
arXiv Detail & Related papers (2023-11-15T10:33:20Z)
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation [44.220329202024494]
We present a few-shot-based tuning framework, LAMP, which enables a text-to-image diffusion model to Learn A specific Motion Pattern with 8-16 videos on a single GPU.
Specifically, we design a first-frame-conditioned pipeline that uses an off-the-shelf text-to-image model for content generation.
To capture features along the temporal dimension, we expand the pretrained 2D convolution layers of the T2I model into our novel temporal-spatial motion learning layers.
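Inflating 2D convolutions into temporal-spatial layers commonly keeps the pretrained spatial kernel and adds a zero-initialized temporal branch, so the expanded layer starts out equivalent to the original. A hypothetical PyTorch sketch of that pattern (LAMP's exact layer design may differ):
```python
# Hypothetical inflation of a pretrained 2D conv into a temporal-spatial
# layer: keep the spatial kernel, add a zero-initialized temporal conv so
# the expanded layer initially matches the 2D original.
import torch
import torch.nn as nn

class TemporalSpatialConv(nn.Module):
    def __init__(self, conv2d: nn.Conv2d):
        super().__init__()
        self.spatial = conv2d  # reuse pretrained 2D weights unchanged
        c = conv2d.out_channels
        self.temporal = nn.Conv1d(c, c, kernel_size=3, padding=1)
        nn.init.zeros_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, h, w); assumes the wrapped conv
        # preserves spatial size (e.g. kernel 3, stride 1, padding 1).
        b, _, f, h, w = x.shape
        y = self.spatial(x.transpose(1, 2).flatten(0, 1))  # (b*f, c, h, w)
        c = y.shape[1]
        y = y.view(b, f, c, h, w)
        t = y.permute(0, 3, 4, 2, 1).reshape(b * h * w, c, f)  # time last
        t = self.temporal(t).view(b, h, w, c, f)
        return y.permute(0, 2, 1, 3, 4) + t.permute(0, 3, 4, 1, 2)

layer = TemporalSpatialConv(nn.Conv2d(8, 8, 3, padding=1))
print(layer(torch.randn(2, 8, 5, 16, 16)).shape)  # (2, 8, 5, 16, 16)
```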
arXiv Detail & Related papers (2023-10-16T19:03:19Z)
- Controllable Data Augmentation Through Deep Relighting [75.96144853354362]
We explore how to augment a varied set of image datasets through relighting, so as to make existing models more invariant to illumination changes.
We develop a tool, based on an encoder-decoder network, that is able to quickly generate multiple variations of the illumination of various input scenes.
We demonstrate that by training models on datasets that have been augmented with our pipeline, it is possible to achieve higher performance on localization benchmarks.
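As a toy sketch of the augmentation idea, a dataset wrapper could expand each training image into several relit variants; `relight_net` below stands in for the paper's encoder-decoder and is purely hypothetical.
```python
# Toy sketch of relighting-based augmentation: expand each training image
# into several illumination variants. `relight_net` stands in for the
# paper's encoder-decoder and is purely hypothetical here.
import torch
from torch.utils.data import Dataset

class RelitAugmented(Dataset):
    def __init__(self, base, relight_net, n_variants: int = 4):
        self.base, self.net, self.n = base, relight_net, n_variants

    def __len__(self):
        return len(self.base) * self.n

    def __getitem__(self, idx):
        img, label = self.base[idx // self.n]  # img: (3, H, W) tensor
        light_code = torch.randn(8)  # random target-illumination embedding
        with torch.no_grad():
            img = self.net(img.unsqueeze(0), light_code).squeeze(0)
        return img, label
```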
arXiv Detail & Related papers (2021-10-26T20:02:51Z)
- Neural Video Portrait Relighting in Real-time via Consistency Modeling [41.04622998356025]
We propose a neural approach for real-time, high-quality and coherent video portrait relighting.
We propose a hybrid structure and lighting disentanglement in an encoder-decoder architecture.
We also propose a lighting sampling strategy that models illumination consistency and mutation for natural portrait light manipulation in real-world settings.
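A common realization of such disentanglement is an encoder whose latent splits into a structure code and a lighting code, with relighting performed by swapping in a new lighting code at decode time. A hypothetical minimal sketch, not the paper's architecture:
```python
# Hypothetical encoder-decoder whose latent splits into a structure code
# and a lighting code; relighting swaps in a new lighting code at decode
# time. A sketch of the disentanglement pattern, not the paper's network.
import torch
import torch.nn as nn

class DisentangledRelighter(nn.Module):
    def __init__(self, z_struct: int = 256, z_light: int = 32):
        super().__init__()
        self.z_struct = z_struct
        self.enc = nn.Sequential(                       # 32x32 input assumed
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, z_struct + z_light))
        self.dec = nn.Sequential(
            nn.Linear(z_struct + z_light, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, img, new_light=None):
        z = self.enc(img)
        struct, light = z[:, :self.z_struct], z[:, self.z_struct:]
        if new_light is not None:  # swap in a target lighting code
            light = new_light
        return self.dec(torch.cat([struct, light], dim=1))

m = DisentangledRelighter()
print(m(torch.randn(2, 3, 32, 32), new_light=torch.randn(2, 32)).shape)
# torch.Size([2, 3, 32, 32])
```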
arXiv Detail & Related papers (2021-04-01T14:13:28Z)