IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
- URL: http://arxiv.org/abs/2506.03150v1
- Date: Tue, 03 Jun 2025 17:59:52 GMT
- Title: IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
- Authors: Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang
- Abstract summary: IllumiCraft is an end-to-end diffusion framework accepting three complementary inputs. It generates temporally coherent videos aligned with user-defined prompts.
- Score: 79.1960960864242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although diffusion-based models can generate high-quality and high-resolution video sequences from textual or image inputs, they lack explicit integration of geometric cues when controlling scene lighting and visual appearance across frames. To address this limitation, we propose IllumiCraft, an end-to-end diffusion framework accepting three complementary inputs: (1) high-dynamic-range (HDR) video maps for detailed lighting control; (2) synthetically relit frames with randomized illumination changes (optionally paired with a static background reference image) to provide appearance cues; and (3) 3D point tracks that capture precise 3D geometry information. By integrating the lighting, appearance, and geometry cues within a unified diffusion architecture, IllumiCraft generates temporally coherent videos aligned with user-defined prompts. It supports background-conditioned and text-conditioned video relighting and provides better fidelity than existing controllable video generation methods. Project Page: https://yuanze-lin.me/IllumiCraft_page
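The abstract does not detail how the three cues are injected into the diffusion backbone; a common pattern for multi-conditioned video diffusion is to encode each cue and concatenate it channel-wise with the noisy latent. Below is a minimal PyTorch sketch of that generic pattern; the module names, channel sizes, and concatenation scheme are illustrative assumptions, not IllumiCraft's actual architecture.

```python
import torch
import torch.nn as nn

class UnifiedConditioningStub(nn.Module):
    """Hypothetical sketch: fuse lighting, appearance, and geometry cues
    into a single conditioning tensor for a video diffusion denoiser."""

    def __init__(self, latent_ch=4, cond_ch=8):
        super().__init__()
        # One lightweight encoder per cue; a real model would be far deeper.
        self.hdr_enc = nn.Conv3d(3, cond_ch, kernel_size=3, padding=1)     # HDR video maps
        self.appear_enc = nn.Conv3d(3, cond_ch, kernel_size=3, padding=1)  # relit frames
        self.track_enc = nn.Conv3d(3, cond_ch, kernel_size=3, padding=1)   # rasterized 3D point tracks
        self.fuse = nn.Conv3d(latent_ch + 3 * cond_ch, latent_ch, kernel_size=1)

    def forward(self, noisy_latent, hdr_video, relit_video, track_video):
        # All inputs: (batch, channels, frames, height, width).
        cond = torch.cat([
            self.hdr_enc(hdr_video),
            self.appear_enc(relit_video),
            self.track_enc(track_video),
        ], dim=1)
        # Channel-wise concatenation with the noisy latent, then projection;
        # the fused tensor would be handed to the diffusion denoiser.
        return self.fuse(torch.cat([noisy_latent, cond], dim=1))

x = torch.randn(1, 4, 8, 32, 32)                   # noisy video latent
cues = [torch.randn(1, 3, 8, 32, 32) for _ in range(3)]
print(UnifiedConditioningStub()(x, *cues).shape)   # torch.Size([1, 4, 8, 32, 32])
```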
Related papers
- Light-A-Video: Training-free Video Relighting via Progressive Light Fusion [52.420894727186216]
Light-A-Video is a training-free approach for achieving temporally smooth video relighting.
Adapted from image relighting models, Light-A-Video introduces two key techniques to enhance lighting consistency.
arXiv Detail & Related papers (2025-02-12T17:24:19Z)
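The abstract names Progressive Light Fusion without spelling out the rule; one plausible minimal reading is easing each frame from its source lighting toward the per-frame relit target over denoising steps rather than swapping the lighting in one shot. A hypothetical NumPy sketch of that reading follows; the function name, schedule, and linear blend are all assumptions.

```python
import numpy as np

def progressive_light_fusion(source, relit, num_steps=10):
    """Hypothetical sketch of 'progressive' fusion: ease the source frames
    toward the relit appearance over denoising steps, which tends to cause
    less temporal flicker than a single hard swap.
    source/relit: (frames, H, W, 3) float arrays in [0, 1]."""
    fused = source.copy()
    for step in range(1, num_steps + 1):
        w = step / num_steps  # fusion weight grows each step
        fused = (1.0 - w) * source + w * relit
        # A real pipeline would re-noise `fused` and run one denoising
        # step of the image-relighting diffusion model here.
    return fused

frames = np.random.rand(8, 64, 64, 3)
relit = np.clip(frames * 0.5 + 0.3, 0, 1)  # stand-in for per-frame relit output
out = progressive_light_fusion(frames, relit)
print(out.shape, float(abs(out - relit).max()))  # converges to the relit target
```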
- VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation [62.64811405314847]
VidCRAFT3 is a novel framework for precise image-to-video generation.
It enables control over camera motion, object motion, and lighting direction simultaneously.
It produces high-quality video content, outperforming state-of-the-art methods in control granularity and visual coherence.
arXiv Detail & Related papers (2025-02-11T13:11:59Z)
- Real-time 3D-aware Portrait Video Relighting [89.41078798641732]
We present the first real-time 3D-aware method for relighting in-the-wild videos of talking faces based on Neural Radiance Fields (NeRF).
For each video frame, fast dual encoders infer an albedo tri-plane and a shading tri-plane conditioned on the desired lighting.
Our method runs at 32.98 fps on consumer-level hardware and achieves state-of-the-art results in terms of reconstruction quality, lighting error, lighting instability, temporal consistency, and inference speed.
arXiv Detail & Related papers (2024-10-24T01:34:11Z)
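Tri-plane sampling plus an albedo-times-shading composition is the standard recipe for turning such dual predictions into relit color; the sketch below shows that generic recipe in PyTorch. The toy shapes and the summing of plane features are assumptions, not this paper's exact decoder.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes, xyz):
    """Bilinearly sample a tri-plane (3 axis-aligned feature planes) at 3D
    points; a common NeRF acceleration structure.
    planes: (3, C, H, W); xyz: (N, 3) in [-1, 1]."""
    coords = torch.stack([xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]])  # xy, xz, yz
    feats = F.grid_sample(planes, coords.unsqueeze(1), align_corners=True)  # (3, C, 1, N)
    return feats.squeeze(2).sum(dim=0).t()  # (N, C), summed over the 3 planes

# Toy albedo/shading tri-planes standing in for the dual-encoder outputs.
albedo_planes = torch.rand(3, 3, 16, 16)
shading_planes = torch.rand(3, 1, 16, 16)
pts = torch.rand(100, 3) * 2 - 1
albedo = sample_triplane(albedo_planes, pts)    # (100, 3) RGB reflectance
shading = sample_triplane(shading_planes, pts)  # (100, 1) light-dependent scale
radiance = albedo * shading                     # intrinsic-image composition
print(radiance.shape)                           # torch.Size([100, 3])
```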
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing [58.22339174221563]
We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing.
SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent.
Our method achieves high-quality 3D editing results that respect the textual instructions, especially in scenes with complex textures.
arXiv Detail & Related papers (2024-06-25T09:17:35Z)
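One way to picture "geometrically consistent noise predictions" is to tie together the noise at pixels that geometry maps to the same 3D surface point. A toy NumPy sketch of that idea follows; the correspondence format and the simple averaging rule are assumptions, not SyncNoise's actual mechanism.

```python
import numpy as np

def synchronize_noise(noise_views, correspondences):
    """Hypothetical sketch: average the noise predicted at pixels that
    geometry says observe the same 3D surface point, then write the
    average back to every view.
    noise_views: (V, H, W) per-view noise predictions.
    correspondences: list of [(view, y, x), ...] groups, one group per
    shared 3D point (e.g., obtained from depth reprojection)."""
    synced = noise_views.copy()
    for group in correspondences:
        mean = np.mean([noise_views[v, y, x] for v, y, x in group])
        for v, y, x in group:
            synced[v, y, x] = mean
    return synced

noise = np.random.randn(2, 4, 4)
# One 3D point seen at pixel (1, 1) in view 0 and pixel (2, 3) in view 1.
synced = synchronize_noise(noise, [[(0, 1, 1), (1, 2, 3)]])
print(synced[0, 1, 1] == synced[1, 2, 3])  # True: consistent across views
```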
- MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z)
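The abstract says depth and optical flow guide generation but not how; a common building block is backward-warping one frame by the flow and penalizing deviation from the motion-implied frame. The hypothetical sketch below illustrates only that building block; the nearest-neighbor warp and MSE term are assumptions, not MoVideo's actual loss.

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Backward-warp a frame with a dense optical-flow field using
    nearest-neighbor lookup (bilinear sampling would be smoother).
    frame: (H, W, 3); flow: (H, W, 2) giving per-pixel (dy, dx)."""
    h, w, _ = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

prev = np.random.rand(32, 32, 3)
flow = np.ones((32, 32, 2))                      # toy flow: shift by one pixel
pred_next = warp_with_flow(prev, flow)
consistency = np.mean((pred_next - prev) ** 2)   # penalize motion-inconsistent frames
print(pred_next.shape, consistency)
```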
- Relightable 3D Head Portraits from a Smartphone Video [15.639140551193073]
We present a system for creating a relightable 3D portrait of a human head.
Our neural pipeline operates on a sequence of frames captured by a smartphone camera with the flash blinking.
A deep rendering network is trained to regress dense albedo, normals, and environmental lighting maps for arbitrary new viewpoints.
arXiv Detail & Related papers (2020-12-17T22:49:02Z)
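Regressing albedo, normals, and an environment light and then composing them is classic intrinsic-image relighting. A standard second-order spherical-harmonics version is sketched below; the SH parameterization is a textbook formulation, not necessarily this paper's exact decoder.

```python
import numpy as np

def sh_lambertian_shading(normals, sh_coeffs):
    """Shade Lambertian surfaces from per-pixel unit normals and a
    9-coefficient spherical-harmonics environment light (grayscale for
    brevity; normalization constants omitted).
    normals: (N, 3); sh_coeffs: (9,)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    basis = np.stack([
        np.ones_like(x), y, z, x,                        # l = 0, 1
        x * y, y * z, 3 * z**2 - 1, x * z, x**2 - y**2,  # l = 2
    ], axis=1)
    return basis @ sh_coeffs  # (N,) irradiance per pixel

normals = np.random.randn(100, 3)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = np.random.rand(100, 3)
light = np.array([0.8, 0.1, 0.3, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0])  # toy environment
rgb = albedo * np.clip(sh_lambertian_shading(normals, light), 0, None)[:, None]
print(rgb.shape)  # (100, 3): relit pixels for a novel lighting condition
```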