LuxDiT: Lighting Estimation with Video Diffusion Transformer
- URL: http://arxiv.org/abs/2509.03680v1
- Date: Wed, 03 Sep 2025 19:59:20 GMT
- Title: LuxDiT: Lighting Estimation with Video Diffusion Transformer
- Authors: Ruofan Liang, Kai He, Zan Gojcic, Igor Gilitschenski, Sanja Fidler, Nandita Vijaykumar, Zian Wang,
- Abstract summary: Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics.<n>We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input.
- Score: 66.60450792095901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. Learning-based approaches are constrained by the scarcity of ground-truth HDR environment maps, which are expensive to capture and limited in diversity. While recent generative models offer strong priors for image synthesis, lighting estimation remains difficult due to its reliance on indirect visual cues, the need to infer global (non-local) context, and the recovery of high-dynamic-range outputs. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input. Trained on a large synthetic dataset with diverse lighting conditions, our model learns to infer illumination from indirect visual cues and generalizes effectively to real-world scenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation finetuning strategy using a collected dataset of HDR panoramas. Our method produces accurate lighting predictions with realistic angular high-frequency details, outperforming existing state-of-the-art techniques in both quantitative and qualitative evaluations.
Related papers
- BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets.
arXiv Detail & Related papers (2024-07-03T22:41:49Z) - BVI-Lowlight: Fully Registered Benchmark Dataset for Low-Light Video Enhancement [44.1973928137492]
This paper introduces a novel low-light video dataset, consisting of 40 scenes in various motion scenarios under two low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly.
We refine them via image-based post-processing to ensure the pixel-wise alignment of frames in different light levels.
arXiv Detail & Related papers (2024-02-03T00:40:22Z) - Low-Light Image Enhancement with Illumination-Aware Gamma Correction and
Complete Image Modelling Network [69.96295927854042]
Low-light environments usually lead to less informative large-scale dark areas.
We propose to integrate the effectiveness of gamma correction with the strong modelling capacities of deep networks.
Because exponential operation introduces high computational complexity, we propose to use Taylor Series to approximate gamma correction.
arXiv Detail & Related papers (2023-08-16T08:46:51Z) - Spatiotemporally Consistent HDR Indoor Lighting Estimation [66.26786775252592]
We propose a physically-motivated deep learning framework to solve the indoor lighting estimation problem.
Given a single LDR image with a depth map, our method predicts spatially consistent lighting at any given image position.
Our framework achieves photorealistic lighting prediction with higher quality compared to state-of-the-art single-image or video-based methods.
arXiv Detail & Related papers (2023-05-07T20:36:29Z) - TensoIR: Tensorial Inverse Rendering [51.57268311847087]
TensoIR is a novel inverse rendering approach based on tensor factorization and neural fields.
TensoRF is a state-of-the-art approach for radiance field modeling.
arXiv Detail & Related papers (2023-04-24T21:39:13Z) - Neural Radiance Transfer Fields for Relightable Novel-view Synthesis
with Global Illumination [63.992213016011235]
We propose a method for scene relighting under novel views by learning a neural precomputed radiance transfer function.
Our method can be solely supervised on a set of real images of the scene under a single unknown lighting condition.
Results show that the recovered disentanglement of scene parameters improves significantly over the current state of the art.
arXiv Detail & Related papers (2022-07-27T16:07:48Z) - Deep Lighting Environment Map Estimation from Spherical Panoramas [0.0]
We present a data-driven model that estimates an HDR lighting environment map from a single LDR monocular spherical panorama.
We exploit the availability of surface geometry to employ image-based relighting as a data generator and supervision mechanism.
arXiv Detail & Related papers (2020-05-16T14:23:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.