Fugu-MT Paper Translation (Abstract): LuxDiT: Lighting Estimation with Video Diffusion Transformer

Paper overview: LuxDiT: Lighting Estimation with Video Diffusion Transformer

arxiv url: http://arxiv.org/abs/2509.03680v1
Date: Wed, 03 Sep 2025 19:59:20 GMT
Status: Translation complete
System update date: 2025-09-05 20:21:09.959787
Title: LuxDiT: Lighting Estimation with Video Diffusion Transformer
Authors: Ruofan Liang, Kai He, Zan Gojcic, Igor Gilitschenski, Sanja Fidler, Nandita Vijaykumar, Zian Wang
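Abstract summary: Estimating scene lighting from a single image or video is a longstanding challenge in computer vision and graphics. This paper proposes LuxDiT, which fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input.
Reference score (independently computed attention score): 66.60450792095901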
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. Learning-based approaches are constrained by the scarcity of ground-truth HDR environment maps, which are expensive to capture and limited in diversity. While recent generative models offer strong priors for image synthesis, lighting estimation remains difficult due to its reliance on indirect visual cues, the need to infer global (non-local) context, and the recovery of high-dynamic-range outputs. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input. Trained on a large synthetic dataset with diverse lighting conditions, our model learns to infer illumination from indirect visual cues and generalizes effectively to real-world scenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation finetuning strategy using a collected dataset of HDR panoramas. Our method produces accurate lighting predictions with realistic angular high-frequency details, outperforming existing state-of-the-art techniques in both quantitative and qualitative evaluations.

Related papers