Fugu-MT Paper Translation (Summary): LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images

Paper summary: LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images

arxiv url: http://arxiv.org/abs/2605.23287v1
Date: Fri, 22 May 2026 06:59:00 GMT
Status: Translation complete
Last updated in system: 2026-05-25 17:29:20.230772
Title: LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
Title (reference translation): LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
Authors: Yilong Liu, Wanhua Li, Chen Zhu-Tian, Hanspeter Pfister
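Abstract summary: LangFlash is a feed-forward framework for 3D Language Gaussian Splatting. It directly predicts geometry and semantics in a single forward pass. The paper proposes a sparse semantic encoding scheme that combines a global semantic dictionary with locally varying per-primitive weights.
Reference score (independently computed attention score): 30.52329450141629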
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present LangFlash, a feed-forward framework for 3D Language Gaussian Splatting that reconstructs 3D scenes parameterized by Gaussian primitives enriched with language-aligned semantic features from sparse unposed multi-view images. Unlike optimization-based 3D methods, LangFlash directly predicts the geometry and semantics in a single forward pass, enabling low-latency 3D reconstruction and language-consistent scene understanding. To support large-scale training, we enriched the RealEstate10k dataset with coherent and dense semantic information for 3D semantic supervision. Furthermore, we propose a sparse semantic encoding scheme that combines a global semantic dictionary with locally varying per-primitive weights, preserving high-level linguistic information, while reducing representation complexity. Experimental results show that LangFlash achieves superior novel view synthesis and semantic consistency compared with previous methods. This study establishes a new paradigm for pose-free, language-grounded 3D scene reconstruction, advancing generalizable 3D vision and multimodal scene understanding. Demo is available at https://liylo.github.io/langflash.github.io/.
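Abstract (reference translation): We present LangFlash, a feed-forward framework for 3D Language Gaussian Splatting that reconstructs 3D scenes, parameterized by Gaussian primitives enriched with language-aligned semantic features, from sparse unposed multi-view images. Unlike optimization-based 3D methods, LangFlash directly predicts geometry and semantics in a single forward pass, enabling low-latency 3D reconstruction and language-consistent scene understanding. To support large-scale training, we enriched the RealEstate10k dataset with coherent, dense semantic information for 3D semantic supervision. Furthermore, we propose a sparse semantic encoding scheme that combines a global semantic dictionary with locally varying per-primitive weights, preserving high-level linguistic information while reducing representation complexity. Experimental results show that LangFlash achieves better novel view synthesis and semantic consistency than previous methods. This study establishes a new paradigm for pose-free, language-grounded 3D scene reconstruction, advancing generalizable 3D vision and multimodal scene understanding. A demo is available at https://liylo.github.io/langflash.github.io/.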
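The abstract describes the sparse semantic encoding only at a high level: a shared global dictionary plus a few locally varying weights per Gaussian primitive. The following is a minimal sketch of that general idea, not the authors' implementation. The dictionary values, the (atom index, weight) storage format, the weighted-sum decoding, and the cosine-similarity query are all assumptions made for illustration.

```python
import math

# Hypothetical global semantic dictionary: K atoms, each a d-dimensional
# language-aligned feature vector (toy values, not taken from the paper).
DICTIONARY = [
    [1.0, 0.0, 0.0],  # atom 0
    [0.0, 1.0, 0.0],  # atom 1
    [0.0, 0.0, 1.0],  # atom 2
    [1.0, 1.0, 0.0],  # atom 3
]

# Assumed per-primitive storage: each Gaussian keeps only a few
# (atom_index, weight) pairs instead of a full dense feature vector.
PRIMITIVE_WEIGHTS = [
    [(0, 0.9), (3, 0.1)],  # primitive 0
    [(2, 1.0)],            # primitive 1
    [(1, 0.5), (2, 0.5)],  # primitive 2
]


def decode_feature(sparse_weights, dictionary):
    """Reconstruct a dense feature as a weighted sum of dictionary atoms."""
    dim = len(dictionary[0])
    feature = [0.0] * dim
    for atom_index, weight in sparse_weights:
        for j in range(dim):
            feature[j] += weight * dictionary[atom_index][j]
    return feature


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_primitive(query, primitive_weights, dictionary):
    """Return the index of the primitive whose decoded feature best matches the query."""
    scores = [
        cosine_similarity(decode_feature(w, dictionary), query)
        for w in primitive_weights
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Decode every primitive once; a real system would do this per rendered pixel or per query.
features = [decode_feature(w, DICTIONARY) for w in PRIMITIVE_WEIGHTS]
```

In this sketch the per-primitive cost scales with the number of stored weights rather than with the feature dimension, which is one plausible reading of "reducing representation complexity". A query vector standing in for a text embedding can be matched with `best_matching_primitive(query, PRIMITIVE_WEIGHTS, DICTIONARY)`.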

Related papers