Fugu-MT Paper Translation (Abstract): FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

Paper overview: FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

arxiv url: http://arxiv.org/abs/2606.24876v1
Date: Tue, 23 Jun 2026 17:53:41 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:49.136179
Title: FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation
Authors: Orest Kupyn, Goutam Bhat, Philipp Henzler, Fabian Manhardt, Christian Rupprecht, Federico Tombari
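Abstract summary: We introduce FLAT, which decodes triangle splats directly from video diffusion latents. On standard benchmarks, FLAT achieves significantly better geometric accuracy than state-of-the-art feedforward baselines. We show that a lightweight test-time refinement step converts the predicted triangle soup into an opaque, game-engine-ready representation.
Reference score (independently computed attention score): 72.38727895659405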
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generating explorable 3D scenes from a single image requires strong generative priors and accurate geometric representations suitable for downstream use. Current video diffusion models offer high-quality generation and implicitly encode multi-view geometric structure in latent space. However, existing feedforward latent scene decoders typically output volumetric 3D Gaussians that lack a well-defined surface, limiting their use in simulation or standard graphics pipelines. This motivates decoding surface-aligned primitives that are not only renderable but also closer to explicit geometric assets. We ask whether compressed video diffusion latents can be mapped directly to explicit surface primitives in a single pass. To this end, we introduce FLAT and, for the first time, show that triangle splats can be decoded directly from video diffusion latents. Compared with decoding 3D Gaussians, predicting flat primitives is notoriously more challenging due to high sensitivity to primitive orientations, which often leads to poor gradient flow. FLAT addresses this with two key ingredients: a ray-centered rotation parameterization for triangle regression and a novel product window function that improves gradient flow during differentiable triangle rendering. On standard benchmarks, FLAT achieves significantly better geometric accuracy while maintaining competitive visual quality compared to state-of-the-art feedforward baselines. We further show that a lightweight test-time refinement step converts the predicted triangle soup into a fully opaque, game-engine-ready representation that supports real-time rendering. By evaluating 3DGS, 2DGS, and triangle splatting variants under an identical training setup, we provide the first systematic analysis of representation tradeoffs in feedforward scene generation. The project page is available at https://flat-splat.github.io

Related papers