Fugu-MT Paper Translation (Abstract): Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

Paper summary: Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

arxiv url: http://arxiv.org/abs/2604.14025v1
Date: Wed, 15 Apr 2026 16:07:18 GMT
Status: translation complete
Last updated in system: 2026-04-16 20:38:32.625509
Title: Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
Authors: Weijie Wang, Qihang Cao, Sensen Gao, Donny Y. Chen, Haofei Xu, Wenjing Bian, Songyou Peng, Tat-Jen Cham, Chuanxia Zheng, Andreas Geiger, Jianfei Cai, Jia-Wang Bian, Bohan Zhuang
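Abstract summary: Generalizable feed-forward 3D reconstruction has advanced rapidly in recent years. Existing feed-forward approaches share similar high-level architectural patterns. This paper proposes a new taxonomy centered on model design strategies that are agnostic to the output format.
Reference score (independently computed attention score): 91.23306722968509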
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-specific training, which hinders their practical deployment and scalability. Hence, generalizable feed-forward 3D reconstruction has witnessed rapid development in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: despite the diverse geometric output representations, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. Consequently, we abstract away from these representation differences and instead focus on model design, proposing a novel taxonomy centered on model design strategies that are agnostic to the output format. Our proposed taxonomy organizes the research directions into five key problems that drive recent research development: feature enhancement, geometry awareness, model efficiency, augmentation strategies and temporal-aware models. To support this taxonomy with empirical grounding and standardized evaluation, we further comprehensively review related benchmarks and datasets, and extensively discuss and categorize real-world applications based on feed-forward 3D models. Finally, we outline future directions to address open challenges such as scalability, evaluation standards, and world modeling.

Related papers