Fugu-MT Paper Translation (Abstract): Training-Free Instance-Aware 3D Scene Reconstruction and Diffusion-Based View Synthesis from Sparse Images

Paper overview: Training-Free Instance-Aware 3D Scene Reconstruction and Diffusion-Based View Synthesis from Sparse Images

arxiv url: http://arxiv.org/abs/2603.21166v1
Date: Sun, 22 Mar 2026 10:56:15 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.269524
Title: Training-Free Instance-Aware 3D Scene Reconstruction and Diffusion-Based View Synthesis from Sparse Images
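Title (reference translation): 3D Scene Reconstruction and Diffusion-Based Image Synthesis from Sparse Images
Authors: Jiatong Xia, Lingqiao Liu
Abstract summary: We propose a training-free system that reconstructs, understands, and renders 3D indoor scenes from unposed RGB images. Unlike traditional radiance field approaches that require dense views and per-scene optimization, the pipeline achieves high-fidelity results without training or pose preprocessing.
Reference score (independently computed attention score): 27.013348160823828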
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a novel, training-free system for reconstructing, understanding, and rendering 3D indoor scenes from a sparse set of unposed RGB images. Unlike traditional radiance field approaches that require dense views and per-scene optimization, our pipeline achieves high-fidelity results without any training or pose preprocessing. The system integrates three key innovations: (1) A robust point cloud reconstruction module that filters unreliable geometry using a warping-based anomaly removal strategy; (2) A warping-guided 2D-to-3D instance lifting mechanism that propagates 2D segmentation masks into a consistent, instance-aware 3D representation; and (3) A novel rendering approach that projects the point cloud into new views and refines the renderings with a 3D-aware diffusion model. Our method leverages the generative power of diffusion to compensate for missing geometry and enhances realism, especially under sparse input conditions. We further demonstrate that object-level scene editing such as instance removal can be naturally supported in our pipeline by modifying only the point cloud, enabling the synthesis of consistent, edited views without retraining. Our results establish a new direction for efficient, editable 3D content generation without relying on scene-specific optimization. Project page: https://jiatongxia.github.io/TID3R/
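Abstract (reference translation): We propose a new training-free system that reconstructs, understands, and renders 3D indoor scenes from a sparse set of unposed RGB images. Unlike traditional radiance field approaches that require dense views and per-scene optimization, the pipeline achieves high-fidelity results without training or pose preprocessing. The system integrates three key innovations: (1) a robust point cloud reconstruction module that filters unreliable geometry using a warping-based anomaly removal strategy; (2) a warping-guided 2D-to-3D instance lifting mechanism that propagates 2D segmentation masks into a consistent, instance-aware 3D representation; and (3) a new rendering approach that projects the point cloud into new views and refines the renderings with a 3D-aware diffusion model. The proposed method exploits the generative power of diffusion to compensate for missing geometry and to improve realism, particularly under sparse input conditions. Furthermore, we show that object-level scene editing such as instance removal can be supported naturally within the pipeline by modifying only the point cloud, so that consistent edited views can be generated without retraining. These results establish a new direction for efficient, editable 3D content generation that does not rely on scene-specific optimization. Project page: https://jiatongxia.github.io/TID3R/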
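
The abstract describes rendering by projecting an instance-labeled point cloud into a new view, and editing by modifying only the point cloud. The following is a minimal sketch of those two steps, not the authors' implementation. It assumes points are already expressed in the target camera's frame, uses a plain pinhole model with a z-buffer, and omits the diffusion refinement stage entirely. The names `project_points` and `remove_instance` and all parameter values are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[int, int]


def project_points(points: List[Point], labels: List[int],
                   fx: float, fy: float, cx: float, cy: float,
                   width: int, height: int) -> Dict[Pixel, Tuple[float, int]]:
    """Z-buffered pinhole projection of camera-frame points (assumed setup).

    Returns a sparse image: pixel (u, v) -> (depth, instance label) of the
    nearest point that lands on that pixel.
    """
    zbuf: Dict[Pixel, Tuple[float, int]] = {}
    for (x, y, z), label in zip(points, labels):
        if z <= 0:  # point is behind the camera, so it is not visible
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # projects outside the image bounds
        if (u, v) not in zbuf or z < zbuf[(u, v)][0]:
            zbuf[(u, v)] = (z, label)  # keep only the closest point
    return zbuf


def remove_instance(points: List[Point], labels: List[int],
                    target: int) -> Tuple[List[Point], List[int]]:
    """Drop every point whose instance label equals `target`."""
    kept = [(p, l) for p, l in zip(points, labels) if l != target]
    return [p for p, _ in kept], [l for _, l in kept]


if __name__ == "__main__":
    # Hypothetical toy scene: instance 1 sits in front of instance 2.
    pts = [(0.0, 0.0, 2.0), (0.0, 0.0, 4.0), (1.0, 0.0, 4.0)]
    lbls = [1, 2, 2]
    before = project_points(pts, lbls, 4.0, 4.0, 2.0, 2.0, 5, 5)
    pts2, lbls2 = remove_instance(pts, lbls, target=1)
    after = project_points(pts2, lbls2, 4.0, 4.0, 2.0, 2.0, 5, 5)
    print(before)
    print(after)
```

In this toy scene, removing instance 1 exposes the instance 2 point that it previously occluded at the same pixel. In the pipeline the abstract describes, a projection of this kind would then be refined by the 3D-aware diffusion model, which this sketch does not attempt to reproduce.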


Related papers