Fugu-MT paper translation (abstract): First Shape, Then Meaning: Efficient Geometry and Semantics Learning for Indoor Reconstruction

Paper overview: First Shape, Then Meaning: Efficient Geometry and Semantics Learning for Indoor Reconstruction

arxiv url: http://arxiv.org/abs/2605.03463v1
Date: Tue, 05 May 2026 07:50:36 GMT
Status: Translation complete
Last updated in system: 2026-05-06 19:35:43.826039
Title: First Shape, Then Meaning: Efficient Geometry and Semantics Learning for Indoor Reconstruction
Authors: Remi Chierchia, Léo Lebrat, David Ahmedt-Aristizabal, Olivier Salvado, Clinton Fookes, Rodrigo Santa Cruz
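Abstract summary: FSTM is a unified approach for learning geometry and semantics through a two-step process. Experiments on both synthetic and real-world indoor datasets show that the method outperforms multi-SDF approaches.
Reference score (independently computed attention metric): 23.174056594526494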
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural Surface Reconstruction has become a standard methodology for indoor 3D reconstruction, with Signed Distance Functions (SDFs) proving particularly effective for representing scene geometry. A variety of applications require a detailed understanding of the scene context, driving the need for object-level semantic signals. While recent methods successfully integrate semantic labels, they often inherit the slow training time and limited scalability of multi-SDF learning. In this paper, we introduce FSTM, a unified approach for learning geometry and semantics through a two-step process: a geometry warm-up using RGB inputs and geometric cues, followed by semantic field estimation. By first optimising geometry without semantic supervision, we observe substantial improvements compared to the standard joint optimisation. Rather than relying on specialised modules or complex multi-SDF designs, FSTM shows that a streamlined formulation is sufficient to achieve strong geometric and semantic reconstructions. Experiments on both synthetic and real-world indoor datasets show that our method outperforms multi-SDF approaches. It trains 2.3x faster on Replica, improves robustness to real-world imperfections on ScanNet++, and achieves higher recall by recovering the surfaces of more objects in the scene. The code will be made available at https://remichierchia.github.io/FSTM.
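The two ideas the abstract leans on, a signed distance function and a geometry-first training order, can be illustrated with a minimal Python sketch. This is not the authors' code: `sphere_sdf` uses the common convention of negative distances inside the surface, and `active_losses`, `warmup_steps`, and the loss names are hypothetical. In particular, keeping the RGB and geometric-cue terms enabled during the semantic stage is an assumption; the abstract only states that geometry is first optimised without semantic supervision.

```python
import math


def sphere_sdf(point, center, radius):
    """Signed distance from `point` to a sphere.

    Convention assumed here: negative inside, zero on the surface,
    positive outside.
    """
    return math.dist(point, center) - radius


def active_losses(step, warmup_steps):
    """Hypothetical two-stage schedule (illustration only).

    Returns the names of the supervision terms enabled at `step`.
    """
    # Stage 1: geometry warm-up from RGB inputs and geometric cues,
    # with no semantic supervision.
    losses = ["rgb", "geometric_cues"]
    if step >= warmup_steps:
        # Stage 2: semantic field estimation. Keeping the stage-1 terms
        # switched on here is an assumption, not a claim about FSTM.
        losses.append("semantics")
    return losses


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    print(sphere_sdf((2.0, 0.0, 0.0), origin, 1.0))  # point outside the sphere
    print(active_losses(step=10, warmup_steps=100))
    print(active_losses(step=100, warmup_steps=100))
```

The schedule is written as a pure function of the step counter so that the stage boundary is a single, easy-to-inspect parameter.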
