Fugu-MT Paper Translation (Abstract): FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers

Paper overview: FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers

arxiv url: http://arxiv.org/abs/2605.03669v1
Date: Tue, 05 May 2026 12:08:16 GMT
Status: Translation complete
System update date: 2026-05-06 19:35:43.921669
Title: FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers
Title (reference translation): FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping via 3D Fusion of Voxel- and Instance-Level Layers
Authors: Timon Homberger, Finn Lukas Busch, Jesús Gerardo Ortega Peimbert, Quantao Yang, Olov Andersson
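Abstract summary: FUS3DMaps is an online dual-layer semantic mapping method that maintains both dense and instance-level open-vocabulary layers within a shared voxel map. The proposed semantic cross-layer fusion approach improves the quality of both the instance-level and dense layers. Experiments on established 3D semantic segmentation benchmarks and a selection of large-scale scenes show that FUS3DMaps achieves accurate open-vocabulary semantic mapping at multi-story building scales.
Reference score (independently computed attention score): 2.610405478993863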
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Open-vocabulary semantic mapping enables robots to spatially ground previously unseen concepts without requiring predefined class sets. Current training-free methods commonly rely on multi-view fusion of semantic embeddings into a 3D map, either at the instance-level via segmenting views and encoding image crops of segments, or by projecting image patch embeddings directly into a dense semantic map. The latter approach sidesteps segmentation and 2D-to-3D instance association by operating on full uncropped image frames, but existing methods remain limited in scalability. We present FUS3DMaps, an online dual-layer semantic mapping method that jointly maintains both dense and instance-level open-vocabulary layers within a shared voxel map. This design enables further voxel-level semantic fusion of the layer embeddings, combining the complementary strengths of both semantic mapping approaches. We find that our proposed semantic cross-layer fusion approach improves the quality of both the instance-level and dense layers, while also enabling a scalable and highly accurate instance-level map where the dense layer and cross-layer fusion are restricted to a spatial sliding window. Experiments on established 3D semantic segmentation benchmarks as well as a selection of large-scale scenes show that FUS3DMaps achieves accurate open-vocabulary semantic mapping at multi-story building scales. Additional material and code will be made available: https://githanonymous.github.io/FUS3DMaps/.
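Abstract (reference translation): Open-vocabulary semantic mapping lets robots spatially ground previously unseen concepts without a predefined class set. Current training-free methods typically rely on multi-view fusion of semantic embeddings into a 3D map, either at the instance level, by segmenting views and encoding image crops of the segments, or by projecting image patch embeddings directly into a dense semantic map. The latter approach sidesteps segmentation and 2D-to-3D instance association by operating on full, uncropped image frames, but existing methods remain limited in scalability. FUS3DMaps is an online dual-layer semantic mapping method that jointly maintains both dense and instance-level open-vocabulary layers within a shared voxel map. This design enables voxel-level semantic fusion of the layer embeddings, combining the complementary strengths of both semantic mapping approaches. The proposed semantic cross-layer fusion approach improves the quality of both the instance-level and dense layers, while also enabling a scalable and highly accurate instance-level map in which the dense layer and cross-layer fusion are restricted to a spatial sliding window. Experiments on established 3D semantic segmentation benchmarks and a selection of large-scale scenes show that FUS3DMaps achieves accurate open-vocabulary semantic mapping at multi-story building scales. Additional material and code will be made available: https://githanonymous.github.io/FUS3DMaps/.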
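
Illustrative sketch (not from the paper): the abstract describes voxel-level fusion of dense and instance-level embeddings, restricted to a spatial sliding window, but does not state the fusion rule. The Python below is only a toy under stated assumptions: embeddings are plain lists, the fusion is a renormalized convex blend with a hypothetical weight `alpha`, and the window is an axis-aligned cube of voxel indices. All function and variable names are hypothetical and do not reflect the authors' code.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def fuse_voxel(dense_emb, inst_emb, alpha=0.5):
    """Assumed fusion rule: convex blend of two unit embeddings, renormalized."""
    d, i = normalize(dense_emb), normalize(inst_emb)
    return normalize([alpha * a + (1 - alpha) * b for a, b in zip(d, i)])


def in_window(voxel, center, half_extent):
    """True if the voxel index lies in an axis-aligned cube around center."""
    return all(abs(v - c) <= half_extent for v, c in zip(voxel, center))


def cross_layer_fuse(dense_layer, voxel_to_instance, instance_embs,
                     center, half_extent, alpha=0.5):
    """Fuse both layers for voxels inside the sliding window.

    dense_layer:       {voxel index (x, y, z): dense embedding}
    voxel_to_instance: {voxel index: instance id}
    instance_embs:     {instance id: instance embedding}
    Returns {voxel index: fused embedding} for window voxels covered by both layers.
    """
    fused = {}
    for voxel, dense_emb in dense_layer.items():
        inst_id = voxel_to_instance.get(voxel)
        # Skip voxels with no instance label or outside the sliding window.
        if inst_id is None or not in_window(voxel, center, half_extent):
            continue
        fused[voxel] = fuse_voxel(dense_emb, instance_embs[inst_id], alpha)
    return fused


# Toy map: three voxels with 2-D embeddings, one instance.
dense_layer = {(0, 0, 0): [1.0, 0.0], (1, 0, 0): [0.0, 2.0], (9, 9, 9): [1.0, 1.0]}
voxel_to_instance = {(0, 0, 0): "chair", (9, 9, 9): "chair"}
instance_embs = {"chair": [0.0, 1.0]}

fused = cross_layer_fuse(dense_layer, voxel_to_instance, instance_embs,
                         center=(0, 0, 0), half_extent=2)
```

In this toy map only voxel (0, 0, 0) is fused: (1, 0, 0) has no instance label and (9, 9, 9) lies outside the window. Keeping the window small bounds the per-update work regardless of total map size, which is one plausible reading of the scalability claim in the abstract.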

Related papers