Fugu-MT Paper Translation (Summary): OpenSGA: Efficient 3D Scene Graph Alignment in the Open World

Paper summary: OpenSGA: Efficient 3D Scene Graph Alignment in the Open World

arxiv url: http://arxiv.org/abs/2605.10484v1
Date: Mon, 11 May 2026 12:44:18 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.81685
Title: OpenSGA: Efficient 3D Scene Graph Alignment in the Open World
Title (reference translation): OpenSGA: Efficient 3D Scene Graph Alignment in the Open World
Authors: Gang Chen, Sebastián Barbas Laina, Stefan Leutenegger, Javier Alonso-Mora
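Abstract summary: Scene graph alignment establishes object correspondences between two 3D scene graphs constructed from partially overlapping observations. Existing approaches mainly focus on subscan-to-subscan (S2S) alignment and depend heavily on geometric point-cloud features. We propose a unified and efficient scene graph alignment framework that predicts object correspondences by fusing vision-language, textual, and geometric features with spatial context.
Reference score (independently computed attention score): 27.9502908270849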
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scene graph alignment establishes object correspondences between two 3D scene graphs constructed from partially overlapping observations. This enables efficient scene understanding and object-level relocalization when a robot revisits a place, as well as global map fusion across multiple agents. Such capabilities are essential for robots that require long-term memory for long-horizon tasks involving interactions with the environment. Existing approaches mainly focus on subscan-to-subscan (S2S) alignment and depend heavily on geometric point-cloud features, leaving frame-to-scan (F2S) alignment and open-set vision-language features underexplored. In addition, existing datasets for scene graph alignment remain small-scale with limited object diversity, constraining systematic training and evaluation. We present a unified and efficient scene graph alignment framework that predicts object correspondences by fusing vision-language, textual, and geometric features with spatial context. The framework comprises modules such as a distance-gated spatial attention encoder, a minimum-cost-flow-based allocator, and a global scene embedding generator to achieve accurate alignment even under large coordinate discrepancies. We further introduce ScanNet-SG, a large-scale dataset generated via an automated annotation pipeline with over 700k samples, covering 509 object categories from ScanNet labels and over 3k categories from GPT-4o-based tagging. Experiments show that our method achieves the best overall performance on both F2S and S2S tasks, substantially outperforming existing scene graph alignment methods. Our code and dataset are released at: https://autonomousrobots.nl/paper_websites/opensga.
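The abstract names a distance-gated spatial attention encoder but does not define it. The following is a minimal sketch, assuming "distance gating" means a hard mask: each object node attends only to nodes whose 3D centroids lie within a radius of its own, using plain scaled dot-product attention over per-node feature vectors (for example, the fused vision-language, textual, and geometric features). The function name `distance_gated_attention`, the `radius` parameter, and the absence of learned query/key/value projections are hypothetical simplifications, not the paper's design.

```python
import math
from typing import List, Sequence

Vector = List[float]


def distance_gated_attention(
    feats: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    radius: float,
) -> List[Vector]:
    """Hypothetical sketch of single-head, distance-masked self-attention.

    Node i attends only to nodes j with ||c_i - c_j|| <= radius. Queries,
    keys, and values are the raw features (no learned projections).
    Assumes radius >= 0 so every node at least attends to itself.
    """
    n = len(feats)
    d = len(feats[0])
    scale = 1.0 / math.sqrt(d)
    out: List[Vector] = []
    for i in range(n):
        # Scaled dot-product scores, kept only for spatial neighbours.
        scores = []
        for j in range(n):
            if math.dist(centroids[i], centroids[j]) <= radius:
                s = sum(a * b for a, b in zip(feats[i], feats[j])) * scale
                scores.append((j, s))
        # Numerically stable softmax over the gated neighbour set.
        top = max(s for _, s in scores)
        weights = [(j, math.exp(s - top)) for j, s in scores]
        z = sum(w for _, w in weights)
        # Output is the attention-weighted mean of neighbour features.
        out.append([sum(w * feats[j][k] for j, w in weights) / z for k in range(d)])
    return out
```

With a hard mask, an object far from every other object (here, anything more than `radius` away) simply keeps its own feature, while nearby objects mix context. A learned encoder would more likely use soft gating and trainable projections; the hard mask is chosen here only to make the gating idea visible.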
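Abstract (reference translation): Scene graph alignment establishes object correspondences between two 3D scene graphs constructed from partially overlapping observations. This enables efficient scene understanding and object-level relocalization when a robot revisits a place, as well as global map fusion across multiple agents. Such capabilities are essential for robots that need long-term memory for long-horizon tasks involving interaction with the environment. Existing approaches mainly focus on subscan-to-subscan (S2S) alignment and depend heavily on geometric point-cloud features, leaving frame-to-scan (F2S) alignment and open-set vision-language features underexplored. In addition, existing datasets for scene graph alignment remain small-scale with limited object diversity, which constrains systematic training and evaluation. We propose a unified and efficient scene graph alignment framework that predicts object correspondences by fusing vision-language, textual, and geometric features with spatial context. The framework consists of modules such as a distance-gated spatial attention encoder, a minimum-cost-flow-based allocator, and a global scene embedding generator, and achieves accurate alignment even under large coordinate discrepancies. We also introduce ScanNet-SG, a large-scale dataset generated through an automated annotation pipeline with over 700k samples, covering 509 object categories from ScanNet labels and over 3k categories from GPT-4o-based tagging. Experiments show that our method achieves the best overall performance on both the F2S and S2S tasks, substantially outperforming existing scene graph alignment methods. Our code and dataset are released at https://autonomousrobots.nl/paper_websites/opensga.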
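The abstract mentions a minimum-cost-flow-based allocator without giving its network. The following is a minimal sketch, assuming a one-to-one assignment between the objects of two scene graphs given a precomputed similarity matrix `sim`, where a pair is a candidate only if its similarity reaches a hypothetical threshold `tau`, and each match costs `-sim[i][j]`. The network is solved with successive shortest paths using Bellman-Ford, which handles the negative edge costs. `min_cost_flow_match`, `sim`, and `tau` are illustrative names, not taken from the paper.

```python
from typing import Dict, List, Sequence


def min_cost_flow_match(sim: Sequence[Sequence[float]], tau: float) -> Dict[int, int]:
    """Hypothetical one-to-one matcher: source -> A objects -> B objects -> sink.

    Every edge has capacity 1; an A->B edge exists only if sim[i][j] >= tau
    and costs -sim[i][j], so the min-cost flow maximizes total similarity.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    # Node ids: 0 = source, 1..n = graph-A objects, n+1..n+m = graph-B objects, last = sink.
    src, sink, num_nodes = 0, n + m + 1, n + m + 2
    adj: List[List[int]] = [[] for _ in range(num_nodes)]
    to: List[int] = []
    cap: List[int] = []
    cost: List[float] = []

    def add_edge(u: int, v: int, c: float) -> None:
        # Forward edge gets an even index; its residual partner is index ^ 1.
        adj[u].append(len(to))
        to.append(v)
        cap.append(1)
        cost.append(c)
        adj[v].append(len(to))
        to.append(u)
        cap.append(0)
        cost.append(-c)

    for i in range(n):
        add_edge(src, 1 + i, 0.0)
    for j in range(m):
        add_edge(1 + n + j, sink, 0.0)
    for i in range(n):
        for j in range(m):
            if sim[i][j] >= tau:
                add_edge(1 + i, 1 + n + j, -sim[i][j])

    while True:
        # Bellman-Ford shortest path from the source in the residual graph.
        dist = [float("inf")] * num_nodes
        via = [-1] * num_nodes
        dist[src] = 0.0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == float("inf"):
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]] - 1e-12:
                        dist[to[e]] = dist[u] + cost[e]
                        via[to[e]] = e
                        changed = True
            if not changed:
                break
        # Stop when no augmenting path exists or one more unit would not lower the cost.
        if dist[sink] >= 0.0:
            break
        # Push one unit of flow back along the path from sink to source.
        v = sink
        while v != src:
            e = via[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]

    matches: Dict[int, int] = {}
    for i in range(n):
        for e in adj[1 + i]:
            # Saturated forward edges into graph-B nodes are the chosen pairs.
            if e % 2 == 0 and cap[e] == 0 and n + 1 <= to[e] <= n + m:
                matches[i] = to[e] - 1 - n
    return matches
```

For `sim = [[0.9, 0.85], [0.8, 0.0]]` with `tau = 0.5`, greedily taking the highest score would pair object 0 with object 0 and leave object 1 unmatched. The flow formulation instead reroutes through the residual graph and returns `{0: 1, 1: 0}` (total similarity 1.65 rather than 0.9). Objects whose every candidate falls below `tau` simply stay unmatched, which suits partially overlapping observations.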


Related papers