Fugu-MT Paper Translation (Summary): Multi-Scale Gaussian-Language Map for Zero-shot Embodied Navigation and Reasoning

Paper Summary: Multi-Scale Gaussian-Language Map for Zero-shot Embodied Navigation and Reasoning

arxiv url: http://arxiv.org/abs/2605.01736v1
Date: Sun, 03 May 2026 06:22:14 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:49.913866
Title: Multi-Scale Gaussian-Language Map for Zero-shot Embodied Navigation and Reasoning
Authors: Sixian Zhang, Yiyao Wang, Xinhang Song, Keming Zhang, Zijian Xu, Shuqiang Jiang
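Abstract summary: This paper proposes the multi-scale Gaussian-Language Map (GLMap), which introduces three key designs. The 3D Gaussians enable compact storage and fast rendering of task-relevant images. Experiments on ObjectNav, InstNav, and SQA tasks show that GLMap effectively enhances target navigation and contextual reasoning.
Reference score (site-computed attention score): 33.03611808441931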
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding the geometric and semantic structure of environments is essential for embodied navigation and reasoning. Existing semantic mapping methods trade off between explicit geometry and multi-scale semantics, and lack a native interface for large models, thus requiring additional training of feature projection for semantic alignment. To this end, we propose the multi-scale Gaussian-Language Map (GLMap), which introduces three key designs: (1) explicit geometry, (2) multi-scale semantics covering both instance and region concepts, and (3) a dual-modality interface where each semantic unit jointly stores a natural language description and a 3D Gaussian representation. The 3D Gaussians enable compact storage and fast rendering of task-relevant images via Gaussian splatting. To enable efficient incremental construction, we further propose a Gaussian Estimator that analytically derives Gaussian parameters from dense point clouds without gradient-based optimization. Experiments on ObjectNav, InstNav, and SQA tasks show that GLMap effectively enhances target navigation and contextual reasoning, while remaining compatible with large-model-based methods in a zero-shot manner. The code is available at https://github.com/sx-zhang/GLMap.
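The abstract says the Gaussian Estimator derives Gaussian parameters analytically from dense point clouds, with no gradient-based optimization, and that each semantic unit stores a language description alongside a 3D Gaussian. The paper's exact estimator is not given here, so the sketch below assumes the simplest closed-form choice, moment matching (sample mean and covariance of the points assigned to one unit). The names `Gaussian3D`, `SemanticUnit`, `estimate_gaussian`, the `scale` labels, and the `eps` regularizer are all hypothetical illustrations, not the authors' API.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Gaussian3D:
    """Hypothetical container: mean and covariance of one 3D Gaussian."""
    mean: Vec3
    cov: Mat3


@dataclass
class SemanticUnit:
    """Hypothetical dual-modality unit: a text description plus a 3D Gaussian."""
    description: str
    scale: str  # assumed labels, e.g. "instance" or "region"
    gaussian: Gaussian3D


def estimate_gaussian(points: List[Vec3], eps: float = 1e-6) -> Gaussian3D:
    """Assumed closed-form estimator: moment matching, no gradient steps.

    `eps` (hypothetical) is added to the diagonal so the covariance stays
    positive definite even for degenerate, e.g. planar, point sets.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Sample mean of each coordinate.
    mean = tuple(sum(p[i] for p in points) / n for i in range(3))
    # Accumulate the 3x3 scatter matrix of deviations from the mean.
    scatter = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                scatter[i][j] += d[i] * d[j]
    # Unbiased covariance (divide by n - 1), then regularize the diagonal.
    cov = tuple(
        tuple(scatter[i][j] / (n - 1) + (eps if i == j else 0.0) for j in range(3))
        for i in range(3)
    )
    return Gaussian3D(mean=mean, cov=cov)


# Usage: four points on a unit square in the z = 0 plane.
chair_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
unit = SemanticUnit("a wooden chair next to the table", "instance",
                    estimate_gaussian(chair_points))
print(unit.gaussian.mean)  # (0.5, 0.5, 0.0)
```

Because the mean and covariance come from a single pass over the points, a new unit can be added to the map without any optimization loop, which is the property the abstract highlights for incremental construction. Gaussian splatting usually factors the covariance as a rotation times a diagonal scale (Σ = R S Sᵀ Rᵀ), which could be obtained from an eigendecomposition of `cov`; that step is omitted here.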
