Fugu-MT Paper Translation (Abstract): Uncertainty-Aware Gaussian Map for Vision-Language Navigation

Paper overview: Uncertainty-Aware Gaussian Map for Vision-Language Navigation

arxiv url: http://arxiv.org/abs/2605.26503v1
Date: Tue, 26 May 2026 03:33:47 GMT
Status: translation complete
System update date: 2026-05-27 17:51:41.60548
Title: Uncertainty-Aware Gaussian Map for Vision-Language Navigation
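Title (reference translation): Uncertainty-Aware Gaussian Map for Vision-Language Navigation
Authors: Jianzhe Gao, Rui Liu, Yuxuan Xu, Tongtong Cao, Yingxue Zhang, Zhanguang Zhang, Sida Peng, Yi Yang, Wenguan Wang
Abstract summary: Vision-Language Navigation (VLN) requires an agent to navigate 3D environments following natural language instructions. This work explicitly models three forms of perceptual uncertainty (geometric, semantic, and appearance uncertainty) and integrates them into the agent's observation space to enable informed decision-making.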
Reference score (independently computed attention score): 63.97713877754199
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-Language Navigation (VLN) requires an agent to navigate 3D environments following natural language instructions. During navigation, existing agents commonly encounter perceptual uncertainty, such as insufficient evidence for reliable grounding or ambiguity in interpreting spatial cues, yet they typically ignore such information when predicting actions. In this work, we explicitly model three forms of perceptual uncertainty (i.e., geometric, semantic, and appearance uncertainty) and integrate them into the agent's observation space to enable informed decision-making. Concretely, our agent first constructs a Semantic Gaussian Map (SGM), composed of differentiable 3D Gaussian primitives initialized from panoramic observations, that encodes both the geometric structure and semantic content of the environment. On top of SGM, geometric uncertainty is estimated through variational perturbations of Gaussian position and scale to assess structural reliability; semantic uncertainty is captured by perturbing Gaussian semantic attributes to reveal ambiguous interpretations; and appearance uncertainty is characterized by Fisher Information, which measures the sensitivity of rendered observations to Gaussian-level variations. These uncertainties are incorporated into SGM, extending it into a unified 3D Value Map, which grounds them as affordances and constraints that support reliable navigation. Comprehensive evaluations across multiple VLN benchmarks show the effectiveness of our agent.
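Abstract (reference translation): Vision-Language Navigation (VLN) requires an agent to navigate 3D environments by following natural language instructions. During navigation, existing agents encounter perceptual uncertainty, such as insufficient evidence for reliable grounding or ambiguity in interpreting spatial cues, but they usually ignore this information when predicting actions. This work explicitly models three forms of perceptual uncertainty (geometric, semantic, and appearance uncertainty) and integrates them into the agent's observation space to enable informed decision-making. Specifically, the agent first constructs a Semantic Gaussian Map (SGM), made up of differentiable 3D Gaussian primitives initialized from panoramic observations, which encodes both the geometric structure and the semantic content of the environment. On top of the SGM, geometric uncertainty is estimated through variational perturbations of Gaussian position and scale to assess structural reliability; semantic uncertainty is captured by perturbing the Gaussians' semantic attributes to reveal ambiguous interpretations; and appearance uncertainty is characterized by Fisher Information, which measures the sensitivity of rendered observations to Gaussian-level variations. These uncertainties are incorporated into the SGM, extending it into a unified 3D Value Map that grounds them as affordances and constraints supporting reliable navigation. Comprehensive evaluations on multiple VLN benchmarks show the effectiveness of the agent.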
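Illustrative sketch (not from the paper): the abstract names two generic techniques, perturbing Gaussian position and scale, and using Fisher Information to measure how sensitive rendered observations are to Gaussian-level changes. The toy Python example below shows both ideas on a 1D sum of Gaussian bumps. Everything in it is an assumption made for illustration: the 1D `render` function, the noise level `sigma`, the unit-variance pixel-noise model, and the function names `geometric_uncertainty` and `fisher_information` are hypothetical and do not reproduce the authors' SGM or their estimators.

```python
import math
import random


def render(gaussians, xs):
    """Toy 1D 'renderer' (assumption): sum of weighted Gaussian bumps.

    Each Gaussian is a tuple (mu, s, w): position, scale, weight.
    """
    return [
        sum(w * math.exp(-((x - mu) ** 2) / (2 * s * s)) for mu, s, w in gaussians)
        for x in xs
    ]


def geometric_uncertainty(gaussians, xs, index, sigma=0.05, samples=200, seed=0):
    """Monte Carlo estimate: mean per-pixel variance of the render when the
    position and scale of one Gaussian are randomly perturbed."""
    rng = random.Random(seed)
    mu, s, w = gaussians[index]
    renders = []
    for _ in range(samples):
        perturbed = list(gaussians)
        new_mu = mu + rng.gauss(0.0, sigma)
        new_s = max(1e-3, s + rng.gauss(0.0, sigma))  # keep the scale positive
        perturbed[index] = (new_mu, new_s, w)
        renders.append(render(perturbed, xs))
    total = 0.0
    for p in range(len(xs)):
        values = [r[p] for r in renders]
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values) / len(values)
    return total / len(xs)


def fisher_information(gaussians, xs, index):
    """Diagonal Fisher information for (mu, s) of one Gaussian, assuming
    independent unit-variance Gaussian noise on every pixel. Under that
    assumption it is the sum of squared derivatives of the render."""
    mu, s, w = gaussians[index]
    f_mu = 0.0
    f_s = 0.0
    for x in xs:
        g = w * math.exp(-((x - mu) ** 2) / (2 * s * s))
        d_mu = g * (x - mu) / (s * s)       # d render / d mu
        d_s = g * (x - mu) ** 2 / (s ** 3)  # d render / d s
        f_mu += d_mu ** 2
        f_s += d_s ** 2
    return f_mu, f_s


if __name__ == "__main__":
    scene = [(0.0, 0.5, 1.0), (2.0, 0.5, 0.1)]  # hypothetical toy scene
    pixels = [i * 0.1 - 2.0 for i in range(61)]
    for i in range(len(scene)):
        print(i, geometric_uncertainty(scene, pixels, i), fisher_information(scene, pixels, i))
```

In this toy setting the render changes more when the heavily weighted Gaussian is perturbed, so both scores are larger for it than for the faint one. How such per-Gaussian scores would be turned into a navigation value map is not specified here and is left to the paper itself.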

Related papers