Fugu-MT Paper Translation (Summary): Hydra-Nav: Object Navigation via Adaptive Dual-Process Reasoning

Paper summary: Hydra-Nav: Object Navigation via Adaptive Dual-Process Reasoning

arxiv url: http://arxiv.org/abs/2602.09972v1
Date: Tue, 10 Feb 2026 17:00:16 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.330608
Title: Hydra-Nav: Object Navigation via Adaptive Dual-Process Reasoning
Title (reference translation): Hydra-Nav: Object Navigation via Adaptive Dual-Process Reasoning
Authors: Zixuan Wang, Huang Fang, Shaoan Wang, Yuanfei Luo, Heng Dong, Wei Li, Yiming Gan,
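Abstract summary: We introduce Hydra-Nav, a unified VLM architecture that adaptively switches between a deliberative slow system for analyzing exploration history and formulating high-level plans, and a reactive fast system for efficient execution. Experiments show that Hydra-Nav achieves state-of-the-art performance on the HM3D, MP3D, and OVON benchmarks, outperforming the second-best methods by 11.1%, 17.4%, and 21.2%, respectively.
Reference score (in-house attention score): 27.764007225331454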
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While large vision-language models (VLMs) show promise for object goal navigation, current methods still struggle with low success rates and inefficient localization of unseen objects--failures primarily attributed to weak temporal-spatial reasoning. Meanwhile, recent attempts to inject reasoning into VLM-based agents improve success rates but incur substantial computational overhead. To address both the ineffectiveness and inefficiency of existing approaches, we introduce Hydra-Nav, a unified VLM architecture that adaptively switches between a deliberative slow system for analyzing exploration history and formulating high-level plans, and a reactive fast system for efficient execution. We train Hydra-Nav through a three-stage curriculum: (i) spatial-action alignment to strengthen trajectory planning, (ii) memory-reasoning integration to enhance temporal-spatial reasoning over long-horizon exploration, and (iii) iterative rejection fine-tuning to enable selective reasoning at critical decision points. Extensive experiments demonstrate that Hydra-Nav achieves state-of-the-art performance on the HM3D, MP3D, and OVON benchmarks, outperforming the second-best methods by 11.1%, 17.4%, and 21.2%, respectively. Furthermore, we introduce SOT (Success weighted by Operation Time), a new metric to measure search efficiency across VLMs with varying reasoning intensity. Results show that adaptive reasoning significantly enhances search efficiency over fixed-frequency baselines.
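Abstract (reference translation): Large vision-language models (VLMs) show promise for object goal navigation, but current methods still struggle with low success rates and inefficient localization of unseen objects, failures primarily attributed to weak temporal-spatial reasoning. Meanwhile, recent attempts to inject reasoning into VLM-based agents improve success rates but incur substantial computational overhead. To address both the ineffectiveness and the inefficiency of existing methods, we introduce Hydra-Nav, a unified VLM architecture that adaptively switches between a deliberative slow system for analyzing exploration history and formulating high-level plans, and a reactive fast system for efficient execution. We train Hydra-Nav through a three-stage curriculum: (i) spatial-action alignment to strengthen trajectory planning, (ii) memory-reasoning integration to enhance temporal-spatial reasoning over long-horizon exploration, and (iii) iterative rejection fine-tuning to enable selective reasoning at critical decision points. Extensive experiments show that Hydra-Nav achieves state-of-the-art performance on the HM3D, MP3D, and OVON benchmarks, outperforming the second-best methods by 11.1%, 17.4%, and 21.2%, respectively. In addition, we introduce SOT (Success weighted by Operation Time), a new metric that measures search efficiency across VLMs with different reasoning intensities. The results show that adaptive reasoning significantly improves search efficiency over fixed-frequency baselines.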
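The abstract names SOT (Success weighted by Operation Time) but does not give its formula. The sketch below assumes, by analogy with the widely used SPL metric (Success weighted by Path Length), that SOT = (1/N) Σ S_i · t*_i / max(t_i, t*_i), where S_i is episode success, t_i the measured operation time, and t*_i a per-episode reference time. This form, the `Episode` class, and the `ref_time` field are hypothetical illustrations, not the paper's definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    success: bool    # S_i: whether the agent reached the goal object
    op_time: float   # t_i: measured operation time (e.g. seconds of compute)
    ref_time: float  # t*_i: hypothetical reference time for this episode (> 0)


def sot(episodes: Sequence[Episode]) -> float:
    """Assumed SPL-style SOT: average over episodes of S_i * t*_i / max(t_i, t*_i)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.ref_time <= 0:
            raise ValueError("ref_time must be positive")
        if ep.success:
            # A success as fast as the reference earns 1.0; slower runs earn less.
            total += ep.ref_time / max(ep.op_time, ep.ref_time)
        # Failed episodes contribute 0 but still count in the denominator.
    return total / len(episodes)


if __name__ == "__main__":
    runs = [
        Episode(success=True, op_time=20.0, ref_time=10.0),
        Episode(success=True, op_time=10.0, ref_time=10.0),
        Episode(success=False, op_time=30.0, ref_time=10.0),
    ]
    print(sot(runs))
```

With one success taking twice the reference time, one matching it, and one failure, this assumed form gives (0.5 + 1.0 + 0) / 3 = 0.5. Capping the ratio at 1 via `max` mirrors SPL, so finishing faster than the reference earns no extra credit.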


Related papers