Fugu-MT Paper Translation (Abstract): Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System

Paper overview: Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System

arxiv url: http://arxiv.org/abs/2606.18112v2
Date: Thu, 18 Jun 2026 16:31:24 GMT
Status: translation complete
Last updated in system: 2026-06-19 16:09:18.994195
Title: Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System
Authors: Jiazhao Zhang, Gengze Zhou, Hale Yin, Yiyang Huang, Zixing Lei, Qihang Peng, Haoqi Yuan, Jie Zhang, Xudong Guo, Xiaoyue Chen, An Yang, Fei Huang, Zhibo Yang, Junyang Lin, Dayiheng Liu, Jingren Zhou, Zhuoyuan Yu, Jingyang Fan, Zhixuan Liang, Pei Lin, Ye Wang, Anzhe Chen, Kun Yan, Xiao Xu, Jiahao Li, Lulu Hu, Minying Zhang, Shurui Li, Wenhu Xiao, Shuai Bai, Xuancheng Ren, Chenxu Lv, Chenfei Wu, Xiong-Hui Chen
Abstract summary: Qwen-RobotNav is a scalable navigation model built on Qwen-RobotNav. It is trained on 15.6M samples. Qwen-RobotNav sets new state-of-the-art results across major navigation benchmarks.
Reference score (independently computed attention score): 96.69286664036143
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic navigation systems require a base navigation model whose observation strategy can be externally reconfigured at inference time, because instruction following, object search, target tracking, and autonomous driving share the same perception-planning backbone yet demand fundamentally different strategies for consuming the visual stream. We present Qwen-RobotNav, a scalable navigation model built on Qwen-RobotNav that addresses this requirement through a parameterised interface with two complementary dimensions: multiple task modes that select the navigation behaviour, and controllable observation parameters (e.g., token budget, per-camera weights) that govern how visual history is encoded. With training-time randomization over all parameters, Qwen-RobotNav is robust to any inference-time configuration, requiring zero architectural modification to the Qwen-RobotNav backbone. We train Qwen-RobotNav on 15.6M samples; co-training with vision-language data prevents the collapse into reactive action-sequence mappers observed in trajectory-only training. The parameterised interface also makes Qwen-RobotNav a natural building block for agentic systems: for long-horizon scenarios, an upper-level planner decomposes goals into sub-tasks and dynamically switches Qwen-RobotNav's task mode and context strategy mid-episode, composing complex behaviours from repeated calls to the same model. Extensive experiments show that Qwen-RobotNav sets new state-of-the-art results across major navigation benchmarks. The model exhibits favourable scaling from 2B to 8B parameters, with joint multi-task training developing a shared spatial-planning substrate that transfers across task families, and demonstrates strong zero-shot generalisation to real-world robots across diverse environments.
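Illustrative sketch (not from the paper): the abstract describes a parameterised interface with a task mode plus observation parameters, training-time randomization over those parameters, and an upper-level planner that switches mode between sub-tasks. The Python below is one minimal way such an interface could be shaped. Every name in it (TaskMode, ObservationConfig, NavRequest, sample_request, allocate_tokens, plan_episode), the budget values, and the proportional token split are assumptions made for illustration; the abstract does not specify the actual API or how the token budget is divided across cameras.

```python
import random
from dataclasses import dataclass
from enum import Enum


class TaskMode(Enum):
    # The four task families named in the abstract; the identifiers are hypothetical.
    INSTRUCTION_FOLLOWING = "instruction_following"
    OBJECT_SEARCH = "object_search"
    TARGET_TRACKING = "target_tracking"
    AUTONOMOUS_DRIVING = "autonomous_driving"


@dataclass(frozen=True)
class ObservationConfig:
    token_budget: int      # total visual-history tokens allowed (assumed meaning)
    camera_weights: dict   # camera name -> relative weight


@dataclass(frozen=True)
class NavRequest:
    mode: TaskMode
    obs: ObservationConfig
    goal: str


def sample_request(rng, cameras, goal, budgets=(256, 512, 1024)):
    """Training-time randomization: draw a random mode, budget, and camera weights."""
    raw = {cam: rng.random() + 1e-6 for cam in cameras}
    total = sum(raw.values())
    weights = {cam: w / total for cam, w in raw.items()}
    obs = ObservationConfig(token_budget=rng.choice(budgets), camera_weights=weights)
    return NavRequest(mode=rng.choice(list(TaskMode)), obs=obs, goal=goal)


def allocate_tokens(obs):
    """Split the token budget across cameras in proportion to their weights.

    Assumed policy: floor each share, then give any leftover tokens to the
    highest-weighted camera so the total always equals the budget.
    """
    total_weight = sum(obs.camera_weights.values())
    alloc = {cam: int(obs.token_budget * w / total_weight)
             for cam, w in obs.camera_weights.items()}
    leftover = obs.token_budget - sum(alloc.values())
    top = max(obs.camera_weights, key=obs.camera_weights.get)
    alloc[top] += leftover
    return alloc


def plan_episode(subtasks, obs):
    """Hypothetical upper-level planner: one request per (mode, goal) sub-task,
    all addressed to the same model with a different task mode."""
    return [NavRequest(mode=mode, obs=obs, goal=goal) for mode, goal in subtasks]
```

Under these assumptions, a budget of 512 tokens with weights front 0.5, left 0.25, right 0.25 is split as 256, 128, and 128 tokens. A long-horizon episode is then a list of requests to the same model that differ only in mode, goal, and observation settings.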

Related papers