Fugu-MT Paper Translation (Abstract): $A^2$Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

arxiv url: http://arxiv.org/abs/2308.07997v1
Date: Tue, 15 Aug 2023 19:01:19 GMT
Status: Translation complete
Last updated in system: 2023-08-17 15:42:14.308124
Title: $A^2$Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models
Authors: Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan
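Abstract summary: This work studies the task of zero-shot vision-and-language navigation (ZS-VLN). Instructions normally have complex grammatical structures and often contain various action descriptions. Correctly understanding and executing these action demands is a critical problem, and the absence of annotated data makes it even more challenging.
Reference score (attention score computed independently by this site): 89.64729024399634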
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet challenging problem in which an agent learns to navigate following a path described by language instructions without requiring any path-instruction annotation data. Normally, the instructions have complex grammatical structures and often contain various action descriptions (e.g., "proceed beyond", "depart from"). How to correctly understand and execute these action demands is a critical problem, and the absence of annotated data makes it even more challenging. Note that a well-educated human being can easily understand path instructions without the need for any special training. In this paper, we propose an action-aware zero-shot VLN method ($A^2$Nav) by exploiting the vision-and-language ability of foundation models. Specifically, the proposed method consists of an instruction parser and an action-aware navigation policy. The instruction parser utilizes the advanced reasoning ability of large language models (e.g., GPT-3) to decompose complex navigation instructions into a sequence of action-specific object navigation sub-tasks. Each sub-task requires the agent to localize the object and navigate to a specific goal position according to the associated action demand. To accomplish these sub-tasks, an action-aware navigation policy is learned from freely collected action-specific datasets that reveal distinct characteristics of each action demand. We use the learned navigation policy for executing sub-tasks sequentially to follow the navigation instruction. Extensive experiments show $A^2$Nav achieves promising ZS-VLN performance and even surpasses the supervised learning methods on R2R-Habitat and RxR-Habitat datasets.
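The decomposition described in the abstract (parse an instruction into an ordered list of action-specific sub-tasks, then execute them one at a time) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a large language model as the parser and a learned navigation policy, whereas the sketch below substitutes simple phrase matching and a caller-supplied stub policy. The names `SubTask`, `ACTION_PHRASES`, `parse_instruction`, and `execute` are hypothetical, and only "proceed beyond" and "depart from" come from the abstract; "walk to" is an assumed extra phrase.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical action vocabulary. Only "proceed beyond" and "depart from"
# appear in the abstract; "walk to" is an assumption added for illustration.
ACTION_PHRASES = ("proceed beyond", "depart from", "walk to")


@dataclass(frozen=True)
class SubTask:
    """One action-specific object navigation sub-task."""
    action: str    # the action demand, e.g. "proceed beyond"
    landmark: str  # the object the action refers to, e.g. "table"


def parse_instruction(instruction: str) -> List[SubTask]:
    """Toy stand-in for the LLM-based instruction parser.

    Splits the instruction into clauses and keeps each clause that
    starts with a known action phrase followed by a landmark.
    """
    clauses = re.split(r"\s*(?:,|\.|\bthen\b|\band\b)\s*", instruction.lower())
    subtasks = []
    for clause in clauses:
        for phrase in ACTION_PHRASES:
            if clause.startswith(phrase):
                landmark = clause[len(phrase):].strip()
                # Drop a leading article so the landmark is a bare noun phrase.
                landmark = re.sub(r"^(?:the|a|an)\s+", "", landmark)
                if landmark:
                    subtasks.append(SubTask(phrase, landmark))
                break
    return subtasks


def execute(subtasks: List[SubTask],
            policy: Callable[[SubTask], bool]) -> List[SubTask]:
    """Run sub-tasks in order, stopping at the first one the policy fails.

    `policy` is a placeholder for a navigation policy; it returns True
    when the sub-task succeeds. Returns the sub-tasks that completed.
    """
    completed = []
    for subtask in subtasks:
        if not policy(subtask):
            break
        completed.append(subtask)
    return completed


if __name__ == "__main__":
    tasks = parse_instruction(
        "Depart from the sofa, proceed beyond the table, then walk to the door."
    )
    for task in execute(tasks, policy=lambda t: True):
        print(task.action, "->", task.landmark)
```

Keeping the parser and the executor behind separate functions mirrors the two-component structure in the abstract, so either stub could be swapped for a real model without touching the other.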

Related papers

NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM [55.79954652783797]
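Vision-and-Language Navigation (VLN) is an essential skill for embodied agents, allowing them to navigate 3D environments by following natural language instructions. Conventional methods convert trajectory videos step by step into instructions for data augmentation, but such instructions do not match users' communication styles well. This paper proposes NavRAG, a retrieval-augmented generation framework that generates user demand instructions for VLN.
Reference translation (metadata) (2025-02-16T14:17:36Z)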
InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment [5.43847693345519]
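This work proposes InstructNav, a generic instruction navigation system. InstructNav is the first to handle various instruction navigation tasks without any navigation training or pre-built maps. InstructNav completes the R2R-CE task in a zero-shot manner for the first time and outperforms many task-training methods.
Reference translation (metadata) (2024-06-07T12:26:34Z)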
Lana: A Language-Capable Navigator for Instruction Following and Generation [70.76686546473994]
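LANA is a language-capable navigation agent that can execute human-written navigation commands and provide route descriptions to humans. We empirically verify that, compared with recent advanced task-specific solutions, LANA achieves better performance in both instruction following and route description. In addition, endowed with language generation capability, LANA can explain its behavior to humans and assist human wayfinding.
Reference translation (metadata) (2023-03-15T07:21:28Z)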
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
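This paper proposes LM-Nav, a robotic navigation system that enjoys the benefits of training on large-scale unannotated trajectory data. We describe a system that can be constructed from pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3).
Reference translation (metadata) (2022-07-10T10:41:50Z)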
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation [172.15808300686584]
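This paper describes an approach that learns the two tasks (instruction following and instruction generation) simultaneously and exploits their intrinsic correlations to boost the training of each. The proposed approach improves the performance of various follower models and generates accurate navigation instructions.
Reference translation (metadata) (2022-03-30T18:15:26Z)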
Contrastive Instruction-Trajectory Learning for Vision-Language Navigation [66.16980504844233]
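The vision-language navigation (VLN) task requires an agent to reach a target guided by natural language instructions. Prior work fails to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignores the temporal continuity of sub-instructions. This paper proposes a Contrastive Instruction-Trajectory Learning framework that explores invariance across similar data samples and variance across different data samples to learn distinctive representations for robust navigation.
Reference translation (metadata) (2021-12-08T06:32:52Z)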
Visual-and-Language Navigation: A Survey and Taxonomy [1.0742675209112622]
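This paper presents a comprehensive survey of Visual-and-Language Navigation (VLN) tasks. Depending on when instructions are given, tasks are divided into single-turn and multi-turn. This taxonomy enables researchers to better grasp the key points of a specific task and identify directions for future research.
Reference translation (metadata) (2021-08-26T01:51:18Z)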
Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation [145.84123197129298]
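Language instruction plays an essential role in natural-language-grounded navigation tasks. We train a more robust navigator that can dynamically extract crucial factors from long instructions. Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator into moving to the wrong target.
Reference translation (metadata) (2021-07-23T14:11:31Z)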

The related papers list is generated automatically from the titles and abstracts of papers on this site.
