Fugu-MT Paper Translation (Summary): AgenticRL: Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Paper summary: AgenticRL: Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

arxiv url: http://arxiv.org/abs/2606.03963v3
Date: Tue, 09 Jun 2026 15:09:32 GMT
Status: Translation complete
Last updated in system: 2026-06-10 15:40:57.931611
Title: AgenticRL: Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation
Title (reference translation): AgenticRL: Self-Refining Agent Reinforcement Learning for Vision-Dependent UAV Navigation
Authors: Roohan Ahmed Khan, Yasheerah Yaqoot, Amir Atef Habel, Muhammad Ahsan Mustafa, Dzmitry Tsetserukou
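Abstract summary: This paper proposes AgenticRL, an agent-guided reinforcement learning framework for navigation tasks. AgenticRL uses a multimodal generative pre-trained transformer (GPT) agent to interpret task information and visual scene observations, generate task-specific reward functions, train policies with the PPO algorithm, and act as a critic by evaluating the trained policy. Based on this feedback, the agent identifies failure modes and refines the reward function in a closed-loop self-improvement process.
Reference score (independently computed attention measure): 2.0325612651874305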
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Deep reinforcement learning has shown strong potential for enabling autonomous robots to learn complex navigational tasks. However, its practical use still depends heavily on human-designed reward functions and repeated manual fine-tuning, which is time-consuming and does not guarantee high success in the desired task. This paper presents AgenticRL, an agent-guided reinforcement learning framework that increases autonomy in reward design, policy refinement, and real-world deployment for unmanned aerial vehicle (UAV) navigation tasks. AgenticRL uses a multimodal generative pre-trained transformer (GPT) agent to interpret task information and visual scene observations, generate task-specific reward functions, train policies using the Proximal Policy Optimization (PPO) algorithm, and then act as a critic by evaluating the trained policy through diagnosis packets to generate feedback. Based on this feedback, the agent identifies failure modes and refines the reward function in a closed-loop self-improvement process. To further leverage the multimodal GPT agent during inference, AgenticRL uses real-world images and natural-language task information to automatically identify the active scenario and select the appropriate trained policy for execution. The framework is evaluated on multiple navigational tasks, including gate traversal, obstacle avoidance, wall barrier crossing with landing, trajectory following, and motion behavior learning. Experimental results show that the closed-loop refinement process improves policy behavior compared with initial rewards by 71%. We also demonstrate sim-to-real transfer of the proposed framework, achieving a real-world success rate of 91% and a sim-to-real accuracy of 94%.
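Abstract (reference translation): Deep reinforcement learning has shown strong potential for letting autonomous robots learn complex navigation tasks. Its practical use, however, still relies heavily on human-designed reward functions and repeated manual fine-tuning, which takes time and does not guarantee high success on the desired task. This paper proposes AgenticRL, an agent-guided reinforcement learning framework that increases autonomy in reward design, policy refinement, and real-world deployment for unmanned aerial vehicle (UAV) navigation tasks. AgenticRL uses a multimodal GPT agent to interpret task information and visual scene observations, generate task-specific reward functions, train policies with the PPO algorithm, and evaluate the trained policy through diagnosis packets to produce feedback. Based on this feedback, the agent identifies failure modes and refines the reward function in a closed-loop self-improvement process. To make further use of the multimodal GPT agent during inference, AgenticRL uses real-world images and natural-language task information to automatically identify the active scenario and select the appropriate trained policy for execution. The framework is evaluated on multiple navigation tasks, including gate traversal, obstacle avoidance, wall barrier crossing with landing, trajectory following, and motion behavior learning. Experimental results show that the closed-loop refinement process improves policy behavior by 71% compared with the initial rewards. Sim-to-real transfer of the proposed framework is also demonstrated, with a real-world success rate of 91% and a sim-to-real accuracy of 94%.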
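
The closed-loop refinement described in the abstract (propose a reward, train, diagnose, refine) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the reward format, the contents of a diagnosis packet, or the refinement rule. Here the reward is assumed to be a dictionary of feature weights, `train_and_diagnose` is a hypothetical stand-in for PPO training plus evaluation, and `refine` is a hypothetical stand-in for the GPT agent's reward rewrite.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: a reward is a weighted sum of named state features.
RewardWeights = Dict[str, float]


@dataclass
class Diagnosis:
    """Hypothetical stand-in for a 'diagnosis packet'."""
    success_rate: float
    failure_modes: List[str]


def train_and_diagnose(weights: RewardWeights) -> Diagnosis:
    """Toy replacement for PPO training plus policy evaluation.

    Assumption for illustration only: success grows with the magnitude
    of the collision penalty and is capped at 1.0.
    """
    success = min(1.0, 0.4 + 0.2 * abs(weights.get("collision", 0.0)))
    modes = ["collision"] if success < 1.0 else []
    return Diagnosis(success_rate=success, failure_modes=modes)


def refine(weights: RewardWeights, diagnosis: Diagnosis,
           step: float = 1.0) -> RewardWeights:
    """Toy replacement for the agent's reward refinement.

    Each reported failure mode gets a stronger (more negative) penalty.
    """
    new_weights = dict(weights)
    for mode in diagnosis.failure_modes:
        new_weights[mode] = new_weights.get(mode, 0.0) - step
    return new_weights


def closed_loop(initial: RewardWeights, target: float = 0.85,
                max_rounds: int = 5) -> Tuple[RewardWeights, List[float]]:
    """Alternate training/diagnosis and refinement until the target is met."""
    weights = dict(initial)
    history: List[float] = []
    for _ in range(max_rounds):
        diagnosis = train_and_diagnose(weights)
        history.append(diagnosis.success_rate)
        if diagnosis.success_rate >= target:
            break  # policy is good enough; keep the current reward
        weights = refine(weights, diagnosis)
    return weights, history


if __name__ == "__main__":
    final_weights, history = closed_loop({"progress": 1.0, "collision": -0.5})
    print("success per round:", history)
    print("final reward weights:", final_weights)
```

In a real system, `train_and_diagnose` would run a PPO trainer in simulation and collect evaluation statistics, and `refine` would be a call to the multimodal agent. The loop structure and the stopping rule (target success rate or a round budget) are the only parts this sketch is meant to convey.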

