Fugu-MT paper translation (abstract): Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games

Paper abstract: Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games

arxiv url: http://arxiv.org/abs/2506.02849v2
Date: Mon, 15 Sep 2025 14:29:31 GMT
Status: translation complete
Last updated in system: 2025-09-16 15:23:16.202555
Title: Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games
Title (reference translation): Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games
Authors: Alejandro Sanchez Roncero, Yixi Cai, Olov Andersson, Petter Ogren
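Abstract summary: We address the problem of agile 1v1 quadrotor pursuit-evasion. To tackle the non-stationarity and catastrophic forgetting that arise in this setting, we propose an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm. Within this framework, we train neural network controllers that output either velocity commands or body rates with collective thrust.
Reference score (independently computed attention measure): 42.74003740156243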
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We address the problem of agile 1v1 quadrotor pursuit-evasion, where a pursuer and an evader learn to outmaneuver each other through reinforcement learning (RL). Such settings face two major challenges: non-stationarity, since each agent's evolving policy alters the environment dynamics and destabilizes training, and catastrophic forgetting, where a policy overfits to the current adversary and loses effectiveness against previously encountered strategies. To tackle these issues, we propose an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm. At each stage, the pursuer and evader are trained asynchronously against a frozen pool of opponents sampled from a growing population of past and current policies, stabilizing training and ensuring exposure to diverse behaviors. Within this framework, we train neural network controllers that output either velocity commands or body rates with collective thrust. Experiments in a high-fidelity simulator show that: (i) AMSPB-trained RL policies outperform RL and geometric baselines; (ii) body-rate-and-thrust controllers achieve more agile flight than velocity-based controllers, leading to better pursuit-evasion performance; (iii) AMSPB yields stable, monotonic gains across stages; and (iv) trained policies in one arena size generalize fairly well to other sizes without retraining.
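Abstract (reference translation): This paper addresses the problem of agile 1v1 quadrotor pursuit-evasion, in which a pursuer and an evader learn to outmaneuver each other through reinforcement learning (RL). Such settings face two major challenges: non-stationarity, where each agent's evolving policy changes the environment dynamics and destabilizes training, and catastrophic forgetting, where a policy overfits to the current adversary and loses effectiveness against previously encountered strategies. To tackle these issues, we propose an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm. At each stage, the pursuer and the evader are trained asynchronously against a frozen pool of opponents sampled from a growing population of past and current policies, which stabilizes training and ensures exposure to diverse behaviors. Within this framework, we train neural network controllers that output either velocity commands or body rates with collective thrust. Experiments in a high-fidelity simulator show that: (i) AMSPB-trained RL policies outperform RL and geometric baselines; (ii) body-rate-and-thrust controllers achieve more agile flight than velocity-based controllers, leading to better pursuit-evasion performance; (iii) AMSPB yields stable, monotonic gains across stages; and (iv) policies trained in one arena size generalize fairly well to other sizes without retraining.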
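The stage structure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Policy` record, the `train_against_pool` placeholder (a toy update standing in for a real RL training run), the pool size, and the reading of "asynchronous" as two independent training runs per stage that never update against each other's live policy. The only elements drawn from the abstract are the per-stage frozen opponent pool and the growing population of past and current policies.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical snapshot of a trained controller."""
    role: str     # "pursuer" or "evader"
    stage: int    # stage at which the snapshot was produced
    skill: float  # toy stand-in for network weights


def train_against_pool(learner, pool, stage, rng, episodes=8):
    """Placeholder for an RL run against frozen opponents (assumption).

    A real implementation would collect rollouts in a simulator and update
    a neural network; here the "skill" number is simply bumped.
    """
    # One frozen opponent is sampled per episode; the pool never changes
    # while the learner is being trained.
    opponents = [rng.choice(pool) for _ in range(episodes)]
    mean_opponent = sum(o.skill for o in opponents) / len(opponents)
    return Policy(learner.role, stage, max(learner.skill, mean_opponent) + 1.0)


def amspb_sketch(num_stages=3, pool_size=4, seed=0):
    rng = random.Random(seed)
    # Populations start with one untrained policy per role.
    population = {
        "pursuer": [Policy("pursuer", 0, 0.0)],
        "evader": [Policy("evader", 0, 0.0)],
    }
    for stage in range(1, num_stages + 1):
        # Freeze an opponent pool per role from the population as it
        # stands at the start of the stage.
        pools = {
            role: rng.sample(pop, min(pool_size, len(pop)))
            for role, pop in population.items()
        }
        # The two runs depend only on the frozen pools, so they could be
        # executed in separate processes; they run sequentially here.
        new_pursuer = train_against_pool(
            population["pursuer"][-1], pools["evader"], stage, rng)
        new_evader = train_against_pool(
            population["evader"][-1], pools["pursuer"], stage, rng)
        # The populations grow by one snapshot per role per stage.
        population["pursuer"].append(new_pursuer)
        population["evader"].append(new_evader)
    return population


if __name__ == "__main__":
    for role, snapshots in amspb_sketch().items():
        print(role, [(p.stage, p.skill) for p in snapshots])
```

Because each learner only ever faces a pool that is fixed for the duration of the stage, the opponent distribution does not shift during training, and keeping older snapshots in the population is what exposes the learner to previously encountered strategies. The toy skill update says nothing about the actual learning dynamics reported in the paper.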

Related papers