Fugu-MT Paper Translation (Summary): RAPTOR: A Foundation Policy for Quadrotor Control

Paper summary: RAPTOR: A Foundation Policy for Quadrotor Control

arxiv url: http://arxiv.org/abs/2509.11481v1
Date: Mon, 15 Sep 2025 00:05:40 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:23.105029
Title: RAPTOR: A Foundation Policy for Quadrotor Control
Title (reference translation): RAPTOR: A Foundation Policy for Quadrotor Control
Authors: Jonas Eschmann, Dario Albani, Giuseppe Loianno
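Abstract summary: Humans are remarkably data-efficient when adapting to new, unseen conditions, such as driving a new car. Modern robotic control systems, such as neural network policies trained with Reinforcement Learning, are highly specialized for single environments. This paper presents RAPTOR, a method for training a highly adaptive foundation policy for quadrotor control.
Reference score (attention score computed independently by the site): 7.1760769144571865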
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Humans are remarkably data-efficient when adapting to new unseen conditions, like driving a new car. In contrast, modern robotic control systems, like neural network policies trained using Reinforcement Learning (RL), are highly specialized for single environments. Because of this overfitting, they are known to break down even under small differences like the Simulation-to-Reality (Sim2Real) gap and require system identification and retraining for even minimal changes to the system. In this work, we present RAPTOR, a method for training a highly adaptive foundation policy for quadrotor control. Our method enables training a single, end-to-end neural-network policy to control a wide variety of quadrotors. We test 10 different real quadrotors from 32 g to 2.4 kg that also differ in motor type (brushed vs. brushless), frame type (soft vs. rigid), propeller type (2/3/4-blade), and flight controller (PX4/Betaflight/Crazyflie/M5StampFly). We find that a tiny, three-layer policy with only 2084 parameters is sufficient for zero-shot adaptation to a wide variety of platforms. The adaptation through In-Context Learning is made possible by using a recurrence in the hidden layer. The policy is trained through a novel Meta-Imitation Learning algorithm, where we sample 1000 quadrotors and train a teacher policy for each of them using Reinforcement Learning. Subsequently, the 1000 teachers are distilled into a single, adaptive student policy. We find that within milliseconds, the resulting foundation policy adapts zero-shot to unseen quadrotors. We extensively test the capabilities of the foundation policy under numerous conditions (trajectory tracking, indoor/outdoor, wind disturbance, poking, different propellers).
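Abstract (reference translation): Humans are remarkably data-efficient when adapting to new, unseen conditions, such as driving a new car. In contrast, modern robotic control systems, such as neural network policies trained with Reinforcement Learning (RL), are highly specialized for single environments. Because of this overfitting, they are known to break down under even small differences such as the Simulation-to-Reality (Sim2Real) gap, and they require system identification and retraining for even minimal changes to the system. This work presents RAPTOR, a method for training a highly adaptive foundation policy for quadrotor control. The method enables training a single, end-to-end neural-network policy to control a wide variety of quadrotors. The authors test 10 different real quadrotors from 32 g to 2.4 kg that also differ in motor type (brushed vs. brushless), frame type (soft vs. rigid), propeller type (2/3/4-blade), and flight controller (PX4/Betaflight/Crazyflie/M5StampFly). A tiny, three-layer policy with only 2084 parameters is sufficient for zero-shot adaptation to a wide variety of platforms. The adaptation through In-Context Learning is made possible by using a recurrence in the hidden layer. The policy is trained through a novel Meta-Imitation Learning algorithm: 1000 quadrotors are sampled, and a teacher policy is trained for each of them using Reinforcement Learning. The 1000 teachers are then distilled into a single, adaptive student policy. Within milliseconds, the resulting foundation policy adapts zero-shot to unseen quadrotors. The capabilities of the foundation policy are tested extensively under numerous conditions (trajectory tracking, indoor/outdoor, wind disturbance, poking, different propellers).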
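The abstract attributes the in-context adaptation to a recurrence in the hidden layer of a three-layer policy. The following is a minimal sketch of what such a policy's forward pass could look like, not the authors' implementation. The class name `RecurrentPolicy`, the GRU-style cell, the layer sizes (18 observations, 16 hidden units, 4 motor commands), the activations, and the random weights are all assumptions made for illustration; the abstract specifies none of them, and the parameter count of this sketch is not meant to reproduce the reported 2084.

```python
import math
import random


def dense_init(rng, n_in, n_out):
    """Return (weights, bias) for a fully connected layer with random weights."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def affine(layer, x):
    """Compute W x + b for one layer."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class RecurrentPolicy:
    """Hypothetical dense -> GRU -> dense policy; all sizes are assumptions."""

    def __init__(self, obs_dim=18, hidden_dim=16, act_dim=4, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.inp = dense_init(rng, obs_dim, hidden_dim)
        # Each GRU gate reads the input features concatenated with the hidden state.
        self.update = dense_init(rng, 2 * hidden_dim, hidden_dim)
        self.reset = dense_init(rng, 2 * hidden_dim, hidden_dim)
        self.cand = dense_init(rng, 2 * hidden_dim, hidden_dim)
        self.out = dense_init(rng, hidden_dim, act_dim)

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, obs, h):
        """Map one observation and the previous hidden state to (action, new state)."""
        x = [math.tanh(v) for v in affine(self.inp, obs)]
        z = [sigmoid(v) for v in affine(self.update, x + h)]
        r = [sigmoid(v) for v in affine(self.reset, x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        c = [math.tanh(v) for v in affine(self.cand, x + gated)]
        h_new = [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]
        action = [math.tanh(v) for v in affine(self.out, h_new)]
        return action, h_new

    def num_parameters(self):
        layers = [self.inp, self.update, self.reset, self.cand, self.out]
        return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


policy = RecurrentPolicy()
h = policy.initial_state()
obs = [0.1] * 18  # placeholder observation, not real sensor data
for _ in range(5):
    action, h = policy.step(obs, h)

# Same observation, but with and without accumulated context in the hidden state.
action_fresh, _ = policy.step(obs, policy.initial_state())
action_context, _ = policy.step(obs, h)
```

The hidden state `h` is the only memory the policy has. The same observation yields different actions depending on what the policy has seen so far, and this is the mechanism a trained policy could use to adjust to an unfamiliar vehicle without any weight updates.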

Related papers