Fugu-MT Paper Translation (Abstract): Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators under Flight Disturbances

Paper overview: Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators under Flight Disturbances

arxiv url: http://arxiv.org/abs/2603.26612v1
Date: Fri, 27 Mar 2026 17:08:40 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.615579
Title: Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators under Flight Disturbances
Title (reference translation): Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators
Authors: Hazim Alzorgan, Sayed Pedram Haeri Boroujeni, Abolfazl Razi
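Abstract summary: Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact-based interaction. The motions of the drone and its manipulator are tightly coupled, and even small attitude changes caused by wind or control imperfections push the end-effector away from its intended path. The authors develop a reinforcement learning framework using Transformer-based double deep Q-learning (DDQN). This allows the controller to anticipate the end-effector's motion through simulated rollouts rather than executing those actions directly on the actual model.
Reference score (independently computed attention score): 8.618483849755604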
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact-based interaction. However, the motion of the drone and its manipulator is tightly coupled, and even small attitude changes caused by wind or control imperfections shift the end-effector away from its intended path. This coupling makes reliable tracking difficult and also limits the direct use of learning-based arm controllers that were originally designed for fixed-base robots. These effects appear consistently in our tests whenever the UAV body experiences drift or rapid attitude corrections. To address this behavior, we develop a reinforcement-learning (RL) framework built on Transformer-based double deep Q-learning (DDQN). Its core idea is an adaptive beam-search planner that applies a short-horizon beam search over candidate control sequences, using the learned critic as the forward estimator. This allows the controller to anticipate the end-effector's motion through simulated rollouts rather than executing those actions directly on the actual model, realizing a software-in-the-loop (SITL) approach. The lookahead relies on value estimates from a Transformer critic that processes short sequences of states, while a DDQN backbone provides the one-step targets needed to keep the learning process stable. Evaluated on a 3-DoF aerial manipulator under identical training conditions, the proposed meta-adaptive planner shows the strongest overall performance, with a 10.2% reward increase, a substantial reduction in mean tracking error (from about 6% to 3%), and a 29.6% improvement in the combined reward-error metric relative to the DDQN baseline. Compared with the fixed-beam and Transformer-only variants, our method tracks the target tip trajectory more stably, maintaining a 5 cm tracking error when the drone base drifts due to external disturbances.
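
The planning step described in the abstract (a short-horizon beam search over candidate control sequences, scored by a learned critic on simulated rollouts) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `beam_search_plan`, `step_model`, `critic`, `toy_step`, and `toy_critic` are hypothetical, the one-dimensional tracking task is invented for illustration, and the beam width is fixed because the abstract does not say how the meta-adaptive planner adjusts it.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


def beam_search_plan(
    state: State,
    actions: Sequence[float],
    step_model: Callable[[State, float], Tuple[State, float]],
    critic: Callable[[State], float],
    horizon: int = 3,
    beam_width: int = 2,
    gamma: float = 0.99,
) -> List[float]:
    """Return the best action sequence found by a fixed-width beam search.

    Assumption: `step_model` is a simulator (not the real vehicle) and
    `critic` estimates the value of a state; both are stand-ins here.
    """
    # Each beam entry: (discounted reward so far, simulated state, actions taken).
    beam = [(0.0, state, [])]
    for depth in range(horizon):
        candidates = []
        for ret, s, seq in beam:
            for a in actions:
                next_s, r = step_model(s, a)  # simulated rollout step
                new_ret = ret + (gamma ** depth) * r
                # Score = reward collected so far + critic's estimate of the rest.
                score = new_ret + (gamma ** (depth + 1)) * critic(next_s)
                candidates.append((score, new_ret, next_s, seq + [a]))
        # Keep only the `beam_width` highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = [(ret, s, seq) for _, ret, s, seq in candidates[:beam_width]]
    return beam[0][2]


def toy_step(s: State, a: float) -> Tuple[State, float]:
    """Hypothetical 1-D model: the tip offset moves by `a`; reward is -|offset|."""
    pos = s[0] + a
    return (pos,), -abs(pos)


def toy_critic(s: State) -> float:
    """Hypothetical value estimate: states closer to zero offset are better."""
    return -abs(s[0])


if __name__ == "__main__":
    plan = beam_search_plan((3.0,), (-1.0, 0.0, 1.0), toy_step, toy_critic)
    print(plan)
```

In a receding-horizon setup, only the first action of the returned sequence would typically be executed before replanning from the next observed state. Whether the paper does this is not stated in the abstract.
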
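Abstract (reference translation): Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact-based interaction. However, the motions of the drone and its manipulator are tightly coupled, and even small attitude changes caused by wind or control imperfections push the end-effector away from its intended path. This coupling makes reliable tracking difficult and also limits the direct use of learning-based arm controllers originally designed for fixed-base robots. These effects appear consistently in our tests whenever the UAV body experiences drift or rapid attitude corrections. To address this behavior, we develop a reinforcement learning (RL) framework based on Transformer-based double deep Q-learning (DDQN), built around an adaptive beam-search planner that applies a short-horizon beam search over candidate control sequences, using the learned critic as the forward estimator. This lets the controller anticipate the end-effector's motion through simulated rollouts rather than executing those actions directly on the actual model, realizing a software-in-the-loop (SITL) approach. The lookahead relies on value estimates from a Transformer critic that processes short sequences of states, while the DDQN backbone provides the one-step targets needed to keep learning stable. Evaluated on a 3-DoF aerial manipulator under identical training conditions, the proposed meta-adaptive planner achieves a 10.2% reward increase, a substantial reduction in mean tracking error (from about 6% to 3%), and a 29.6% improvement in the combined reward-error metric relative to the DDQN baseline. In contrast to the fixed-beam and Transformer-only variants, it tracks the target tip trajectory more stably (maintaining a 5 cm tracking error) when the drone base drifts due to external disturbances.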
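
The one-step DDQN targets mentioned in the abstract can be illustrated with the standard double Q-learning target: the online network selects the next action and the target network evaluates it. The sketch below is an assumption-laden illustration, not the paper's code. The function name `ddqn_target` and its arguments are hypothetical, and the abstract does not say whether the authors use exactly this form.

```python
from typing import Sequence


def ddqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Standard double DQN one-step target for a single transition.

    q_online_next[a] and q_target_next[a] are the online and target
    networks' Q-values for action `a` in the next state (plain lists here,
    standing in for network outputs).
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...while its value comes from the target network, which reduces the
    # overestimation bias of taking a max over a single set of estimates.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(ddqn_target(1.0, False, [0.2, 0.9, 0.1], [0.5, 0.3, 0.8], gamma=0.5))
```

In the example call, the online network prefers action 1, so the target bootstraps from the target network's value for that action (0.3) rather than from the target network's own maximum (0.8).
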

Related papers