Fugu-MT Paper Translation (Abstract): Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

Paper abstract: Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

arxiv url: http://arxiv.org/abs/2606.08513v1
Date: Sun, 07 Jun 2026 08:34:29 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.179409
Title: Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
Title (reference translation): Toward End-to-End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
Authors: Elisei Shafer, Oren Gal
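Abstract summary: This paper explores the feasibility of an end-to-end Deep Reinforcement Learning (DRL) approach that maps sensor data directly to thruster commands. A High-Level (HL) policy operating at 2 Hz processes raw $84 \times 84$ pixel monocular camera frames, stacked $100 \times 100$ pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. A Low-Level (LL) policy operating at 10 Hz converts these subgoals into thruster commands.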
Reference score (independently computed attention score): 2.1942030377331245
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous Underwater Vehicles (AUVs) traditionally rely on complex, heavily engineered pipelines for perception, path planning, and motion control. This paper explores the feasibility of an end-to-end Deep Reinforcement Learning (DRL) approach that maps raw sensor data directly to thruster commands, reducing manual engineering. We propose a hierarchical reinforcement learning (HRL) architecture splitting the problem into two Markov Decision Processes. A High-Level (HL) policy operating at 2Hz processes raw $84 \times 84$ pixel monocular camera frames, stacked $100 \times 100$ pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. Simultaneously, a Low-Level (LL) policy operating at 10Hz converts these subgoals into thruster commands. The HL policy is trained using Reinforcement Learning from Prior Demonstrations (RLPD) within a modified Sample-Efficient Robotic Reinforcement Learning (SERL) framework, while the LL policy utilizes Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER). Evaluated in the high-fidelity HoloOcean simulator, our method demonstrates successful obstacle avoidance, achieving trajectory lengths closely approximating (within 4% to 6% of) an $\text{RRT}^*$ planning baseline. Furthermore, the learned policy exhibits strong robustness to simulated sensor noise and decreased visibility. While the system navigates familiar geometries effectively, experiments reveal generalization limitations when encountering unvisited areas with novel obstacle shapes. Ultimately, this work demonstrates the promise of sample-efficient, end-to-end DRL for underwater navigation using minimal computational hardware.
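Abstract (reference translation): Autonomous Underwater Vehicles (AUVs) have traditionally relied on complex pipelines for perception, path planning, and motion control. This paper examines the feasibility of an end-to-end Deep Reinforcement Learning (DRL) approach that maps raw sensor data directly to thruster commands and reduces manual engineering. It proposes a hierarchical reinforcement learning (HRL) architecture that splits the problem into two Markov Decision Processes. A High-Level (HL) policy operating at 2 Hz processes raw $84 \times 84$ pixel monocular camera frames, stacked $100 \times 100$ pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. At the same time, a Low-Level (LL) policy operating at 10 Hz converts these subgoals into thruster commands. The HL policy is trained using Reinforcement Learning from Prior Demonstrations (RLPD) within a modified Sample-Efficient Robotic Reinforcement Learning (SERL) framework, while the LL policy uses Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER). Evaluated in the high-fidelity HoloOcean simulator, the method avoids obstacles successfully and achieves trajectory lengths within 4% to 6% of an $\text{RRT}^*$ planning baseline. In addition, the learned policy is strongly robust to simulated sensor noise and reduced visibility. While the system navigates familiar geometries effectively, the experiments reveal generalization limits when it encounters unvisited areas with novel obstacle shapes. Ultimately, this work demonstrates the promise of sample-efficient, end-to-end DRL for underwater navigation using minimal computational hardware.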
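The two-rate hierarchy described in the abstract (an HL policy at 2 Hz issuing subgoals, an LL policy at 10 Hz issuing thruster commands) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `run_hierarchy`, `hl_policy`, `ll_policy`, and `step_env` are hypothetical, the toy policies in `demo` stand in for the trained networks, and the single-loop scheduling (one subgoal refresh every five low-level ticks) is an assumption derived only from the stated rates.

```python
from typing import Callable, List, Optional

HL_RATE_HZ = 2    # high-level policy rate, as stated in the abstract
LL_RATE_HZ = 10   # low-level policy rate, as stated in the abstract
LL_STEPS_PER_HL = LL_RATE_HZ // HL_RATE_HZ  # 5 low-level ticks per subgoal

Vector = List[float]


def run_hierarchy(
    hl_policy: Callable[[Vector], Vector],
    ll_policy: Callable[[Vector, Vector], Vector],
    step_env: Callable[[Vector], Vector],
    initial_obs: Vector,
    n_ll_steps: int,
) -> List[Vector]:
    """Run n_ll_steps low-level ticks and return the subgoal active at each tick.

    Assumption: both policies share one loop, and the subgoal is refreshed
    on every tick whose index is a multiple of LL_STEPS_PER_HL.
    """
    obs = initial_obs
    subgoal: Optional[Vector] = None
    subgoals_used: List[Vector] = []
    for tick in range(n_ll_steps):
        if tick % LL_STEPS_PER_HL == 0:
            subgoal = hl_policy(obs)       # new spatial subgoal at 2 Hz
        command = ll_policy(obs, subgoal)  # thruster command at 10 Hz
        obs = step_env(command)
        subgoals_used.append(subgoal)
    return subgoals_used


def demo() -> List[Vector]:
    """Toy 1-D example: hypothetical stand-ins for the learned policies."""
    state = [0.0]  # vehicle position along one axis

    def hl(obs: Vector) -> Vector:
        return [obs[0] + 1.0]  # subgoal: one unit ahead of the current position

    def ll(obs: Vector, goal: Vector) -> Vector:
        return [0.5 * (goal[0] - obs[0])]  # proportional step toward the subgoal

    def step(cmd: Vector) -> Vector:
        state[0] += cmd[0]
        return [state[0]]

    return run_hierarchy(hl, ll, step, [0.0], 10)
```

Calling `demo()` runs ten low-level ticks, during which the high-level stand-in is queried twice. In the paper's setting the HL observation would also include camera and sonar images, which this sketch omits.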
