Fugu-MT paper translation (abstract): Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model

Paper overview: Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model

arxiv url: http://arxiv.org/abs/2508.09971v2
Date: Tue, 30 Sep 2025 20:19:47 GMT
Status: translation complete
Last updated in system: 2025-10-02 14:33:21.695605
Title: Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model
Title (reference translation): Vision-Driven River Following for UAVs via Safe Reinforcement Learning Using a Semantic Dynamics Model
Authors: Zihan Wang, Nina Mahmoudian
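Abstract summary: Vision-driven autonomous river following by unmanned aerial vehicles is critical for applications such as rescue, surveillance, and environmental monitoring. The paper first introduces Marginal Gain Advantage Estimation, which refines the reward advantage function. Second, it develops a Semantic Dynamics Model. Third, it proposes the Constrained Actor Dynamics Estimator architecture, which integrates the actor, cost estimator, and SDM for cost advantage estimation.
Reference score (attention score computed independently by this site): 11.28895057233897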
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Vision-driven autonomous river following by Unmanned Aerial Vehicles (UAVs) is critical for applications such as rescue, surveillance, and environmental monitoring, particularly in dense riverine environments where GPS signals are unreliable. These safety-critical navigation tasks must satisfy hard safety constraints while optimizing performance. Moreover, the reward in river following is inherently history-dependent (non-Markovian), because it depends on which river segments have already been visited, which makes the task challenging for standard safe Reinforcement Learning (SafeRL). To address these gaps, we propose three contributions. First, we introduce Marginal Gain Advantage Estimation (MGAE), which refines the reward advantage function by using a sliding window baseline computed from historical episodic returns, aligning the advantage estimate with non-Markovian dynamics. Second, we develop a Semantic Dynamics Model (SDM) based on patchified water semantic masks, offering more interpretable and data-efficient short-term prediction of future observations compared to latent vision dynamics models. Third, we present the Constrained Actor Dynamics Estimator (CADE) architecture, which integrates the actor, cost estimator, and SDM for cost advantage estimation to form a model-based SafeRL framework. Simulation results demonstrate that MGAE achieves faster convergence and superior performance over traditional critic-based methods like Generalized Advantage Estimation. SDM provides more accurate short-term state predictions that enable the cost estimator to better predict potential violations. Overall, CADE effectively integrates safety regulation into model-based RL, with the Lagrangian approach providing a "soft" balance between reward and safety during training, while the safety layer enhances inference by imposing a "hard" action overlay.
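Abstract (reference translation): Vision-driven autonomous river following by unmanned aerial vehicles is essential for applications such as rescue, surveillance, and environmental monitoring, especially in dense riverine environments where GPS signals are unreliable. These safety-critical navigation tasks must satisfy hard safety constraints while optimizing performance. In addition, the reward for river following is history-dependent (non-Markovian), since it depends on which river segments have already been visited, and this is difficult for standard safe reinforcement learning (SafeRL). To address these gaps, the authors propose three contributions. First, they introduce Marginal Gain Advantage Estimation, which refines the reward advantage function by using a sliding window baseline computed from historical episodic returns, aligning the advantage estimate with non-Markovian dynamics. Second, they develop a Semantic Dynamics Model based on patchified water semantic masks, which offers more interpretable and more data-efficient short-term prediction of future observations than latent vision dynamics models. Third, they propose the Constrained Actor Dynamics Estimator architecture, which integrates the actor, cost estimator, and SDM for cost advantage estimation to form a model-based SafeRL framework. Simulation results show that MGAE achieves faster convergence and better performance than traditional critic-based methods such as Generalized Advantage Estimation. SDM provides more accurate short-term state predictions, which let the cost estimator predict potential violations more accurately. Overall, CADE effectively integrates safety regulation into model-based RL: the Lagrangian approach provides a "soft" balance between reward and safety during training, while the safety layer strengthens inference by imposing a "hard" action overlay.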
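
Illustrative sketch (not from the paper): the abstract says only that MGAE uses a sliding window baseline computed from historical episodic returns. The Python below shows one simple way such a baseline could work, assuming the baseline is the plain mean of the most recent returns and the advantage is the current episodic return minus that mean. The class name `SlidingWindowBaseline`, the function `episode_advantage`, the window size, and the zero baseline for an empty history are all hypothetical choices made for illustration; the paper's actual estimator may differ.

```python
from collections import deque
from statistics import fmean


class SlidingWindowBaseline:
    """Hypothetical helper: mean of the last `window` episodic returns."""

    def __init__(self, window: int = 10):
        # deque(maxlen=...) silently drops the oldest return once full.
        self.returns = deque(maxlen=window)

    def value(self) -> float:
        # Assumption: with no history yet, use a baseline of 0.0.
        return fmean(self.returns) if self.returns else 0.0

    def update(self, episodic_return: float) -> None:
        self.returns.append(episodic_return)


def episode_advantage(rewards, baseline: SlidingWindowBaseline) -> float:
    """Advantage of one episode relative to recent episodes.

    The baseline is read before it is updated, so the current episode
    is compared only against earlier episodes.
    """
    episodic_return = float(sum(rewards))
    advantage = episodic_return - baseline.value()
    baseline.update(episodic_return)
    return advantage


if __name__ == "__main__":
    baseline = SlidingWindowBaseline(window=2)
    for rewards in ([1, 2], [1, 1], [4], [0]):
        print(episode_advantage(rewards, baseline))
```

Because the baseline is built from whole-episode returns rather than from a learned state-value critic, it does not require the reward to be a function of the current state alone, which is the property the abstract associates with the non-Markovian river-following reward.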


Related papers