Fugu-MT Paper Translation (Abstract): Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model

Paper overview: Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model

arxiv url: http://arxiv.org/abs/2508.09971v1
Date: Wed, 13 Aug 2025 17:39:09 GMT
Status: translation complete
System update date: 2025-08-14 20:42:00.981058
Title: Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model
Title (reference translation): Vision-Driven River Following for UAVs via Safe Reinforcement Learning Using a Semantic Dynamics Model
Authors: Zihan Wang, Nina Mahmoudian
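Abstract summary: Vision-driven autonomous river following by Unmanned Aerial Vehicles is critical for applications such as rescue, surveillance, and environmental monitoring. The paper formalizes river following as a coverage control problem in which the reward function is submodular, so returns diminish as more unique river segments are visited. It proposes the Constrained Actor Dynamics Estimator architecture, which integrates the actor, the cost estimator, and the Semantic Dynamics Model (SDM) to form a model-based SafeRL framework.
Reference score (independently computed attention score): 11.29011178752037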
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Vision-driven autonomous river following by Unmanned Aerial Vehicles is critical for applications such as rescue, surveillance, and environmental monitoring, particularly in dense riverine environments where GPS signals are unreliable. We formalize river following as a coverage control problem in which the reward function is submodular, yielding diminishing returns as more unique river segments are visited, thereby framing the task as a Submodular Markov Decision Process. First, we introduce Marginal Gain Advantage Estimation, which refines the reward advantage function by using a sliding window baseline computed from historical episodic returns, thus aligning the advantage estimation with the agent's evolving recognition of action value in non-Markovian settings. Second, we develop a Semantic Dynamics Model based on patchified water semantic masks that provides more interpretable and data-efficient short-term prediction of future observations compared to latent vision dynamics models. Third, we present the Constrained Actor Dynamics Estimator architecture, which integrates the actor, the cost estimator, and SDM for cost advantage estimation to form a model-based SafeRL framework capable of solving partially observable Constrained Submodular Markov Decision Processes. Simulation results demonstrate that MGAE achieves faster convergence and superior performance over traditional critic-based methods like Generalized Advantage Estimation. SDM provides more accurate short-term state predictions that enable the cost estimator to better predict potential violations. Overall, CADE effectively integrates safety regulation into model-based RL, with the Lagrangian approach achieving the soft balance of reward and safety during training, while the safety layer enhances performance during inference by hard action overlay.
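Abstract (reference translation): Vision-driven autonomous river following by Unmanned Aerial Vehicles is essential for applications such as rescue, surveillance, and environmental monitoring, especially in dense riverine environments where GPS signals are unreliable. We formulate river following as a coverage control problem whose reward function is submodular, so returns diminish as more unique river segments are visited; this frames the task as a Submodular Markov Decision Process. First, we introduce Marginal Gain Advantage Estimation (MGAE), which refines the reward advantage function with a sliding window baseline computed from historical episodic returns, aligning advantage estimation with the agent's evolving recognition of action value in non-Markovian settings. Second, we develop a Semantic Dynamics Model (SDM) based on patchified water semantic masks, which gives more interpretable and data-efficient short-term prediction of future observations than latent vision dynamics models. Third, we propose the Constrained Actor Dynamics Estimator (CADE) architecture, which integrates the actor, the cost estimator, and the SDM for cost advantage estimation, forming a model-based SafeRL framework that can solve partially observable Constrained Submodular Markov Decision Processes. Simulation results show that MGAE converges faster and performs better than traditional critic-based methods such as Generalized Advantage Estimation. The SDM provides more accurate short-term state predictions, which lets the cost estimator better predict potential violations. Overall, CADE effectively integrates safety regulation into model-based RL: the Lagrangian approach achieves a soft balance of reward and safety during training, while the safety layer enhances performance during inference through a hard action overlay.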
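
The abstract states two ideas that a small example can make concrete: a submodular coverage reward (revisiting a river segment earns nothing new) and an advantage computed against a sliding window baseline of past episodic returns. The Python sketch below is only an illustration under stated assumptions, not the authors' MGAE implementation. The names `coverage_reward`, `SlidingWindowBaseline`, and `run_episode`, the integer segment IDs, the unit reward per new segment, and the use of a plain mean as the baseline are all hypothetical choices made here.

```python
from collections import deque


def coverage_reward(visited, segment_id):
    """Marginal gain of visiting a segment: 1.0 the first time, 0.0 on revisits.

    Assumption: each unique river segment is worth one unit of reward.
    """
    if segment_id in visited:
        return 0.0
    visited.add(segment_id)
    return 1.0


def run_episode(segments):
    """Sum the marginal gains over a sequence of visited segment IDs."""
    visited = set()
    return sum(coverage_reward(visited, s) for s in segments)


class SlidingWindowBaseline:
    """Hypothetical baseline: mean of the last `window` episodic returns."""

    def __init__(self, window=3):
        self.returns = deque(maxlen=window)

    def advantage(self, episodic_return):
        # The baseline uses past episodes only; it is 0.0 when there is no history.
        baseline = sum(self.returns) / len(self.returns) if self.returns else 0.0
        self.returns.append(episodic_return)
        return episodic_return - baseline
```

Because the reward for a step depends on which segments were already visited, it is a function of the history rather than of the current state alone, which is the non-Markovian property the abstract refers to. A baseline built from recent episodic returns does not require a learned critic, which is one plausible reading of why the abstract contrasts MGAE with critic-based estimators.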

Related papers