Fugu-MT paper translation (abstract): Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability

Paper abstract: Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability

arxiv url: http://arxiv.org/abs/2605.14246v1
Date: Thu, 14 May 2026 01:23:09 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.565746
Title: Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability
Title (reference translation): Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability
Authors: Yushen Liu, Yin-Jen Chen, Ziyi Chen, Tao Wang, Heng Huang, Xugui Zhou, Yanfu Zhang
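Abstract summary: We propose a lightweight risk-gated reinforcement learning approximation for risk-sensitive control under partial observability. We evaluate the approach in two safety-critical partially observable domains: automated glucose regulation and safety-constrained navigation.
Reference score (independently computed attention metric): 79.08785366532287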
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many safety-critical control problems are modeled as risk-sensitive partially observable Markov decision processes, where the controller must make decisions from incomplete observations while balancing task performance against safety risk. Although belief-space planning provides a principled solution, maintaining and planning over beliefs can be computationally costly and sensitive to model specification in practical domains. We propose a lightweight risk-gated reinforcement learning approximation for risk-sensitive control under partial observability. The method constructs a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violation. This predicted candidate-action risk is used in two complementary ways: as a risk penalty during value learning, and as a decision-time gate that interpolates between optimistic and conservative ensemble value estimates. As a result, low-risk actions are evaluated closer to reward-seeking estimates, while high-risk actions are evaluated more conservatively. We evaluate the approach in two safety-critical partially observable domains: automated glucose regulation and safety-constrained navigation. Across adult and adolescent glucose-control cohorts, the method improves overall glycemic tradeoffs and substantially reduces runtime relative to a belief-space planning baseline. On Safety-Gym navigation benchmarks, it achieves a more favorable reward-cost balance than unconstrained RL and several standard safe-RL baselines. These results suggest that action-conditioned near-term risk can provide an effective local signal for approximate risk-sensitive POMDP control when full belief-space planning is impractical.
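Abstract (reference translation): Many safety-critical control problems are modeled as risk-sensitive partially observable Markov decision processes. Belief-space planning offers a principled solution, but maintaining and planning over beliefs can be computationally costly and sensitive to model specification in practical domains. We propose a lightweight risk-gated reinforcement learning approximation for risk-sensitive control under partial observability. The method constructs a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violation. The predicted risk is used in two complementary ways: as a risk penalty during value learning, and as a decision-time gate that interpolates between optimistic and conservative ensemble value estimates. As a result, low-risk actions are evaluated closer to reward-seeking estimates, while high-risk actions are evaluated more conservatively. We evaluate the approach in two safety-critical partially observable domains: automated glucose regulation and safety-constrained navigation. Across adult and adolescent glucose-control cohorts, the method improves overall glycemic tradeoffs and substantially reduces runtime relative to a belief-space planning baseline. On Safety-Gym navigation benchmarks, it achieves a more favorable reward-cost balance than unconstrained RL and several standard safe-RL baselines. These results suggest that action-conditioned near-term risk can provide an effective local signal for risk-sensitive POMDP control when full belief-space planning is impractical.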
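The abstract gives the two uses of the predicted risk only in words, so the following is a minimal sketch of one way they could look, not the authors' implementation. It assumes the optimistic and conservative estimates are the ensemble maximum and minimum, that the gate is a linear interpolation weighted by a risk value in [0, 1], and that the penalty is a simple weighted subtraction from the reward. The names `gated_values`, `select_action`, `penalized_reward`, and `penalty_weight` are hypothetical.

```python
from typing import List, Sequence


def gated_values(ensemble_q: Sequence[Sequence[float]],
                 risk: Sequence[float]) -> List[float]:
    """Blend optimistic and conservative ensemble estimates per action.

    ensemble_q[k][a] is ensemble member k's value estimate for action a.
    risk[a] is the predicted near-term violation risk of action a.
    """
    gated = []
    for a, r in enumerate(risk):
        r = min(1.0, max(0.0, r))          # clamp the gate weight to [0, 1]
        member_values = [q[a] for q in ensemble_q]
        optimistic = max(member_values)    # assumption: ensemble maximum
        conservative = min(member_values)  # assumption: ensemble minimum
        # Low risk keeps the optimistic value; high risk moves to the conservative one.
        gated.append((1.0 - r) * optimistic + r * conservative)
    return gated


def select_action(ensemble_q: Sequence[Sequence[float]],
                  risk: Sequence[float]) -> int:
    """Pick the action with the highest risk-gated value."""
    values = gated_values(ensemble_q, risk)
    return max(range(len(values)), key=values.__getitem__)


def penalized_reward(reward: float, risk: float,
                     penalty_weight: float = 1.0) -> float:
    """Assumed form of the training-time risk penalty: reward minus weighted risk."""
    return reward - penalty_weight * risk


# Toy data: three ensemble members, two candidate actions.
ensemble_q = [
    [1.0, 2.0],
    [0.8, 0.0],
    [0.9, 3.0],
]
risk = [0.0, 1.0]  # action 0 predicted safe, action 1 predicted risky

chosen = select_action(ensemble_q, risk)
```

In this toy case action 1 has the highest optimistic estimate, but its high predicted risk moves its gated value to the conservative end, so the gate selects action 0. With zero predicted risk for both actions, the same ensemble would select action 1.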

Related papers