Fugu-MT Paper Translation (Abstract): Learning Control Policies to Provably Satisfy Hard Affine Constraints for Black-Box Hybrid Dynamical Systems

Paper abstract: Learning Control Policies to Provably Satisfy Hard Affine Constraints for Black-Box Hybrid Dynamical Systems

arxiv url: http://arxiv.org/abs/2604.22244v1
Date: Fri, 24 Apr 2026 05:39:56 GMT
Status: Translation complete
Last updated in system: 2026-04-27 15:36:26.356583
Title: Learning Control Policies to Provably Satisfy Hard Affine Constraints for Black-Box Hybrid Dynamical Systems
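Title (reference translation): Learning Control Methods for Satisfying Hard Affine Constraints in Black-Box Hybrid Dynamical Systems
Authors: Aayushi Shrivastava, Kartik Nagpal, Sairam Jinkala, Jean-Baptiste Bouvier, Negar Mehr
Abstract summary: The authors learn policies that provably satisfy affine state constraints in closed loop for black-box hybrid dynamical systems. Their key insight is to force the RL policy to be affine and repulsive near the constraint boundaries, even though the system's nonlinear dynamics are unknown. They derive sufficient conditions under which these policies satisfy the safety constraints in closed loop.
Reference score (attention score computed independently by this site): 5.0292714462286545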
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Ensuring safety for black-box hybrid dynamical systems presents significant challenges due to their instantaneous state jumps and unknown explicit nonlinear dynamics. Existing solutions for strict safety constraint satisfaction, like control barrier functions (CBFs) and reachability analysis, rely on direct knowledge of the dynamics. Similarly, safe reinforcement learning (RL) approaches often rely on known system dynamics or merely discourage safety violations through reward shaping. In this work, we want to learn RL policies which provably satisfy affine state constraints in closed loop for black-box hybrid dynamical systems with affine reset maps. Our key insight is forcing the RL policy to be affine and repulsive near the constraint boundaries for the unknown nonlinear dynamics of the system, providing guarantees that the trajectories will not violate the constraint. We further account for constraint violation due to instantaneous state jumps that occur due to impacts or reset maps in the hybrid system by introducing a second repulsive affine region before the reset that prevents post-reset states from violating the constraint. We derive sufficient conditions under which these policies satisfy safety constraints in closed loop. We also compare our approach with state-of-the-art reward shaping and learned-CBF methods on hybrid dynamical systems like the constrained pendulum and paddle juggler environments. In both scenarios, we show that our methodology learns higher quality policies while always satisfying the safety constraints.
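Abstract (reference translation): Ensuring the safety of black-box hybrid dynamical systems is a significant challenge because of their instantaneous state jumps and their unknown explicit nonlinear dynamics. Existing solutions for strict satisfaction of safety constraints, such as control barrier functions (CBFs) and reachability analysis, depend on direct knowledge of the dynamics. Likewise, safe reinforcement learning (RL) approaches often depend on known system dynamics, or only discourage safety violations through reward shaping. This work aims to learn RL policies that provably satisfy affine state constraints in closed loop for black-box hybrid dynamical systems with affine reset maps. The key insight is to force the RL policy to be affine and repulsive near the constraint boundaries under the unknown nonlinear dynamics of the system, which guarantees that trajectories do not violate the constraint. The method also handles constraint violations caused by the instantaneous state jumps that impacts or reset maps produce in a hybrid system: it introduces a second repulsive affine region before the reset, which prevents post-reset states from violating the constraint. The authors derive sufficient conditions under which these policies satisfy the safety constraints in closed loop. They also compare the approach with state-of-the-art reward shaping and learned-CBF methods on hybrid dynamical systems such as the constrained pendulum and paddle juggler environments. In both scenarios, the methodology learns higher-quality policies while always satisfying the safety constraints.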
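
The idea of a policy that is affine and repulsive near an affine constraint boundary can be illustrated with a small toy example. The sketch below is not the authors' method or code: it assumes a known single-integrator system x <- x + dt * u in place of the black-box hybrid dynamics, hard-codes the affine law instead of learning it, and omits reset maps entirely. All names and values (C, D, BUFFER, GAIN, MARGIN, GOAL, and the function names) are hypothetical and chosen only for illustration.

```python
from typing import List

Vector = List[float]

# Hypothetical affine constraint c . x <= d with a buffer just inside it.
C: Vector = [1.0, 0.0]
D = 1.0
BUFFER = 0.2          # width of the repulsive affine region (assumed)
GAIN = 5.0            # slope of the affine law inside the buffer (assumed)
MARGIN = 0.1          # repulsion strength at the buffer entry (assumed)
GOAL: Vector = [2.0, 0.5]  # placed outside the safe set on purpose


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def nominal_policy(x: Vector) -> Vector:
    """Stand-in for a learned policy: saturated pull toward GOAL."""
    return [max(-1.0, min(1.0, g - xi)) for g, xi in zip(GOAL, x)]


def repulsive_affine_policy(x: Vector) -> Vector:
    """Affine in x, with c . u < 0 everywhere inside the buffer."""
    depth = dot(C, x) - (D - BUFFER)  # 0 at buffer entry, BUFFER at boundary
    scale = -(GAIN * depth + MARGIN)
    return [scale * ci for ci in C]


def policy(x: Vector) -> Vector:
    """Use the affine repulsive law in the buffer, the nominal law elsewhere."""
    if dot(C, x) >= D - BUFFER:
        return repulsive_affine_policy(x)
    return nominal_policy(x)


def simulate(x0: Vector, steps: int = 200, dt: float = 0.05) -> List[Vector]:
    """Roll out the assumed toy dynamics x <- x + dt * u."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        u = policy(x)
        x = [xi + dt * ui for xi, ui in zip(x, u)]
        trajectory.append(list(x))
    return trajectory


if __name__ == "__main__":
    traj = simulate([0.0, 0.0])
    print("max of c . x along the rollout:", max(dot(C, x) for x in traj))
```

In this toy setting the constraint holds because the nominal step (at most dt * 1 = 0.05) is smaller than the buffer width, so the state cannot jump across the buffer in one step, and inside the buffer the control always points away from the boundary. The paper's guarantees concern unknown dynamics and hybrid resets, which this sketch does not attempt to reproduce.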


Related papers