Fugu-MT Paper Translation (Summary): Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions

Paper summary: Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions

arxiv url: http://arxiv.org/abs/2109.02791v1
Date: Tue, 7 Sep 2021 00:51:12 GMT
Status: Translation complete
Last updated in system: 2021-09-08 14:23:23.915802
Title: Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions
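Title (reference translation): Safety-Critical Modular Deep Reinforcement Learning Using Temporal Logic via Gaussian Processes and Control Barrier Functions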
Authors: Mingyu Cai, Cristian-Ioan Vasile
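Abstract summary: Reinforcement learning (RL) is a promising approach that has had limited success in real-world applications. This paper proposes a learning-based control framework consisting of several components. The ECBF-based modular deep RL algorithm is shown to achieve near-perfect success rates and to guard safety with high probability.
Reference score (attention score computed by this site): 3.5897534810405403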
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) is a promising approach but has had limited success in real-world applications, because ensuring safe exploration or facilitating adequate exploitation is a challenge when controlling robotic systems with unknown models and measurement uncertainties. Such a learning problem becomes even more intractable for complex tasks over continuous spaces (state space and action space). In this paper, we propose a learning-based control framework consisting of several aspects: (1) linear temporal logic (LTL) is leveraged to specify complex tasks over an infinite horizon, which can be translated to a novel automaton structure; (2) we propose an innovative reward scheme for the RL agent with the formal guarantee that globally optimal policies maximize the probability of satisfying the LTL specifications; (3) based on a reward shaping technique, we develop a modular policy-gradient architecture that uses the automaton structure to decompose overall tasks and improve the performance of learned controllers; (4) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamic systems, we synthesize a model-based safeguard using Exponential Control Barrier Functions (ECBFs) to address problems with high-order relative degrees. In addition, we utilize the properties of LTL automata and ECBFs to construct a guiding process that further improves the efficiency of exploration. Finally, we demonstrate the effectiveness of the framework in several robotic environments. We show that such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guards safety with high probabilistic confidence during training.
Abstract (reference translation): Reinforcement learning (RL) is a promising approach that has had limited success in real-world applications, because ensuring safe exploration and facilitating adequate exploitation are challenges when controlling robotic systems with unknown models and measurement uncertainties. Such a learning problem becomes even more intractable for complex tasks over continuous spaces (state space and action space). In this paper, we propose a learning-based control framework consisting of several aspects: (1) linear temporal logic (LTL) is leveraged to specify complex tasks over an infinite horizon, which can be translated to a novel automaton structure; (2) we propose an innovative reward scheme for the RL agent with the formal guarantee that globally optimal policies maximize the probability of satisfying the LTL specifications; (3) based on a reward shaping technique, we develop a modular policy-gradient architecture that uses the automaton structure to decompose overall tasks and improve the performance of learned controllers; (4) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamic systems, we synthesize a model-based safeguard using Exponential Control Barrier Functions (ECBFs) to address problems with high-order relative degrees. In addition, we use the properties of LTL automata and ECBFs to construct a guiding process that improves the efficiency of exploration. Finally, we demonstrate the effectiveness of the framework in several robotic environments. We also show that such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guards safety with high probabilistic confidence during training.
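Illustrative sketch (not from the paper): item (4) of the abstract describes an ECBF-based safeguard for constraints with high relative degree. The following minimal Python sketch shows the general idea on a toy 1D double integrator (position p, velocity v, acceleration input u) with the position limit h = p_max - p, which has relative degree 2. The system, the gains k0 and k1, the function names, and the constant `margin` term (a crude stand-in for a GP-derived uncertainty bound) are all assumptions made for illustration; the authors' actual formulation may differ.

```python
def ecbf_filter(u_nominal, p, v, p_max, k0=4.0, k1=4.0, margin=0.0):
    """Return the input closest to u_nominal that satisfies the ECBF condition.

    Toy model (assumed): p' = v, v' = u, safe set h(p) = p_max - p >= 0.
    ECBF condition: h'' + k1*h' + k0*h >= margin, with h' = -v and h'' = -u.
    For a scalar input the usual quadratic program reduces to a simple clamp.
    """
    h = p_max - p
    h_dot = -v
    u_upper = k0 * h + k1 * h_dot - margin  # largest admissible acceleration
    return min(u_nominal, u_upper)


def simulate(use_filter, steps=500, dt=0.01, p_max=1.0, u_nominal=2.0):
    """Explicit-Euler rollout; returns the largest position reached."""
    p, v = 0.0, 0.0
    max_p = p
    for _ in range(steps):
        # u_nominal stands in for an RL action that keeps pushing toward the limit.
        u = ecbf_filter(u_nominal, p, v, p_max) if use_filter else u_nominal
        p, v = p + dt * v, v + dt * u  # both updates use the old state
        max_p = max(max_p, p)
    return max_p


if __name__ == "__main__":
    print("max position with filter:   ", simulate(use_filter=True))
    print("max position without filter:", simulate(use_filter=False))
```

With the default gains (a double pole at -2) the filtered rollout stays below p_max, while the unfiltered rollout overshoots it. A positive `margin` tightens the constraint, which is one simple way an estimated model error could be accounted for.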

Related papers

Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
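In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability. This paper proposes integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with an auto-tuning low-level control strategy.
Reference translation of paper (metadata) (2024-02-04T15:54:03Z)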
Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
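This paper revisits prior work in this area from the perspective of state-wise safe RL. It proposes the Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection. To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
Reference translation of paper (metadata) (2022-12-12T06:30:17Z)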
Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
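We propose a reformulation of CBF-based safety-critical controllers that accounts for model uncertainty. We then present pointwise feasibility conditions for the resulting safe controller. Using these conditions, we devise an event-triggered online data collection strategy.
Reference translation of paper (metadata) (2022-08-23T05:02:09Z)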
Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions [35.9713619595494]
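Reinforcement learning and continuous nonlinear control have been successfully deployed in multiple domains of complex sequential decision-making tasks. Given the exploratory nature of the learning process and the presence of model uncertainty, applying them to safety-critical control tasks is challenging. This paper proposes a sample-efficient episodic safe learning framework for online control tasks.
Reference translation of paper (metadata) (2022-07-29T00:54:35Z)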
Constrained Reinforcement Learning for Robotics via Scenario-Based Programming [64.07167316957533]
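It is important to optimize the performance of DRL-based agents while guaranteeing their behavior. This paper proposes a novel technique for incorporating domain knowledge into a constrained DRL training loop. Our experiments show that using our approach to leverage expert knowledge dramatically improves the agent's safety and performance.
Reference translation of paper (metadata) (2022-06-20T07:19:38Z)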
Safe RAN control: A Symbolic Reinforcement Learning Approach [62.997667081978825]
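This paper presents a Symbolic Reinforcement Learning (SRL) architecture for safe control of Radio Access Network (RAN) applications. We provide a purely automated procedure in which a user can specify high-level logical safety specifications for a given cellular network topology. We introduce a user interface (UI) developed to help the user set intent specifications for the system and inspect differences in the proposed agent actions.
Reference translation of paper (metadata) (2021-06-03T16:45:40Z)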
Learning Off-Policy with Online Planning [18.63424441772675]
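This work investigates a novel instantiation of H-step lookahead using a learned model and a terminal value function. We show the flexibility of LOOP in incorporating safety constraints during deployment on a set of navigation environments.
Reference translation of paper (metadata) (2020-08-23T16:18:44Z)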
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
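We introduce GCPO, an RL framework based on our implementation of constrained proximal policy optimization (CPPO). We show that guided constrained RL achieves fast convergence close to the desired optimum, yielding optimal yet physically feasible robot control behavior without requiring precise reward function tuning.
Reference translation of paper (metadata) (2020-02-22T10:15:53Z)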
Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
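We propose a model-free RL algorithm that allows linear temporal logic (LTL) to be used to formulate goals for unknown continuous-state/action Markov decision processes (MDPs). The algorithm is guaranteed to synthesize a control policy whose traces satisfy the specification with maximal probability.
Reference translation of paper (metadata) (2019-02-02T20:09:32Z)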

The related paper list is generated automatically from the titles and abstracts of papers on this site.