Robustness Verification of Deep Reinforcement Learning Based Control
Systems using Reward Martingales
- URL: http://arxiv.org/abs/2312.09695v1
- Date: Fri, 15 Dec 2023 11:16:47 GMT
- Title: Robustness Verification of Deep Reinforcement Learning Based Control
Systems using Reward Martingales
- Authors: Dapeng Zhi, Peixin Wang, Cheng Chen, Min Zhang
- Abstract summary: We present the first approach for robustness verification of DRL-based control systems by introducing reward martingales.
Our results provide provably quantitative certificates for the two questions.
We then show that reward martingales can be implemented and trained via neural networks, against different types of control policies.
- Score: 13.069196356472272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (DRL) has gained prominence as an effective
approach for control systems. However, its practical deployment is impeded by
state perturbations that can severely impact system performance. Addressing
this critical challenge requires robustness verification about system
performance, which involves tackling two quantitative questions: (i) how to
establish guaranteed bounds for expected cumulative rewards, and (ii) how to
determine tail bounds for cumulative rewards. In this work, we present the
first approach for robustness verification of DRL-based control systems by
introducing reward martingales, which offer a rigorous mathematical foundation
to characterize the impact of state perturbations on system performance in
terms of cumulative rewards. Our verified results provide provably quantitative
certificates for the two questions. We then show that reward martingales can be
implemented and trained via neural networks, against different types of control
policies. Experimental results demonstrate that our certified bounds tightly
enclose simulation outcomes on various DRL-based control systems, indicating
the effectiveness and generality of the proposed approach.
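As a rough illustration of the "implemented and trained via neural networks" claim, the sketch below fits a certificate network U(s) toward a supermartingale-style condition E[U(s')] <= U(s) - r(s) on sampled perturbed transitions, so that U(s0) upper-bounds the expected cumulative reward. The architecture, the sampling of perturbed successors, and the hinge penalty are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the paper's exact construction): train a network
# U(s) toward the condition E[U(s')] <= U(s) - r(s) over sampled perturbed
# transitions under a fixed control policy.

class Certificate(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def martingale_violation(U, s, s_next_samples, r):
    """s: (B, d) states, s_next_samples: (B, K, d) sampled perturbed
    successors, r: (B,) rewards; returns a hinge penalty on violations."""
    expected_next = U(s_next_samples).mean(dim=1)   # Monte Carlo estimate of E[U(s')]
    gap = expected_next - (U(s) - r)                # positive => condition violated
    return torch.relu(gap).mean()
```

Minimizing this penalty to (near) zero on a sufficiently rich set of states is what would make U usable as a quantitative certificate; the tail bounds mentioned in the abstract would then follow from concentration arguments not shown here.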
Related papers
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy consumption.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution, in combination with value decomposition, yields simple critic-only algorithms with surprisingly strong performance on continuous control tasks.
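A minimal sketch of the coarse-to-fine idea summarized above, assuming a single action bounded in [-1, 1]; the doubling schedule is an illustrative choice, not the paper's algorithm.

```python
import numpy as np

# Illustrative coarse-to-fine action grids for a bounded 1-D action; each
# level roughly doubles the control resolution while reusing earlier points.
def action_grid(level, low=-1.0, high=1.0):
    return np.linspace(low, high, 2 ** (level + 1) + 1)

for level in range(4):
    print(f"level {level}: {len(action_grid(level))} discrete actions")
```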
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- Decentralized Event-Triggered Online Learning for Safe Consensus of Multi-Agent Systems with Gaussian Process Regression [3.405252606286664]
This paper presents a novel learning-based distributed control law, augmented by auxiliary dynamics.
For continuous enhancement in predictive performance, a data-efficient online learning strategy with a decentralized event-triggered mechanism is proposed.
To demonstrate the efficacy of the proposed learning-based controller, a comparative analysis is conducted, contrasting it with both conventional distributed control laws and offline learning methodologies.
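A minimal sketch of an event-triggered online-learning rule of the kind summarized above: an agent collects a new training point only when its model's prediction error exceeds a threshold. The trigger form is an assumption; the paper derives its condition from the consensus and stability analysis.

```python
# Illustrative event-triggered data collection for one agent.
def maybe_update(dataset, x, y_observed, predict, threshold=0.1):
    error = abs(y_observed - predict(x))
    if error > threshold:              # event fires: model no longer accurate here
        dataset.append((x, y_observed))
        return True                    # caller would refit the local GP model
    return False
```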
arXiv Detail & Related papers (2024-02-05T16:41:17Z)
- Reliability Quantification of Deep Reinforcement Learning-based Control [0.0]
This study proposes a method for quantifying the reliability of DRL-based control.
The reliability is quantified using two neural networks: reference and evaluator.
The proposed method was applied to the problem of switching trained models depending on the state.
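The model-switching application mentioned above reduces to a simple per-state selection step, sketched below; `reliability(policy, state)` is a placeholder for whatever score the reference/evaluator network pair produces, not the paper's API.

```python
# Illustrative switching rule: pick the trained policy whose estimated
# reliability is highest in the current state.
def select_policy(policies, reliability, state):
    return max(policies, key=lambda p: reliability(p, state))
```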
arXiv Detail & Related papers (2023-09-29T04:49:49Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
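A minimal sketch of the validation signal described above: TD error computed on a held-out set of transitions, usable for comparing candidate agents or hyperparameters. The exact selection rule in the paper may differ from this sketch.

```python
import torch

# Illustrative validation TD error for a discrete-action Q-network.
def validation_td_error(q_net, s, a, r, s_next, done, gamma=0.99):
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values
        pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return ((pred - target) ** 2).mean().item()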
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
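A minimal sketch of the "advantage" described above: the Q-value of the observed (positive) item minus the average Q-value over sampled negative items. The paper's exact normalization may differ.

```python
import torch

# Illustrative advantage of a positive item over sampled negatives.
def advantage(q_values, pos_idx, neg_idx):
    """q_values: (B, n_items); pos_idx: (B,) long; neg_idx: (B, K) long."""
    q_pos = q_values.gather(1, pos_idx.unsqueeze(1)).squeeze(1)
    q_neg = q_values.gather(1, neg_idx).mean(dim=1)
    return q_pos - q_neg
```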
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing [41.093241772796475]
We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations.
We propose two types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards.
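A minimal sketch of per-state smoothing for a discrete-action policy, assuming Gaussian state noise and a majority vote; the certified-radius computation that yields the formal guarantees is omitted here.

```python
import numpy as np

# Illustrative smoothed policy: sample state perturbations, act by majority vote.
def smoothed_action(policy, state, sigma=0.1, n_samples=100, seed=0):
    rng = np.random.default_rng(seed)
    state = np.asarray(state, dtype=float)
    votes = [policy(state + rng.normal(0.0, sigma, size=state.shape))
             for _ in range(n_samples)]
    actions, counts = np.unique(votes, return_counts=True)
    return actions[np.argmax(counts)]
```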
arXiv Detail & Related papers (2021-06-17T07:58:32Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
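A minimal sketch of the down-weighting idea summarized above: Bellman errors are scaled down for samples whose target estimate is uncertain. UWAC estimates this uncertainty with MC dropout; here the per-sample uncertainty is assumed to be given, and the normalization is an illustrative choice.

```python
import torch

# Illustrative uncertainty-weighted critic loss.
def uncertainty_weighted_loss(td_error, uncertainty, beta=1.0):
    weights = beta / (uncertainty + 1e-6)             # higher variance -> smaller weight
    weights = (weights / weights.mean()).detach()     # keep the overall loss scale stable
    return (weights * td_error.pow(2)).mean()
```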
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- The Impact of Data on the Stability of Learning-Based Control - Extended Version [63.97366815968177]
We propose a Lyapunov-based measure for quantifying the impact of data on the certifiable control performance.
By modeling unknown system dynamics through Gaussian processes, we can determine the interrelation between model uncertainty and satisfaction of stability conditions.
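One way the interplay between model uncertainty and stability conditions can be pictured is the check below, which tests a Lyapunov decrease condition with a margin for the GP's predictive uncertainty. It assumes V is L_V-Lipschitz so the GP confidence region can be folded into a worst-case bound; the paper's formal certificate is derived differently and more carefully.

```python
import numpy as np

# Illustrative decrease check under GP model uncertainty (Lipschitz V assumed).
def decrease_condition_holds(V, x, next_mean, next_std, L_V, alpha=0.1, kappa=2.0):
    worst_case_next = V(next_mean) + L_V * kappa * float(np.linalg.norm(next_std))
    return worst_case_next <= (1.0 - alpha) * V(x)
```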
arXiv Detail & Related papers (2020-11-20T19:10:01Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.