Robustness Verification of Deep Reinforcement Learning Based Control
Systems using Reward Martingales
- URL: http://arxiv.org/abs/2312.09695v1
- Date: Fri, 15 Dec 2023 11:16:47 GMT
- Title: Robustness Verification of Deep Reinforcement Learning Based Control
Systems using Reward Martingales
- Authors: Dapeng Zhi, Peixin Wang, Cheng Chen, Min Zhang
- Abstract summary: We present the first approach for robustness verification of DRL-based control systems by introducing reward martingales.
Our results provide provably quantitative certificates for the two questions.
We then show that reward martingales can be implemented and trained via neural networks, against different types of control policies.
- Score: 13.069196356472272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (DRL) has gained prominence as an effective
approach for control systems. However, its practical deployment is impeded by
state perturbations that can severely impact system performance. Addressing
this critical challenge requires robustness verification about system
performance, which involves tackling two quantitative questions: (i) how to
establish guaranteed bounds for expected cumulative rewards, and (ii) how to
determine tail bounds for cumulative rewards. In this work, we present the
first approach for robustness verification of DRL-based control systems by
introducing reward martingales, which offer a rigorous mathematical foundation
to characterize the impact of state perturbations on system performance in
terms of cumulative rewards. Our verified results provide provably quantitative
certificates for the two questions. We then show that reward martingales can be
implemented and trained via neural networks, against different types of control
policies. Experimental results demonstrate that our certified bounds tightly
enclose simulation outcomes on various DRL-based control systems, indicating
the effectiveness and generality of the proposed approach.
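As a rough illustration of the "implemented and trained via neural networks" claim, the sketch below fits a certificate network U(s) toward a supermartingale-style condition E[U(s')] <= U(s) - r(s) on sampled perturbed transitions, so that U(s0) upper-bounds the expected cumulative reward. The architecture, the sampling of perturbed successors, and the hinge penalty are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the paper's exact construction): train a network
# U(s) toward the condition E[U(s')] <= U(s) - r(s) over sampled perturbed
# transitions under a fixed control policy.

class Certificate(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def martingale_violation(U, s, s_next_samples, r):
    """s: (B, d) states, s_next_samples: (B, K, d) sampled perturbed
    successors, r: (B,) rewards; returns a hinge penalty on violations."""
    expected_next = U(s_next_samples).mean(dim=1)   # Monte Carlo estimate of E[U(s')]
    gap = expected_next - (U(s) - r)                # positive => condition violated
    return torch.relu(gap).mean()
```

Minimizing this penalty to (near) zero on a sufficiently rich set of states is what would make U usable as a quantitative certificate; the tail bounds mentioned in the abstract would then follow from concentration arguments not shown here.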
Related papers
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy consumption.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution, in combination with value decomposition, yields simple critic-only algorithms with surprisingly strong performance on continuous control tasks.
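A minimal sketch of the coarse-to-fine idea summarized above, assuming a single action bounded in [-1, 1]; the doubling schedule is an illustrative choice, not the paper's algorithm.

```python
import numpy as np

# Illustrative coarse-to-fine action grids for a bounded 1-D action; each
# level roughly doubles the control resolution while reusing earlier points.
def action_grid(level, low=-1.0, high=1.0):
    return np.linspace(low, high, 2 ** (level + 1) + 1)

for level in range(4):
    print(f"level {level}: {len(action_grid(level))} discrete actions")
```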
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- Decentralized Event-Triggered Online Learning for Safe Consensus of Multi-Agent Systems with Gaussian Process Regression [3.405252606286664]
This paper presents a novel learning-based distributed control law, augmented by auxiliary dynamics.
For continuous enhancement in predictive performance, a data-efficient online learning strategy with a decentralized event-triggered mechanism is proposed.
To demonstrate the efficacy of the proposed learning-based controller, a comparative analysis is conducted, contrasting it with both conventional distributed control laws and offline learning methodologies.
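A minimal sketch of an event-triggered online-learning rule of the kind summarized above: an agent collects a new training point only when its model's prediction error exceeds a threshold. The trigger form is an assumption; the paper derives its condition from the consensus and stability analysis.

```python
# Illustrative event-triggered data collection for one agent.
def maybe_update(dataset, x, y_observed, predict, threshold=0.1):
    error = abs(y_observed - predict(x))
    if error > threshold:              # event fires: model no longer accurate here
        dataset.append((x, y_observed))
        return True                    # caller would refit the local GP model
    return False
```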
arXiv Detail & Related papers (2024-02-05T16:41:17Z)
- Reliability Quantification of Deep Reinforcement Learning-based Control [0.0]
This study proposes a method for quantifying the reliability of DRL-based control.
The reliability is quantified using two neural networks: reference and evaluator.
The proposed method was applied to the problem of switching trained models depending on the state.
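The model-switching application mentioned above reduces to a simple per-state selection step, sketched below; `reliability(policy, state)` is a placeholder for whatever score the reference/evaluator network pair produces, not the paper's API.

```python
# Illustrative switching rule: pick the trained policy whose estimated
# reliability is highest in the current state.
def select_policy(policies, reliability, state):
    return max(policies, key=lambda p: reliability(p, state))
```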
arXiv Detail & Related papers (2023-09-29T04:49:49Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
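A minimal sketch of the validation signal described above: TD error computed on a held-out set of transitions, usable for comparing candidate agents or hyperparameters. The exact selection rule in the paper may differ from this sketch.

```python
import torch

# Illustrative validation TD error for a discrete-action Q-network.
def validation_td_error(q_net, s, a, r, s_next, done, gamma=0.99):
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values
        pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return ((pred - target) ** 2).mean().item()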
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
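A minimal sketch of the "advantage" described above: the Q-value of the observed (positive) item minus the average Q-value over sampled negative items. The paper's exact normalization may differ.

```python
import torch

# Illustrative advantage of a positive item over sampled negatives.
def advantage(q_values, pos_idx, neg_idx):
    """q_values: (B, n_items); pos_idx: (B,) long; neg_idx: (B, K) long."""
    q_pos = q_values.gather(1, pos_idx.unsqueeze(1)).squeeze(1)
    q_neg = q_values.gather(1, neg_idx).mean(dim=1)
    return q_pos - q_neg
```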
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing [41.093241772796475]
We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations.
We propose two types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards.
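A minimal sketch of per-state smoothing for a discrete-action policy, assuming Gaussian state noise and a majority vote; the certified-radius computation that yields the formal guarantees is omitted here.

```python
import numpy as np

# Illustrative smoothed policy: sample state perturbations, act by majority vote.
def smoothed_action(policy, state, sigma=0.1, n_samples=100, seed=0):
    rng = np.random.default_rng(seed)
    state = np.asarray(state, dtype=float)
    votes = [policy(state + rng.normal(0.0, sigma, size=state.shape))
             for _ in range(n_samples)]
    actions, counts = np.unique(votes, return_counts=True)
    return actions[np.argmax(counts)]
```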
arXiv Detail & Related papers (2021-06-17T07:58:32Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
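A minimal sketch of the down-weighting idea summarized above: Bellman errors are scaled down for samples whose target estimate is uncertain. UWAC estimates this uncertainty with MC dropout; here the per-sample uncertainty is assumed to be given, and the normalization is an illustrative choice.

```python
import torch

# Illustrative uncertainty-weighted critic loss.
def uncertainty_weighted_loss(td_error, uncertainty, beta=1.0):
    weights = beta / (uncertainty + 1e-6)             # higher variance -> smaller weight
    weights = (weights / weights.mean()).detach()     # keep the overall loss scale stable
    return (weights * td_error.pow(2)).mean()
```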
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- The Impact of Data on the Stability of Learning-Based Control - Extended Version [63.97366815968177]
We propose a Lyapunov-based measure for quantifying the impact of data on the certifiable control performance.
By modeling unknown system dynamics through Gaussian processes, we can determine the interrelation between model uncertainty and satisfaction of stability conditions.
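One way the interplay between model uncertainty and stability conditions can be pictured is the check below, which tests a Lyapunov decrease condition with a margin for the GP's predictive uncertainty. It assumes V is L_V-Lipschitz so the GP confidence region can be folded into a worst-case bound; the paper's formal certificate is derived differently and more carefully.

```python
import numpy as np

# Illustrative decrease check under GP model uncertainty (Lipschitz V assumed).
def decrease_condition_holds(V, x, next_mean, next_std, L_V, alpha=0.1, kappa=2.0):
    worst_case_next = V(next_mean) + L_V * kappa * float(np.linalg.norm(next_std))
    return worst_case_next <= (1.0 - alpha) * V(x)
```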
arXiv Detail & Related papers (2020-11-20T19:10:01Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.