Related papers: Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space

URL: http://arxiv.org/abs/2405.11982v1
Date: Mon, 20 May 2024 12:31:11 GMT
Title: Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space
Authors: Qianmei Liu, Yufei Kuang, Jie Wang
Abstract summary: We propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training. The appealing feature of our method is that it is simple to deploy in real-world applications and does not require accessing the simulator in advance. The experiments in MuJoCo show that our method can improve the training stability and learn a robust policy when migrated to different test environments.
Score: 3.639580365066386
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep reinforcement learning (DRL) algorithms can suffer from modeling errors between the simulation and the real world. Many studies use adversarial learning to generate perturbations during the training process to model the discrepancy and improve the robustness of DRL. However, most of these approaches use a fixed parameter to control the intensity of the adversarial perturbation, which can lead to a trade-off between average performance and robustness. In fact, finding the optimal parameter of the perturbation is challenging, as excessive perturbations may destabilize training and compromise agent performance, while insufficient perturbations may not impart enough information to enhance robustness. To keep the training stable while improving robustness, we propose a simple but effective method, namely, Adaptive Adversarial Perturbation (A2P), which can dynamically select appropriate adversarial perturbations for each sample. Specifically, we propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training. By designing a metric for the current intensity of the perturbation, our method can calculate the suitable perturbation levels based on the current relative performance. The appealing feature of our method is that it is simple to deploy in real-world applications and does not require accessing the simulator in advance. The experiments in MuJoCo show that our method can improve the training stability and learn a robust policy when migrated to different test environments. The code is available at https://github.com/Lqm00/A2P-SAC.

Related papers

Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach [0.9549646359252346]
We propose dynamic Learning Rate for deep Reinforcement Learning (LRRL). LRRL is a meta-learning approach that selects the learning rate based on the agent's performance during training. Our empirical results demonstrate that LRRL can substantially improve the performance of deep RL algorithms.
arXiv Detail & Related papers (2024-10-16T14:15:28Z)
Adaptive Robust Learning using Latent Bernoulli Variables [50.223140145910904]
We present an adaptive approach for learning from corrupted training sets. We identify corrupted and non-corrupted samples with latent Bernoulli variables. The resulting problem is solved via variational inference.
arXiv Detail & Related papers (2023-12-01T13:50:15Z)
Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of optimal control law. We approach both objectives by using reinforcement learning to compute the optimal control law. Unlike fixed exploration and exploitation balance, caution and probing are employed automatically by the controller in real-time, even after the learning process is terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
Robust Deep Reinforcement Learning Scheduling via Weight Anchoring [7.570246812206769]
We use weight anchoring to cultivate and fixate desired behavior in Neural Networks. Weight anchoring may be used to find a solution to a learning problem that is near the solution of another learning problem. Results show that this method provides performance comparable to the state of the art of augmenting a simulation environment.
arXiv Detail & Related papers (2023-04-20T09:30:23Z)
Improve Noise Tolerance of Robust Loss via Noise-Awareness [60.34670515595074]
We propose a meta-learning method which is capable of adaptively learning a hyperparameter prediction function, called Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster for brevity). Four SOTA robust loss functions are integrated with our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its noise tolerance and performance.
arXiv Detail & Related papers (2023-01-18T04:54:58Z)
Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC). Our algorithm alleviates problems with local minima through a smooth critic function. We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model [42.28001762749647]
In high-stake scenarios like medical treatment and auto-piloting, it's risky or even infeasible to collect online experimental data to train the agent. We consider policy learning for Robust Markov Decision Processes (RMDP), where the agent tries to seek a robust policy with respect to unexpected perturbations on the environments. Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties.
arXiv Detail & Related papers (2022-03-13T06:37:25Z)
Robustifying Reinforcement Learning Policies with $\mathcal{L}_1$ Adaptive Control [7.025818894763949]
A reinforcement learning (RL) policy could fail in a new/perturbed environment due to the existence of dynamic variations. We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control. Our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world.
arXiv Detail & Related papers (2021-06-04T04:28:46Z)
Improving Model Robustness by Adaptively Correcting Perturbation Levels with Active Queries [43.98198697182858]
A novel active learning framework is proposed to allow the model to interactively query the correct perturbation level from human experts. Both theoretical analysis and experimental studies validate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2021-03-27T07:09:01Z)
Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples. We propose a new framework called SPROUT, self-progressing robust training. Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training. From this perspective, we hypothesise that instabilities in training GANs arise from the integration error. We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.