Adaptive Adversarial Training for Meta Reinforcement Learning
- URL: http://arxiv.org/abs/2104.13302v1
- Date: Tue, 27 Apr 2021 16:23:34 GMT
- Title: Adaptive Adversarial Training for Meta Reinforcement Learning
- Authors: Shiqi Chen, Zhengyu Chen, Donglin Wang
- Abstract summary: We build upon model-agnostic meta-learning (MAML) and propose a novel method to generate adversarial samples for MRL by using a Generative Adversarial Network (GAN).
That allows us to enhance the robustness of MRL to adversarial attacks by leveraging these attacks during the meta-training process.
- Score: 6.576665763018747
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Meta Reinforcement Learning (MRL) enables an agent to learn from a limited
number of past trajectories and extrapolate to a new task. In this paper, we
attempt to improve the robustness of MRL. We build upon model-agnostic
meta-learning (MAML) and propose a novel method to generate adversarial samples
for MRL by using a Generative Adversarial Network (GAN). That allows us to
enhance the robustness of MRL to adversarial attacks by leveraging these attacks
during the meta-training process.
Related papers
- MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning [18.82398325614491]
We propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods.
We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency.
In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
arXiv Detail & Related papers (2024-03-14T20:40:36Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z)
- Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis [20.11993437283895]
This paper provides a game-theoretical underpinning for understanding the security risk posed by sampling attacks on meta reinforcement learning.
We define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation.
We observe that a minor effort of the attacker can significantly deteriorate the learning performance.
arXiv Detail & Related papers (2022-07-29T21:29:29Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solve prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Mis-spoke or mis-lead: Achieving Robustness in Multi-Agent Communicative Reinforcement Learning [37.24674549469648]
We make the first step towards conducting message attacks on MACRL methods.
We develop a defence method via message reconstruction.
We consider the ability of the malicious agent to adapt to the changing and improving defensive communicative policies.
arXiv Detail & Related papers (2021-08-09T04:41:47Z)
- On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning [100.14809391594109]
Model-agnostic meta-learning (MAML) has emerged as one of the most successful meta-learning techniques in few-shot learning.
Despite the generalization power of the meta-model, it remains unclear how adversarial robustness can be maintained by MAML in few-shot learning.
We propose a general but easily-optimized robustness-regularized meta-learning framework, which allows the use of unlabeled data augmentation, fast adversarial attack generation, and computationally-light fine-tuning.
arXiv Detail & Related papers (2021-02-20T22:03:04Z)
- Performance-Weighed Policy Sampling for Meta-Reinforcement Learning [1.77898701462905]
Enhanced Model-Agnostic Meta-Learning (E-MAML) generates fast convergence of the policy function from a small number of training examples.
E-MAML maintains a set of policy parameters learned in the environment for previous tasks.
We apply E-MAML to developing reinforcement learning (RL)-based online fault tolerant control schemes.
arXiv Detail & Related papers (2020-12-10T23:08:38Z)
- Offline Meta-Reinforcement Learning with Advantage Weighting [125.21298190780259]
This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes an algorithm that performs well in this setting.
Offline meta-RL is analogous to the widely successful supervised learning strategy of pre-training a model on a large batch of fixed, pre-collected data.
We propose Meta-Actor Critic with Advantage Weighting (MACAW), an optimization-based meta-learning algorithm that uses simple, supervised regression objectives for both the inner and outer loop of meta-training.
arXiv Detail & Related papers (2020-08-13T17:57:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.