Generalization of Reinforcement Learning with Policy-Aware Adversarial
Data Augmentation
- URL: http://arxiv.org/abs/2106.15587v1
- Date: Tue, 29 Jun 2021 17:21:59 GMT
- Title: Generalization of Reinforcement Learning with Policy-Aware Adversarial
Data Augmentation
- Authors: Hanping Zhang, Yuhong Guo
- Abstract summary: We propose a novel policy-aware adversarial data augmentation method to augment the standard policy learning method with automatically generated trajectory data.
We conduct experiments on a number of RL tasks to investigate the generalization performance of the proposed method.
The results show our method generalizes well with limited training diversity and achieves state-of-the-art generalization test performance.
- Score: 32.70482982044965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalization gap in reinforcement learning (RL) has been a significant
obstacle that prevents the RL agent from learning general skills and adapting
to varying environments. Increasing the generalization capacity of RL
systems can significantly improve their performance in real-world
environments. In this work, we propose a novel policy-aware adversarial data
augmentation method to augment the standard policy learning method with
automatically generated trajectory data. Unlike the commonly used
observation-transformation-based data augmentations, our proposed method
adversarially generates new trajectory data based on the policy gradient
objective, aiming to increase the RL agent's generalization ability more
effectively through policy-aware data augmentation. Moreover, we further deploy a
mixup step to integrate the original and generated data to enhance the
generalization capacity while mitigating the over-deviation of the adversarial
data. We conduct experiments on a number of RL tasks to investigate the
generalization performance of the proposed method by comparing it with the
standard baselines and the state-of-the-art mixreg approach. The results show
that our method generalizes well with limited training diversity and achieves
state-of-the-art generalization test performance.
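For intuition, the following is a minimal, hypothetical sketch of the kind of policy-aware adversarial augmentation with a mixup step that the abstract describes. It is not the authors' released implementation: the PyTorch interface, the single-step FGSM-style perturbation, and all names (policy_aware_augment, eps, alpha) are illustrative assumptions.

```python
import torch

def policy_aware_augment(policy, obs, actions, advantages, eps=0.01, alpha=0.2):
    """Illustrative only: adversarially perturb observations, then mix with the originals.

    Assumes `policy(obs)` returns a torch.distributions.Distribution over actions.
    """
    obs_adv = obs.clone().detach().requires_grad_(True)

    # Policy-gradient surrogate loss on the sampled trajectory batch.
    log_probs = policy(obs_adv).log_prob(actions)
    pg_loss = -(log_probs * advantages).mean()

    # Adversarial step: move observations in the direction that increases the
    # surrogate loss (an FGSM-style sign step is one simple choice).
    grad = torch.autograd.grad(pg_loss, obs_adv)[0]
    obs_adv = (obs + eps * grad.sign()).detach()

    # Mixup step: blend original and adversarial observations to limit
    # over-deviation of the generated data.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * obs + (1.0 - lam) * obs_adv
```

In this reading, the adversarial step pushes observations in the direction that most degrades the current policy-gradient objective, while the Beta-sampled mixup coefficient keeps the generated data close to the original trajectories, mitigating over-deviation.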
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Improving Generalization of Alignment with Human Preferences through Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences.
Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples.
We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z)
- CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing [8.569762036154799]
Current approaches for generalization apply data augmentation techniques to increase the diversity of training data.
Crafting a suitable observation that contains only the crucial information has been shown to be a challenging task in itself.
We propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization.
arXiv Detail & Related papers (2023-04-26T15:19:02Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Rethinking Domain Generalization Baselines [21.841393368012977]
Deep learning models can be brittle when deployed in scenarios different from those on which they were trained.
Data augmentation strategies have been shown to be helpful tools to increase data variability, supporting model robustness across domains.
This issue opens new scenarios for domain generalization research, highlighting the need for novel methods able to properly take advantage of the introduced data variability.
arXiv Detail & Related papers (2021-01-22T11:35:58Z)
- Generalization in Reinforcement Learning by Soft Data Augmentation [11.752595047069505]
SOft Data Augmentation (SODA) is a method that decouples augmentation from policy learning.
We find SODA to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.
arXiv Detail & Related papers (2020-11-26T17:00:34Z)
- Deep Active Learning with Augmentation-based Consistency Estimation [23.492616938184092]
We propose a methodology to improve generalization ability by applying data augmentation-based techniques to an active learning scenario.
For the data augmentation-based regularization loss, we redefined cutout (co) and cutmix (cm) strategies as quantitative metrics.
We have shown that the augmentation-based regularizer can lead to improved performance on the training step of active learning.
arXiv Detail & Related papers (2020-11-05T05:22:58Z)
- Improving Generalization in Reinforcement Learning with Mixture Regularization [113.12412071717078]
We introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments.
Mixreg increases the data diversity more effectively and helps learn smoother policies.
Results show mixreg outperforms well-established baselines on unseen testing environments by a large margin (a minimal observation-mixup sketch in this spirit appears after this list).
arXiv Detail & Related papers (2020-10-21T08:12:03Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension [96.62963688510035]
Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation.
We present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation.
arXiv Detail & Related papers (2020-04-13T17:20:08Z)
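Since mixreg is the main baseline compared against in the paper above, the following hypothetical snippet sketches observation mixup in the spirit of mixreg. The function name mixreg_batch, the generic value targets, and the Beta(alpha, alpha) mixing coefficient are assumptions for illustration, not the original implementation.

```python
import torch

def mixreg_batch(obs, targets, alpha=0.2):
    """Illustrative only: blend a batch with a shuffled copy of itself.

    Observations drawn from different training environments are convexly
    combined, and the associated supervision signal (e.g. value targets) is
    interpolated with the same coefficient.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(obs.size(0))
    mixed_obs = lam * obs + (1.0 - lam) * obs[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_obs, mixed_targets
```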