Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for
Visual Reinforcement Learning
- URL: http://arxiv.org/abs/2202.09982v2
- Date: Tue, 22 Feb 2022 15:04:35 GMT
- Title: Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for
Visual Reinforcement Learning
- Authors: Zhecheng Yuan, Guozheng Ma, Yao Mu, Bo Xia, Bo Yuan, Xueqian Wang,
Ping Luo, Huazhe Xu
- Abstract summary: We propose Task-aware Lipschitz Data Augmentation (TLDA) for visual Reinforcement Learning (RL).
TLDA explicitly identifies the task-correlated pixels with large Lipschitz constants and augments only the task-irrelevant pixels.
It outperforms previous state-of-the-art methods across three different visual control benchmarks.
- Score: 27.205521177841568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key challenges in visual Reinforcement Learning (RL) is to learn
policies that can generalize to unseen environments. Recently, data
augmentation techniques aimed at enhancing data diversity have proven
effective at improving the generalization ability of learned policies.
However, because RL training is sensitive to perturbations, naively applying
data augmentation that transforms every pixel in a task-agnostic manner can
destabilize training and harm sample efficiency, which in turn degrades
generalization performance. At the heart of this phenomenon are the diverging
action distributions and high-variance value estimates that arise in the face
of augmented images. To alleviate this issue, we propose Task-aware Lipschitz
Data Augmentation (TLDA) for visual RL, which explicitly identifies the
task-correlated pixels, i.e., those with large Lipschitz constants, and
augments only the task-irrelevant pixels. To verify the effectiveness of TLDA,
we conduct extensive experiments on the DeepMind Control Suite, CARLA, and
DeepMind Manipulation tasks, showing that TLDA improves both sample efficiency
at training time and generalization at test time. It outperforms previous
state-of-the-art methods across all three visual control benchmarks.
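To make the mechanism above concrete, here is a minimal sketch of how one might estimate per-patch Lipschitz constants and restrict augmentation to the insensitive regions. This is an illustrative reconstruction, not the authors' released code; the `policy` and `augment` callables, the patch size, and the median threshold are all assumptions.

```python
import torch
import torch.nn.functional as F

def tlda_style_augment(policy, obs, augment, patch=8):
    """Minimal sketch of task-aware augmentation: estimate a local
    Lipschitz constant per image patch and augment only the patches
    the policy is least sensitive to. Not the authors' implementation."""
    B, C, H, W = obs.shape
    with torch.no_grad():
        base = policy(obs)                       # pi(s) on the clean frame
        aug = augment(obs)                       # fully augmented frame
        k = torch.zeros(B, H // patch, W // patch, device=obs.device)
        for i in range(H // patch):
            for j in range(W // patch):
                mixed = obs.clone()
                ys, xs = i * patch, j * patch
                # perturb one patch only, keep the rest of the frame clean
                mixed[:, :, ys:ys + patch, xs:xs + patch] = \
                    aug[:, :, ys:ys + patch, xs:xs + patch]
                num = (policy(mixed) - base).flatten(1).norm(dim=1)
                den = (mixed - obs).flatten(1).norm(dim=1).clamp_min(1e-8)
                k[:, i, j] = num / den           # local Lipschitz estimate
        # patches above the per-image median are treated as task-correlated
        thresh = k.flatten(1).median(dim=1).values.view(B, 1, 1)
        keep = (k > thresh).float().unsqueeze(1)      # 1 = protect this patch
        keep = F.interpolate(keep, size=(H, W), mode="nearest")
        # augment only the task-irrelevant pixels
        return keep * obs + (1 - keep) * aug
```

The median split is an arbitrary stand-in for however the method actually thresholds sensitivity; the point is only that high-sensitivity regions stay untouched while the rest receives the augmentation.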
Related papers
- A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning [12.889687274108248]
Q-learning algorithms are prone to overfitting and training instabilities when trained from visual observations.
We propose a generalized recipe, SADA, that works with a wider variety of augmentations.
We find that our method, SADA, greatly improves training stability and generalization of RL agents across a diverse set of augmentations.
arXiv Detail & Related papers (2024-05-27T17:58:23Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Multi-Augmentation for Efficient Visual Representation Learning for Self-supervised Pre-training [1.3733988835863333]
We propose Multi-Augmentations for Self-Supervised Learning (MA-SSRL), which fully searches over various augmentation policies to build the entire pipeline.
MA-SSRL successfully learns the invariant feature representation and presents an efficient, effective, and adaptable data augmentation pipeline for self-supervised pre-training.
arXiv Detail & Related papers (2022-05-24T04:18:39Z)
- Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning.
We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information.
Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments.
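The title points at the quantity involved: the characteristic function of the reward-sequence distribution. As a loose illustration only (assumed shapes and names, not the CRESP architecture), the empirical target such a predictor could regress toward looks like this:

```python
import torch

def reward_characteristic_targets(rewards, freqs):
    """Empirical characteristic function of reward sequences, a loose
    illustration of the quantity a CRESP-style predictor could target.
    rewards: (N, T) sampled reward sequences; freqs: (K, T) frequency vectors."""
    phase = rewards @ freqs.t()           # (N, K): <t, R> for each sample
    real = torch.cos(phase).mean(dim=0)   # (K,) Re E[exp(i <t, R>)]
    imag = torch.sin(phase).mean(dim=0)   # (K,) Im E[exp(i <t, R>)]
    return torch.cat([real, imag])        # target for a representation head
```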
arXiv Detail & Related papers (2022-05-20T14:52:03Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation [25.493902939111265]
We investigate causes of instability when using data augmentation in off-policy Reinforcement Learning algorithms.
We propose a simple yet effective technique for stabilizing this class of algorithms under augmentation.
Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL.
arXiv Detail & Related papers (2021-07-01T17:58:05Z)
- Learning Representational Invariances for Data-Efficient Action Recognition [52.23716087656834]
We show that our data augmentation strategy leads to promising performance on the Kinetics-100, UCF-101, and HMDB-51 datasets.
We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
arXiv Detail & Related papers (2021-03-30T17:59:49Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
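As a rough illustration of perturbing intermediate embeddings rather than input pixels, the snippet below applies a standard FGSM-style step in feature space. It is a generic stand-in, not this paper's exact procedure; `loss_fn` and `eps` are assumptions.

```python
import torch

def fgsm_feature_augment(features, loss_fn, eps=0.1):
    """Generic FGSM-style perturbation applied to intermediate feature
    embeddings instead of input pixels; a stand-in illustration, not
    this paper's method. `loss_fn` maps features to a scalar loss."""
    features = features.detach()                    # cut ties to the upstream graph
    delta = torch.zeros_like(features, requires_grad=True)
    loss_fn(features + delta).backward()            # gradient w.r.t. the perturbation
    with torch.no_grad():
        adv = features + eps * delta.grad.sign()    # one signed-gradient ascent step
    return adv.detach()
```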
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- MetaAugment: Sample-Aware Data Augmentation Policy Learning [20.988767360529362]
We learn a sample-aware data augmentation policy efficiently by formulating it as a sample reweighting problem.
An augmentation policy network takes a transformation and the corresponding augmented image as inputs, and outputs a weight to adjust the augmented image loss computed by a task network.
At training stage, the task network minimizes the weighted losses of augmented training images, while the policy network minimizes the loss of the task network on a validation set via meta-learning.
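The bilevel scheme just described lends itself to a compact sketch. The shapes and module sizes below are placeholders (a 32x32 image and a 4-dimensional transformation code), not the paper's architecture:

```python
import torch
import torch.nn as nn

# Hypothetical shapes/modules; the paper's actual architectures differ.
task_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
policy_net = nn.Sequential(nn.Linear(3 * 32 * 32 + 4, 64), nn.ReLU(),
                           nn.Linear(64, 1), nn.Sigmoid())  # weight in (0, 1)
ce = nn.CrossEntropyLoss(reduction="none")

def weighted_task_loss(aug_imgs, labels, transform_codes):
    """One training-side step of the bilevel scheme: the policy network
    scores each (transformation, augmented image) pair, and the task
    network minimizes the reweighted per-sample loss."""
    feats = torch.cat([aug_imgs.flatten(1), transform_codes], dim=1)
    w = policy_net(feats).squeeze(1)            # per-sample weight
    losses = ce(task_net(aug_imgs), labels)     # per-sample task loss
    return (w * losses).mean()

# The policy network itself is updated by backpropagating the task
# network's *validation* loss through this weighting (the meta-learning
# step), which is omitted here for brevity.
```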
arXiv Detail & Related papers (2020-12-22T15:19:27Z)
- Generalization in Reinforcement Learning by Soft Data Augmentation [11.752595047069505]
SOft Data Augmentation (SODA) is a method that decouples augmentation from policy learning.
We find SODA to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.
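The decoupling idea reads naturally as two separate losses, sketched below. The names and the exact auxiliary objective are assumptions, not the released SODA code:

```python
import torch
import torch.nn.functional as F

def soda_style_losses(encoder, projector, rl_loss_fn, obs, augment):
    """Sketch of decoupling augmentation from policy learning: the RL
    objective sees only clean observations, while augmented views enter
    through a separate latent-consistency auxiliary loss."""
    rl_loss = rl_loss_fn(obs)                     # policy/value learning, clean data only
    with torch.no_grad():
        target = projector(encoder(obs))          # latent of the clean view
    pred = projector(encoder(augment(obs)))       # latent of the augmented view
    aux_loss = F.mse_loss(pred, target)           # align augmented with clean latent
    return rl_loss, aux_loss
```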
arXiv Detail & Related papers (2020-11-26T17:00:34Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
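In toy form, the relabeling step amounts to scoring one trajectory under every candidate task and keeping the best match; the scoring itself (approximate inverse RL in the paper) is assumed given here:

```python
import numpy as np

def relabel_with_best_task(trajectory_returns):
    """Toy relabeling rule in the spirit of Generalized Hindsight:
    assign a trajectory to the candidate task whose reward function
    rates it highest. trajectory_returns: (num_tasks,) return of the
    trajectory evaluated under each candidate task's reward."""
    return int(np.argmax(trajectory_returns))
```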
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.