Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2207.07749v1
- Date: Fri, 15 Jul 2022 20:49:45 GMT
- Title: Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning
- Authors: Md Masudur Rahman and Yexiang Xue
- Abstract summary: Thinker is a bootstrapping method to remove adversarial effects of confounding features from the observation in an unsupervised way.
Thinker has wide applicability among many Deep Reinforcement Learning settings.
- Score: 16.999444076456268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Reinforcement Learning (RL) agents often overfit the training
environment, leading to poor generalization performance. In this paper, we
propose Thinker, a bootstrapping method to remove adversarial effects of
confounding features from the observation in an unsupervised way, and thus, it
improves RL agents' generalization. Thinker first clusters experience
trajectories into several clusters. These trajectories are then bootstrapped by
applying a style transfer generator, which translates the trajectories from one
cluster's style to another while maintaining the content of the observations.
The bootstrapped trajectories are then used for policy learning. Thinker has
wide applicability among many RL settings. Experimental results reveal that
Thinker leads to better generalization capability in the Procgen benchmark
environments compared to base algorithms and several data augmentation
techniques.
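The abstract describes a three-step pipeline: cluster trajectories, translate each trajectory into another cluster's style while keeping its content, then learn from the translated trajectories. The following is a minimal sketch of the first two steps under stated assumptions, not the authors' implementation. Observations are toy RGB pixel lists. The abstract does not say how clusters are formed, so k-means on each trajectory's mean colour is an assumption. Thinker's learned style-transfer generator is replaced here by per-channel mean/std matching, a classical colour-transfer heuristic that changes colour statistics but leaves the pixel layout intact. All names (`channel_stats`, `kmeans`, `translate`, `bootstrap`, `make_traj`) are hypothetical, and the policy-learning step is omitted.

```python
import random
from statistics import fmean, pstdev

# Hypothetical data layout: a pixel is an (R, G, B) tuple in [0, 1],
# an observation is a flat list of pixels, a trajectory a list of observations.
Pixel = tuple[float, float, float]
Observation = list[Pixel]
Trajectory = list[Observation]


def channel_stats(trajs: list[Trajectory]) -> tuple[list[float], list[float]]:
    """Per-channel mean and population std over every pixel of every observation."""
    pixels = [p for traj in trajs for obs in traj for p in obs]
    means = [fmean(p[c] for p in pixels) for c in range(3)]
    stds = [pstdev([p[c] for p in pixels]) for c in range(3)]
    return means, stds


def _dist2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(_dist2(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [fmean(m[c] for m in members) for c in range(len(points[0]))]
    return labels


def translate(traj: Trajectory, src, dst, eps: float = 1e-6) -> Trajectory:
    """Stand-in for the learned generator: re-colour a trajectory from the source
    cluster's statistics to the destination cluster's, pixel by pixel and channel
    by channel, so the spatial layout (the content) is unchanged."""
    (mu_s, sd_s), (mu_d, sd_d) = src, dst
    return [
        [tuple(min(1.0, max(0.0, (p[c] - mu_s[c]) / (sd_s[c] + eps) * sd_d[c] + mu_d[c]))
               for c in range(3)) for p in obs]
        for obs in traj
    ]


def bootstrap(trajs: list[Trajectory], k: int, seed: int = 0):
    """Cluster trajectories by style, then translate each into another cluster's style."""
    rng = random.Random(seed)
    styles = [channel_stats([t])[0] for t in trajs]  # mean colour per trajectory
    labels = kmeans(styles, k)
    stats = {j: channel_stats([t for t, lab in zip(trajs, labels) if lab == j])
             for j in sorted(set(labels))}
    boot = []
    for traj, lab in zip(trajs, labels):
        others = [j for j in stats if j != lab]
        boot.append(translate(traj, stats[lab], stats[rng.choice(others)]) if others else traj)
    return labels, boot


def make_traj(base: Pixel, n_obs: int = 2) -> Trajectory:
    # Toy "content": every observation is a dark pixel followed by a bright one.
    return [[tuple(v * 0.5 for v in base), base] for _ in range(n_obs)]


red, blue = (0.9, 0.1, 0.1), (0.1, 0.1, 0.9)
trajs = [make_traj(red), make_traj(blue), make_traj(red), make_traj(blue)]
labels, boot = bootstrap(trajs, k=2)
print(labels)  # [0, 1, 0, 1]
print([round(v, 3) for v in boot[0][0][1]])  # [0.1, 0.1, 0.9]
```

In this toy run the red and blue trajectories fall into separate clusters, and each bootstrapped trajectory keeps its dark-then-bright layout while taking on the other cluster's colours. A fuller version would pass `boot` to the RL update, which is where the abstract says the bootstrapped trajectories are used.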
Related papers
- RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization [23.417092819516185]
We introduce RL-ViGen: a novel Reinforcement Learning Benchmark for Visual Generalization.
RL-ViGen contains diverse tasks and a wide spectrum of generalization types, thereby facilitating the derivation of more reliable conclusions.
Our aspiration is that RL-ViGen will serve as a catalyst in the future creation of universal visual generalization RL agents.
Date: 2023-07-15T05:45:37Z
- Supplementing Gradient-Based Reinforcement Learning with Simple Evolutionary Ideas [4.873362301533824]
We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL).
The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space.
Date: 2023-05-10T09:46:53Z
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large numbers of interactions between the agent and the environment.
We propose a new method to reduce this cost, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
Date: 2022-09-24T14:22:29Z
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback efficiency and the sample efficiency of preference-based RL algorithms.
Date: 2022-05-24T23:22:10Z
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions [34.843526573355746]
Reinforcement learning (RL) agents are widely used for solving complex sequential decision-making tasks, but they struggle to generalize to scenarios not seen during training.
We show that the performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.
We propose a new theoretically motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior.
Date: 2021-11-29T15:42:54Z
- SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies [87.78260740602674]
Generalization has been a long-standing challenge for reinforcement learning (RL).
In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift.
We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
Date: 2021-06-17T17:28:18Z
- Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation [115.4071729927011]
We study the effects of using mid-level visual representations as generic and easy-to-decode perceptual state in an end-to-end RL framework.
We show that they aid generalization, improve sample complexity, and lead to higher final performance.
In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed.
Date: 2020-11-13T00:16:05Z
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information-theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
Date: 2020-08-03T02:24:20Z
- Automatic Data Augmentation for Generalization in Deep Reinforcement Learning [39.477038093585726]
Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios.
Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents.
We show that our agent learns policies and representations that are more robust to changes in the environment that do not affect the agent.
Date: 2020-06-23T09:50:22Z
- Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
Date: 2020-02-25T18:36:31Z
This list is automatically generated from the titles and abstracts of the papers on this site.