Bootstrap State Representation using Style Transfer for Better
Generalization in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2207.07749v1
- Date: Fri, 15 Jul 2022 20:49:45 GMT
- Title: Bootstrap State Representation using Style Transfer for Better
Generalization in Deep Reinforcement Learning
- Authors: Md Masudur Rahman and Yexiang Xue
- Abstract summary: Thinker is a bootstrapping method that removes the adversarial effects of confounding features from observations in an unsupervised way.
Thinker is widely applicable across many Deep Reinforcement Learning settings.
- Score: 16.999444076456268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Reinforcement Learning (RL) agents often overfit the training
environment, leading to poor generalization performance. In this paper, we
propose Thinker, a bootstrapping method that removes the adversarial effects of
confounding features from observations in an unsupervised way, thereby
improving RL agents' generalization. Thinker first groups experience
trajectories into several clusters. These trajectories are then bootstrapped by
applying a style transfer generator, which translates the trajectories from one
cluster's style to another while maintaining the content of the observations.
The bootstrapped trajectories are then used for policy learning. Thinker is
widely applicable across many RL settings. Experimental results reveal that
Thinker leads to better generalization capability in the Procgen benchmark
environments compared to base algorithms and several data augmentation
techniques.
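To make the method concrete, below is a minimal sketch of the bootstrapping loop the abstract describes: cluster experience trajectories, translate each trajectory into another cluster's style, and feed the augmented data to policy learning. The k-means clustering over flattened observations, the identity style_transfer placeholder, and the toy array shapes are illustrative assumptions only; the paper itself trains an unsupervised image-to-image style-transfer generator between trajectory clusters and evaluates on Procgen.

```python
# Hedged sketch of a Thinker-style bootstrapping loop (not the authors' code).
import numpy as np
from sklearn.cluster import KMeans


def cluster_trajectories(observations: np.ndarray, n_clusters: int = 4, seed: int = 0) -> np.ndarray:
    """Assign each trajectory to a visual-style cluster.

    observations: (n_traj, traj_len, obs_dim). Flattened per-trajectory
    features are used here for simplicity; the paper clusters image frames.
    """
    traj_features = observations.reshape(len(observations), -1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(traj_features)


def style_transfer(obs: np.ndarray, src_cluster: int, dst_cluster: int) -> np.ndarray:
    """Placeholder for the learned generator G_{src -> dst}.

    The real method trains an unsupervised image-to-image translation model
    that changes cluster-specific style (background, colors) while preserving
    task-relevant content. This stand-in simply copies the input.
    """
    return obs.copy()


def bootstrap(observations: np.ndarray, labels: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Translate every trajectory into the style of a different cluster."""
    clusters = np.unique(labels)
    boosted = []
    for obs, src in zip(observations, labels):
        dst = rng.choice(clusters[clusters != src])
        boosted.append(style_transfer(obs, src, dst))
    return np.stack(boosted)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(size=(32, 10, 8))   # toy batch: 32 trajectories of 10 steps
    labels = cluster_trajectories(obs)
    augmented = bootstrap(obs, labels, rng)
    print(augmented.shape)               # (32, 10, 8)
```

In practice the placeholder generator would be replaced by the trained style-transfer model, and the base algorithm (e.g., PPO) would consume both the original and the bootstrapped trajectories.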
Related papers
- SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning [0.6668116630521236]
This paper introduces an open-source, decentralized framework named SigmaRL, designed to enhance both the sample efficiency and generalization of multi-agent Reinforcement Learning (RL).
We propose five strategies to design information-dense observations, focusing on general features that are applicable to most traffic scenarios.
We train our RL agents using these strategies on an intersection and evaluate their generalization through numerical experiments across completely unseen traffic scenarios, including a new intersection, an on-ramp, and a roundabout.
arXiv Detail & Related papers (2024-08-14T16:16:51Z)
- RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization [23.417092819516185]
We introduce RL-ViGen: a novel Reinforcement Learning Benchmark for Visual Generalization.
RL-ViGen contains diverse tasks and a wide spectrum of generalization types, thereby facilitating the derivation of more reliable conclusions.
Our aspiration is that RL-ViGen will serve as a catalyst in the future creation of universal visual generalization RL agents.
arXiv Detail & Related papers (2023-07-15T05:45:37Z)
- Supplementing Gradient-Based Reinforcement Learning with Simple Evolutionary Ideas [4.873362301533824]
We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL).
The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space.
arXiv Detail & Related papers (2023-05-10T09:46:53Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed, but they require large amounts of interaction between the agent and the environment.
We propose a new method that addresses this by using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that the exploration bonus from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies [87.78260740602674]
Generalization has been a long-standing challenge for reinforcement learning (RL).
In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift.
We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
arXiv Detail & Related papers (2021-06-17T17:28:18Z)
- Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation [115.4071729927011]
We study the effects of using mid-level visual representations as generic and easy-to-decode perceptual state in an end-to-end RL framework.
We show that they aid generalization, improve sample complexity, and lead to a higher final performance.
In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed.
arXiv Detail & Related papers (2020-11-13T00:16:05Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Automatic Data Augmentation for Generalization in Deep Reinforcement Learning [39.477038093585726]
Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios.
Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents.
We show that our agent learns policies and representations that are more robust to changes in the environment that do not affect the agent.
arXiv Detail & Related papers (2020-06-23T09:50:22Z)
- Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks (a minimal relabeling sketch follows this list).
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
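As noted in the last entry above, here is a minimal, self-contained sketch of goal-conditioned hindsight relabeling, the operation that paper reinterprets as inverse RL. The Transition container, the sparse reward, and the final-achieved-state relabeling strategy are illustrative assumptions rather than that paper's exact multi-task formulation.

```python
# Hedged sketch of hindsight relabeling for goal-conditioned RL.
from dataclasses import dataclass, replace
from typing import List

import numpy as np


@dataclass
class Transition:
    obs: np.ndarray
    action: int
    next_obs: np.ndarray
    goal: np.ndarray
    reward: float


def sparse_reward(next_obs: np.ndarray, goal: np.ndarray, tol: float = 1e-3) -> float:
    # Reward 1.0 only when the achieved state matches the goal.
    return float(np.linalg.norm(next_obs - goal) < tol)


def relabel_with_hindsight(trajectory: List[Transition]) -> List[Transition]:
    """Relabel a trajectory with the goal it actually achieved.

    The original goal is swapped for the final achieved state and rewards are
    recomputed, so even 'failed' trajectories yield positive learning signal.
    """
    achieved_goal = trajectory[-1].next_obs
    return [
        replace(t, goal=achieved_goal, reward=sparse_reward(t.next_obs, achieved_goal))
        for t in trajectory
    ]
```

A goal-conditioned policy trained on both the original and relabeled transitions then receives learning signal even from trajectories that never reached their commanded goal.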
This list is automatically generated from the titles and abstracts of the papers on this site.