Related papers: Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning

URL: http://arxiv.org/abs/2207.07749v1
Date: Fri, 15 Jul 2022 20:49:45 GMT
Title: Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning
Authors: Md Masudur Rahman and Yexiang Xue
Abstract summary: Thinker is a bootstrapping method to remove adversarial effects of confounding features from the observation in an unsupervised way. Thinker has wide applicability among many Deep Reinforcement Learning settings.
Score: 16.999444076456268
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Reinforcement Learning (RL) agents often overfit the training environment, leading to poor generalization performance. In this paper, we propose Thinker, a bootstrapping method to remove adversarial effects of confounding features from the observation in an unsupervised way, and thus, it improves RL agents' generalization. Thinker first clusters experience trajectories into several clusters. These trajectories are then bootstrapped by applying a style transfer generator, which translates the trajectories from one cluster's style to another while maintaining the content of the observations. The bootstrapped trajectories are then used for policy learning. Thinker has wide applicability among many RL settings. Experimental results reveal that Thinker leads to better generalization capability in the Procgen benchmark environments compared to base algorithms and several data augmentation techniques.
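
The three steps named in the abstract (cluster the experience, translate each observation into another cluster's style, train on the result) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The helper names (`kmeans`, `channel_stats`, `restyle`, `bootstrap_batch`) are hypothetical. It clusters raw flattened observations rather than trajectories. The learned style-transfer generator is replaced by simple per-channel statistic matching, which is a stand-in chosen only to keep the example self-contained.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means on row vectors; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def channel_stats(obs):
    """Per-channel mean and std over a batch of (N, H, W, C) observations."""
    mean = obs.mean(axis=(0, 1, 2))
    std = obs.std(axis=(0, 1, 2)) + 1e-6  # avoid division by zero
    return mean, std

def restyle(obs, src_stats, dst_stats):
    """ASSUMPTION: stand-in for the style-transfer generator.

    Shifts and rescales each channel so that its statistics match the
    target cluster, leaving the spatial layout (the "content") untouched.
    """
    (src_mean, src_std), (dst_mean, dst_std) = src_stats, dst_stats
    out = (obs - src_mean) / src_std * dst_std + dst_mean
    return np.clip(out, 0.0, 1.0)

def bootstrap_batch(obs, k=2, seed=0):
    """Cluster observations, then restyle each one toward another cluster."""
    rng = np.random.default_rng(seed)
    labels, _ = kmeans(obs.reshape(len(obs), -1), k, seed=seed)
    stats = {j: channel_stats(obs[labels == j])
             for j in range(k) if np.any(labels == j)}
    present = sorted(stats)
    out = np.empty_like(obs)
    for i, (o, c) in enumerate(zip(obs, labels)):
        c = int(c)
        others = [j for j in present if j != c]
        target = int(rng.choice(others)) if others else c
        out[i] = restyle(o, stats[c], stats[target])
    return out, labels

# Toy batch: four "dark" and four "bright" frames with values in [0, 1].
dark = np.full((4, 4, 4, 3), 0.1)
bright = np.full((4, 4, 4, 3), 0.9)
batch = np.concatenate([dark, bright])
restyled, cluster_ids = bootstrap_batch(batch, k=2)
```

In a training loop, `restyled` would take the place of (or be mixed with) the original observations when the policy is updated; actions and rewards would be left as collected.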

Related papers

RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning [125.65034908728828]
Training large language models (LLMs) as interactive agents presents unique challenges. While reinforcement learning has enabled progress in static tasks, multi-turn agent RL training remains underexplored. We propose StarPO, a general framework for trajectory-level agent RL, and introduce RAGEN, a modular system for training and evaluating LLM agents.
arXiv Detail & Related papers (2025-04-24T17:57:08Z)
Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better [0.3021678014343889]
Supervised learning (SL) and reinforcement learning (RL) are widely used to train general-purpose agents for complex tasks. This paper provides a direct comparison between SL and RL in terms of zero-shot generalization.
arXiv Detail & Related papers (2025-03-19T21:03:27Z)
SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning [0.6668116630521236]
This paper introduces an open-source, decentralized framework named SigmaRL, designed to enhance both sample efficiency and generalization of multi-agent Reinforcement Learning (RL). We propose five strategies to design information-dense observations, focusing on general features that are applicable to most traffic scenarios. We train our RL agents using these strategies on an intersection and evaluate their generalization through numerical experiments across completely unseen traffic scenarios, including a new intersection, an on-ramp, and a roundabout.
arXiv Detail & Related papers (2024-08-14T16:16:51Z)
RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization [23.417092819516185]
We introduce RL-ViGen: a novel Reinforcement Learning Benchmark for Visual Generalization. RL-ViGen contains diverse tasks and a wide spectrum of generalization types, thereby facilitating the derivation of more reliable conclusions. Our aspiration is that RL-ViGen will serve as a catalyst in the future creation of universal visual generalization RL agents.
arXiv Detail & Related papers (2023-07-15T05:45:37Z)
Supplementing Gradient-Based Reinforcement Learning with Simple Evolutionary Ideas [4.873362301533824]
We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL). The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space.
arXiv Detail & Related papers (2023-05-10T09:46:53Z)
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment. We propose a new method to address this, using unsupervised model-based RL for pre-training the agent. We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms. Our main idea is to design an intrinsic reward by measuring the novelty based on learned reward. Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies [87.78260740602674]
Generalization has been a long-standing challenge for reinforcement learning (RL). In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
arXiv Detail & Related papers (2021-06-17T17:28:18Z)
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation [115.4071729927011]
We study the effects of using mid-level visual representations as a generic and easy-to-decode perceptual state in an end-to-end RL framework. We show that they aid generalization, improve sample complexity, and lead to a higher final performance. In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed.
arXiv Detail & Related papers (2020-11-13T00:16:05Z)
Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
Automatic Data Augmentation for Generalization in Deep Reinforcement Learning [39.477038093585726]
Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios. Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents. We show that our agent learns policies and representations that are more robust to changes in the environment that do not affect the agent.
arXiv Detail & Related papers (2020-06-23T09:50:22Z)
Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.