Learning Synthetic Environments and Reward Networks for Reinforcement
Learning
- URL: http://arxiv.org/abs/2202.02790v1
- Date: Sun, 6 Feb 2022 14:55:59 GMT
- Title: Learning Synthetic Environments and Reward Networks for Reinforcement
Learning
- Authors: Fabio Ferreira and Thomas Nierhoff and Andreas Saelinger and Frank
Hutter
- Abstract summary: We introduce Synthetic Environments (SEs) and Reward Networks (RNs) as proxy environment models for training Reinforcement Learning (RL) agents.
We show that an agent, after being trained exclusively on the SE, is able to solve the corresponding real environment.
- Score: 34.01695320809796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Synthetic Environments (SEs) and Reward Networks (RNs),
represented by neural networks, as proxy environment models for training
Reinforcement Learning (RL) agents. We show that an agent, after being trained
exclusively on the SE, is able to solve the corresponding real environment.
While an SE acts as a full proxy to a real environment by learning about its
state dynamics and rewards, an RN is a partial proxy that learns to augment or
replace rewards. We use bi-level optimization to evolve SEs and RNs: the inner
loop trains the RL agent, and the outer loop trains the parameters of the SE /
RN via an evolution strategy. We evaluate our proposed new concept on a broad
range of RL algorithms and classic control environments. In a one-to-one
comparison, learning an SE proxy requires more interactions with the real
environment than training agents only on the real environment. However, once
such an SE has been learned, we do not need any interactions with the real
environment to train new agents. Moreover, the learned SE proxies allow us to
train agents with fewer interactions while maintaining the original task
performance. Our empirical results suggest that SEs achieve this result by
learning informed representations that bias the agents towards relevant states.
We also find that these proxies are robust to hyperparameter
variation and can transfer to unseen agents.
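The bi-level scheme described above can be illustrated with a minimal sketch: an outer evolution strategy perturbs the parameters of a small synthetic-environment network, an inner loop trains an agent purely inside that SE, and the trained agent's return on the real environment serves as the fitness signal. The toy 1-D point-mass task, the tiny MLP synthetic environment, the random-search inner loop, and all hyperparameters below are illustrative assumptions for exposition, not the authors' actual architectures or settings.

```python
# Hedged sketch of the bi-level SE training loop: outer ES over SE parameters,
# inner agent training on the SE only, fitness measured on the "real" task.
import numpy as np

rng = np.random.default_rng(0)

# --- Toy "real" environment: 1-D point mass, reward = -|position| (assumed) ---
def real_rollout(policy_w, horizon=50):
    x, ret = 1.0, 0.0
    for _ in range(horizon):
        a = np.tanh(policy_w[0] * x + policy_w[1])   # scalar action in [-1, 1]
        x = x + 0.1 * a                               # true dynamics
        ret += -abs(x)                                # true reward
    return ret

# --- Synthetic Environment: tiny MLP predicting (next_state, reward) ---
def se_step(theta, x, a):
    W1, b1, W2, b2 = theta
    h = np.tanh(W1 @ np.array([x, a]) + b1)
    out = W2 @ h + b2
    return out[0], out[1]                             # (next_state, reward)

def init_se(hidden=8):
    return [rng.normal(0, 0.5, (hidden, 2)), np.zeros(hidden),
            rng.normal(0, 0.5, (2, hidden)), np.zeros(2)]

# --- Inner loop: train a linear policy exclusively inside the SE ---
def train_agent_on_se(theta, iters=30, horizon=50):
    best_w, best_ret = rng.normal(size=2), -np.inf
    for _ in range(iters):                            # simple random search
        w = best_w + 0.3 * rng.normal(size=2)
        x, ret = 1.0, 0.0
        for _ in range(horizon):
            a = np.tanh(w[0] * x + w[1])
            x, r = se_step(theta, x, a)               # SE dynamics + reward
            ret += r
        if ret > best_ret:
            best_w, best_ret = w, ret
    return best_w

# --- Outer loop: evolution strategy on the flattened SE parameters ---
def flatten(theta):
    return np.concatenate([p.ravel() for p in theta])

def unflatten(vec, like):
    out, i = [], 0
    for p in like:
        out.append(vec[i:i + p.size].reshape(p.shape))
        i += p.size
    return out

theta = init_se()
flat, shapes = flatten(theta), theta
sigma, lr, pop = 0.05, 0.02, 16
for gen in range(20):
    eps = rng.normal(size=(pop, flat.size))
    fitness = np.empty(pop)
    for k in range(pop):
        cand = unflatten(flat + sigma * eps[k], shapes)
        agent_w = train_agent_on_se(cand)             # agent never sees real env here
        fitness[k] = real_rollout(agent_w)            # real env only scores candidates
    adv = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    flat += lr / (pop * sigma) * eps.T @ adv          # NES-style gradient estimate
    print(f"gen {gen:02d}  mean real return {fitness.mean():.2f}")
```

Note how the real environment is queried only to score candidate SEs during the outer loop; consistent with the abstract, once an SE has been learned, new agents can be trained on it without further real-environment interactions.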
Related papers
- REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments [20.826907313227323]
Building generalist agents that can rapidly adapt to new environments is a key challenge for deploying AI in the digital and real worlds.
We propose a novel approach to pre-train relatively small policies on relatively small datasets and adapt them to unseen environments via in-context learning.
Our key idea is that retrieval offers a powerful bias for fast adaptation.
arXiv Detail & Related papers (2024-12-06T03:54:55Z)
- Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models.
The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers.
We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
arXiv Detail & Related papers (2024-03-28T15:22:29Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require a large number of interactions between the agent and the environment.
We propose a new method that addresses this by using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Multi-Agent Transfer Learning in Reinforcement Learning-Based Ride-Sharing Systems [3.7311680121118345]
Reinforcement learning (RL) has been used in a range of simulated real-world tasks.
In this paper, we investigate the impact of transfer learning (TL) parameters with fixed source and target roles.
arXiv Detail & Related papers (2021-12-01T11:23:40Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Learning Synthetic Environments for Reinforcement Learning with Evolution Strategies [34.13101380723782]
This work explores learning agent-agnostic synthetic environments (SEs) for Reinforcement Learning.
SEs act as a proxy for target environments and allow agents to be trained more efficiently than when directly trained on the target environment.
We show that our method is capable of learning SEs for two discrete-action-space tasks that allow us to train agents more robustly and with up to 60% fewer steps.
arXiv Detail & Related papers (2021-01-24T14:16:13Z)
- Emergent Social Learning via Multi-agent Reinforcement Learning [91.57176641192771]
Social learning is a key component of human and animal intelligence.
This paper investigates whether independent reinforcement learning agents can learn to use social learning to improve their performance.
arXiv Detail & Related papers (2020-10-01T17:54:14Z)