Learning Synthetic Environments for Reinforcement Learning with
Evolution Strategies
- URL: http://arxiv.org/abs/2101.09721v3
- Date: Mon, 8 Feb 2021 15:03:39 GMT
- Title: Learning Synthetic Environments for Reinforcement Learning with
Evolution Strategies
- Authors: Fabio Ferreira, Thomas Nierhoff, Frank Hutter
- Abstract summary: This work explores learning agent-agnostic synthetic environments (SEs) for Reinforcement Learning.
SEs act as a proxy for target environments and allow agents to be trained more efficiently than when directly trained on the target environment.
We show that our method is capable of learning SEs for two discrete-action-space tasks that allow us to train agents more robustly and with up to 60% fewer steps.
- Score: 34.13101380723782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work explores learning agent-agnostic synthetic environments (SEs) for
Reinforcement Learning. SEs act as a proxy for target environments and allow
agents to be trained more efficiently than when directly trained on the target
environment. We formulate this as a bi-level optimization problem and represent
an SE as a neural network. By using Natural Evolution Strategies and a
population of SE parameter vectors, we train agents in the inner loop on
evolving SEs while in the outer loop we use the performance on the target task
as a score for meta-updating the SE population. We show empirically that our
method is capable of learning SEs for two discrete-action-space tasks
(CartPole-v0 and Acrobot-v1) that allow us to train agents more robustly and
with up to 60% fewer steps. Not only do we show in experiments with 4000
evaluations that the SEs are robust against hyperparameter changes such as the
learning rate, batch sizes and network sizes, we also show that SEs trained
with DDQN agents transfer in limited ways to a discrete-action-space version of
TD3 and very well to Dueling DDQN.
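To make the bi-level setup concrete, here is a minimal sketch of the outer Natural Evolution Strategies loop over SE parameters with an inner agent-training loop. The helper callables, hyperparameter values, and rank-based fitness shaping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nes_learn_se(dim, train_agent_fn, evaluate_fn, pop_size=16, sigma=0.1,
                 lr=0.02, n_outer=100):
    """Outer loop: Natural Evolution Strategies over synthetic-environment (SE) parameters.

    train_agent_fn(se_params) -> agent   # inner loop: train an agent (e.g. DDQN) on the SE
    evaluate_fn(agent) -> float          # score: average return on the real target task
    """
    theta = np.zeros(dim)                          # mean of the SE parameter distribution
    for _ in range(n_outer):
        eps = np.random.randn(pop_size, dim)       # population of parameter perturbations
        scores = np.empty(pop_size)
        for i in range(pop_size):
            se_params = theta + sigma * eps[i]     # candidate SE (a neural network's weights)
            agent = train_agent_fn(se_params)      # inner loop: train an agent on this SE
            scores[i] = evaluate_fn(agent)         # outer score: performance on the target task
        ranks = scores.argsort().argsort()         # rank-based fitness shaping
        utilities = ranks / (pop_size - 1) - 0.5
        theta += lr / (pop_size * sigma) * (eps.T @ utilities)  # NES meta-update of the SE
    return theta
```

In this sketch, `train_agent_fn` would wrap the inner-loop agent training on the synthetic proxy (e.g. a DDQN agent targeting CartPole-v0) and `evaluate_fn` the rollout on the real environment; both are placeholders to be supplied by the user.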
Related papers
- Multi-agent Path Finding for Timed Tasks using Evolutionary Games [1.3023548510259344]
We show that our algorithm is faster than deep RL methods by at least an order of magnitude.
Our results indicate that it scales better with an increase in the number of agents as compared to other methods.
arXiv Detail & Related papers (2024-11-15T20:10:25Z)
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Discovering Minimal Reinforcement Learning Environments [24.6408931194983]
Reinforcement learning (RL) agents are commonly trained and evaluated in the same environment.
Humans often train in a specialized environment before being evaluated, such as studying a book before taking an exam.
arXiv Detail & Related papers (2024-06-18T13:19:26Z)
- DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design [11.922951794283168]
In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents.
We discover that for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data.
We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance.
To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED).
arXiv Detail & Related papers (2024-02-05T19:47:45Z)
- Learning Neuro-Symbolic Skills for Bilevel Planning [63.388694268198655]
Decision-making is challenging in robotics environments with continuous object-centric states, continuous actions, long horizons, and sparse feedback.
Hierarchical approaches, such as task and motion planning (TAMP), address these challenges by decomposing decision-making into two or more levels of abstraction.
Our main contribution is a method for learning parameterized policies in combination with operators and samplers.
arXiv Detail & Related papers (2022-06-21T19:01:19Z)
- Learning Synthetic Environments and Reward Networks for Reinforcement Learning [34.01695320809796]
We introduce Synthetic Environments (SEs) and Reward Networks (RNs) as proxy environment models for training Reinforcement Learning (RL) agents.
We show that an agent, after being trained exclusively on the SE, is able to solve the corresponding real environment.
arXiv Detail & Related papers (2022-02-06T14:55:59Z)
- Learning Connectivity-Maximizing Network Configurations [123.01665966032014]
We propose a supervised learning approach with a convolutional neural network (CNN) that learns to place communication agents from an expert.
We demonstrate the performance of our CNN on canonical line and ring topologies, 105k randomly generated test cases, and larger teams not seen during training.
After training, our system produces connected configurations 2 orders of magnitude faster than the optimization-based scheme for teams of 10-20 agents.
arXiv Detail & Related papers (2021-12-14T18:59:01Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance gains compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach [7.488722678999039]
We present a map-less path planning algorithm based on Deep Reinforcement Learning (DRL) for mobile robots navigating in unknown environments.
The planner is trained using a reward function shaped by online knowledge of the training environment's map.
The policy trained in the simulation environment can be directly and successfully transferred to the real robot.
arXiv Detail & Related papers (2020-02-10T22:00:16Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)