Open-Ended Reinforcement Learning with Neural Reward Functions
- URL: http://arxiv.org/abs/2202.08266v1
- Date: Wed, 16 Feb 2022 15:55:22 GMT
- Title: Open-Ended Reinforcement Learning with Neural Reward Functions
- Authors: Robert Meier and Asier Mujika
- Abstract summary: In high-dimensional robotic environments, our approach learns a wide range of interesting skills, including front-flips for Half-Cheetah and one-legged running for Humanoid.
In the pixel-based Montezuma's Revenge environment, our method also works with minimal changes, and it learns complex skills that involve interacting with items and visiting diverse locations.
- Score: 2.4366811507669115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the great success of unsupervised learning in Computer Vision and
Natural Language Processing, the Reinforcement Learning community has recently
started to focus more on unsupervised discovery of skills. Most current
approaches, like DIAYN or DADS, optimize some form of mutual information
objective. We propose a different approach that uses reward functions encoded
by neural networks. These are trained iteratively to reward more complex
behavior. In high-dimensional robotic environments, our approach learns a wide
range of interesting skills, including front-flips for Half-Cheetah and
one-legged running for Humanoid. In the pixel-based Montezuma's Revenge
environment, our method also works with minimal changes, and it learns complex
skills that involve interacting with items and visiting diverse locations. A
web version of this paper, which shows animations of the different skills, is
available at https://as.inf.ethz.ch/research/open_ended_RL/main.html
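For context, the mutual-information objective the abstract contrasts against is a standard result from the DIAYN line of work: it ties skills z to the states s they visit by maximizing I(S;Z), optimized in practice through a variational intrinsic reward in which q_phi is a learned skill discriminator and p(z) a fixed skill prior:

```latex
\max \; I(S;Z) \;=\; H(Z) - H(Z \mid S)
\qquad\Longrightarrow\qquad
r(s,z) \;=\; \log q_\phi(z \mid s) \;-\; \log p(z)
```

The abstract describes the proposed alternative only at a high level: reward functions are encoded by neural networks and retrained iteratively to reward ever more complex behavior. The following is a minimal sketch of one plausible reading of that loop, not the authors' exact algorithm; the discriminator-style reward update, the stub rollout function collect_states, the network sizes, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only -- not the authors' algorithm. Hypothetical pieces:
# the RewardNet sizes, the visited-vs-novel reward update, collect_states.
import torch
import torch.nn as nn

STATE_DIM = 8  # assumed toy state dimensionality


class RewardNet(nn.Module):
    """Small MLP encoding the current neural reward function r_phi(s)."""

    def __init__(self, state_dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)


def collect_states(n: int) -> torch.Tensor:
    """Stub rollout: a real implementation would run the current policy in
    the environment and return the states it actually visits."""
    return torch.randn(n, STATE_DIM)


reward_net = RewardNet(STATE_DIM)
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

for phase in range(5):  # outer iterations, one per increasingly complex skill
    # 1) Train a policy to maximize the current neural reward (elided here;
    #    any RL algorithm would do), then record the states it now reaches.
    visited = collect_states(256)
    # 2) Gather states the policy does not yet reach (stand-in sample here).
    novel = collect_states(256)
    # 3) Retrain the reward net to score mastered states low and unreached
    #    states high, so the next policy is pushed toward new behavior.
    for _ in range(100):
        loss = (
            reward_net(visited).sigmoid().mean()
            - reward_net(novel).sigmoid().mean()
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    gap = (reward_net(novel).mean() - reward_net(visited).mean()).item()
    print(f"phase {phase}: novel-minus-visited reward gap = {gap:.3f}")
```

The point the sketch tries to capture is that each retraining phase devalues states the current policy already reaches, so maximizing the updated reward forces the next policy toward qualitatively new behavior.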
Related papers
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in open-domain robotic manipulation is acquiring diverse and generalizable robot skills.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
- Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks [31.084848672383185]
We study building multi-task agents in open-world environments.
We convert the multi-task learning problem into learning basic skills and planning over the skills.
Our method accomplishes 40 diverse Minecraft tasks, where many tasks require sequentially executing more than 10 skills.
arXiv Detail & Related papers (2023-03-29T09:45:50Z)
- Choreographer: Learning and Adapting Skills in Imagination [60.09911483010824]
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination.
Our method decouples the exploration and skill learning processes, being able to discover skills in the latent state space of the model.
Choreographer is able to learn skills both from offline data and by collecting data simultaneously with an exploration policy.
arXiv Detail & Related papers (2022-11-23T23:31:14Z)
- Lipschitz-constrained Unsupervised Skill Discovery [91.51219447057817]
Lipschitz-constrained Skill Discovery (LSD) encourages the agent to discover more diverse, dynamic, and far-reaching skills.
LSD outperforms previous approaches in terms of skill diversity, state space coverage, and performance on seven downstream tasks.
arXiv Detail & Related papers (2022-02-02T08:29:04Z)
- Inducing Structure in Reward Learning by Learning Features [31.413656752926208]
We introduce a novel type of human input for teaching features and an algorithm that utilizes it to learn complex features from the raw state space.
We demonstrate our method in settings where all features have to be learned from scratch, as well as where some of the features are known.
arXiv Detail & Related papers (2022-01-18T16:02:29Z)
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills [93.12417203541948]
We propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset.
We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects.
arXiv Detail & Related papers (2021-04-15T20:10:11Z)
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments [101.90004767771897]
Embodied agents must be able to master how their environment works.
We introduce a reinforcement learning approach for exploration for interaction.
We demonstrate our idea with AI2-iTHOR.
arXiv Detail & Related papers (2020-08-21T00:29:36Z)
- ELSIM: End-to-end learning of reusable skills through intrinsic motivation [0.0]
We present a novel reinforcement learning architecture which hierarchically learns and represents self-generated skills in an end-to-end way.
With this architecture, an agent focuses only on task-rewarded skills while keeping the learning process of skills bottom-up.
arXiv Detail & Related papers (2020-06-23T11:20:46Z) - Emergent Real-World Robotic Skills via Unsupervised Off-Policy
Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z) - Learning as Reinforcement: Applying Principles of Neuroscience for More
General Reinforcement Learning Agents [1.0742675209112622]
We implement an architecture founded in principles of experimental neuroscience by combining computationally efficient abstractions of biological algorithms.
Our approach is inspired by research on spike-timing-dependent plasticity, the transition between short- and long-term memory, and the role of various neurotransmitters in rewarding curiosity.
The Neurons-in-a-Box architecture can learn in a wholly generalizable manner, and demonstrates an efficient way to build and apply representations without explicitly optimizing over a set of criteria or actions.
arXiv Detail & Related papers (2020-04-20T04:06:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.