Persistent Reinforcement Learning via Subgoal Curricula
- URL: http://arxiv.org/abs/2107.12931v1
- Date: Tue, 27 Jul 2021 16:39:45 GMT
- Title: Persistent Reinforcement Learning via Subgoal Curricula
- Authors: Archit Sharma, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
- Abstract summary: Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
- Score: 114.83989499740193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) promises to enable autonomous acquisition of
complex behaviors for diverse agents. However, the success of current
reinforcement learning algorithms is predicated on an often under-emphasised
requirement -- each trial needs to start from a fixed initial state
distribution. Unfortunately, resetting the environment to its initial state
after each trial requires a substantial amount of human supervision and extensive
instrumentation of the environment, which defeats the purpose of autonomous
reinforcement learning. In this work, we propose Value-accelerated Persistent
Reinforcement Learning (VaPRL), which generates a curriculum of initial states
such that the agent can bootstrap on the success of easier tasks to efficiently
learn harder tasks. The agent also learns to reach the initial states proposed
by the curriculum, minimizing reliance on human interventions during
learning. We observe that VaPRL reduces the interventions required by three
orders of magnitude compared to episodic RL while outperforming prior
state-of-the-art methods for reset-free RL, both in terms of sample efficiency
and asymptotic performance on a variety of simulated robotics problems.
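The abstract describes the curriculum only at a high level. As a purely illustrative sketch, the snippet below shows one way a value-driven curriculum over initial states could look: among previously visited states, pick as the next trial's start the state closest to the task's true initial state from which the current goal-conditioned value function already predicts success, falling back to the easiest candidate otherwise. The success threshold, Euclidean distance, and fallback rule are assumptions made here for illustration, not the paper's exact objective.

```python
import numpy as np

def propose_initial_state(visited_states, task_init_state, task_goal,
                          value_fn, success_threshold=0.8):
    """Pick the next trial's initial state from previously visited states.

    Illustrative curriculum rule: among visited states from which the
    goal-conditioned value function predicts likely success, choose the one
    closest to the task's true initial state, so the curriculum contracts
    toward the real task as competence grows. If no state clears the
    threshold, fall back to the easiest (highest-value) candidate.
    The threshold and Euclidean distance are illustrative choices.
    """
    values = np.array([value_fn(s, task_goal) for s in visited_states])
    solvable = values >= success_threshold
    if solvable.any():
        dists = np.array([np.linalg.norm(s - task_init_state)
                          for s in visited_states])
        dists[~solvable] = np.inf          # only consider solvable starts
        return visited_states[int(np.argmin(dists))]
    return visited_states[int(np.argmax(values))]  # easiest start otherwise


# Toy usage: 1-D chain where the goal is at 10.0 and the task starts at 0.0.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visited = rng.uniform(0.0, 10.0, size=(50, 1))
    value_fn = lambda s, g: float(np.exp(-0.5 * np.linalg.norm(g - s)))
    start = propose_initial_state(visited, np.array([0.0]), np.array([10.0]),
                                  value_fn, success_threshold=0.3)
    print("proposed initial state:", start)
```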
Related papers
- Single-Reset Divide & Conquer Imitation Learning [49.87201678501027]
Demonstrations are commonly used to speed up the learning process of Deep Reinforcement Learning algorithms.
Some algorithms have been developed to learn from a single demonstration.
arXiv Detail & Related papers (2024-02-14T17:59:47Z)
- Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge [25.168236693829783]
A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode.
We propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge.
arXiv Detail & Related papers (2023-11-15T18:40:10Z)
- Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum [22.32327908453603]
We propose a demonstration-free reinforcement learning algorithm via Implicit and Bi-directional Curriculum (IBC).
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
arXiv Detail & Related papers (2023-05-17T04:31:36Z)
- You Only Live Once: Single-Life Reinforcement Learning [124.1738675154651]
In many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial.
We formalize this problem setting, where an agent must complete a task within a single episode without interventions.
We propose an algorithm, $Q$-weighted adversarial learning (QWALE), which employs a distribution matching strategy.
arXiv Detail & Related papers (2022-10-17T09:00:11Z)
- Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
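The MEDAL summary above describes training a backward (reset) policy to match the demonstration state distribution. The snippet below is a minimal sketch of one such distribution-matching reward, assuming a simple classifier-based formulation: a discriminator is trained to separate demonstration states from states visited by the backward policy, and the backward policy is rewarded for reaching states the discriminator attributes to the demonstrations. The logistic-regression discriminator and log-probability reward are illustrative choices, not the paper's exact training procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def backward_policy_reward(demo_states, backward_states):
    """Sketch of a distribution-matching reward for the backward policy.

    A classifier is trained to separate demonstration states from states
    visited by the backward (reset) policy; the backward policy is then
    rewarded for visiting states the classifier mistakes for demonstration
    states, pulling its state distribution toward the demos.
    """
    X = np.vstack([demo_states, backward_states])
    y = np.concatenate([np.ones(len(demo_states)),
                        np.zeros(len(backward_states))])
    clf = LogisticRegression().fit(X, y)
    # Higher reward when a state is likely to have come from the demos.
    return lambda s: float(clf.predict_log_proba(s.reshape(1, -1))[0, 1])
```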
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- Automating Reinforcement Learning with Example-based Resets [19.86233948960312]
Existing reinforcement learning algorithms assume an episodic setting in which the agent resets to a fixed initial state distribution at the end of each episode.
We propose an extension to conventional reinforcement learning towards greater autonomy by introducing an additional agent that learns to reset in a self-supervised manner.
We apply our method to learn from scratch on a suite of simulated and real-world continuous control tasks and demonstrate that the reset agent successfully learns to reduce manual resets.
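The summary above describes adding a reset agent that learns, in a self-supervised way, to return the environment to its initial states. The loop below is a minimal sketch of such forward/reset alternation, assuming a handful of example initial states, a classic gym-style step interface returning (obs, reward, done, info), and hypothetical forward_agent / reset_agent objects with act and observe methods; a manual reset is counted only when the learned reset fails.

```python
import numpy as np

def reset_free_training(env, forward_agent, reset_agent, reset_examples,
                        num_trials=1000, horizon=200, reset_tolerance=0.1):
    """Sketch of alternating forward/reset training with learned resets.

    After each forward attempt at the task, the reset agent tries to bring
    the environment back near one of a few example initial states; a manual
    reset is requested only when it fails. Agent interfaces, the distance
    check, and the tolerance are illustrative assumptions.
    """
    manual_resets = 0
    obs = env.reset()                      # one initial manual reset
    for _ in range(num_trials):
        # Forward phase: attempt the task.
        for _ in range(horizon):
            obs, reward, done, _ = env.step(forward_agent.act(obs))
            forward_agent.observe(obs, reward, done)
            if done:
                break
        # Reset phase: try to return to a sampled example initial state.
        target = reset_examples[np.random.randint(len(reset_examples))]
        for _ in range(horizon):
            obs, _, done, _ = env.step(reset_agent.act(obs, goal=target))
            # Self-supervised reward: negative distance to the reset example.
            reset_agent.observe(obs, -np.linalg.norm(obs - target), done)
            if np.linalg.norm(obs - target) < reset_tolerance:
                break
        else:
            manual_resets += 1             # learned reset failed; intervene
            obs = env.reset()
    return manual_resets
```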
arXiv Detail & Related papers (2022-04-05T08:12:42Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.