Self-Supervised Curriculum Generation for Autonomous Reinforcement
Learning without Task-Specific Knowledge
- URL: http://arxiv.org/abs/2311.09195v2
- Date: Sun, 18 Feb 2024 12:39:35 GMT
- Title: Self-Supervised Curriculum Generation for Autonomous Reinforcement
Learning without Task-Specific Knowledge
- Authors: Sang-Hyun Lee and Seung-Woo Seo
- Abstract summary: A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between episodes.
We propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge.
- Score: 25.168236693829783
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A significant bottleneck in applying current reinforcement learning
algorithms to real-world scenarios is the need to reset the environment between
episodes. This reset process demands substantial human intervention,
making it difficult for the agent to learn continuously and autonomously.
Several recent works have introduced autonomous reinforcement learning (ARL)
algorithms that generate curricula for jointly training reset and forward
policies. While their curricula can reduce the number of required manual resets
by taking into account the agent's learning progress, they rely on
task-specific knowledge, such as predefined initial states or reset reward
functions. In this paper, we propose a novel ARL algorithm that can generate a
curriculum adaptive to the agent's learning progress without task-specific
knowledge. Our curriculum empowers the agent to autonomously reset to diverse
and informative initial states. To achieve this, we introduce a success
discriminator that estimates the success probability from each initial state
when the agent follows the forward policy. The success discriminator is trained
with relabeled transitions in a self-supervised manner. Our experimental
results demonstrate that our ARL algorithm can generate an adaptive curriculum
and enable the agent to efficiently bootstrap to solve sparse-reward maze
navigation and manipulation tasks, outperforming baselines with significantly
fewer manual resets.
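A minimal, hypothetical sketch of the success-discriminator idea described above may help make it concrete: a classifier trained on relabeled transitions to estimate the probability that the forward policy succeeds from a given initial state, which can then be used to choose informative reset states. The class names, network sizes, relabeling rule, and reset-selection heuristic below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): a success discriminator that
# predicts P(success | initial state) under the current forward policy,
# trained in a self-supervised way from relabeled experience.
import numpy as np
import torch
import torch.nn as nn

class SuccessDiscriminator(nn.Module):
    """Estimates the probability of task success from a given initial state."""

    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(state))


def relabel_episode(states: np.ndarray, reached_goal: bool):
    """Self-supervised relabeling (assumed scheme): states visited in a
    successful episode become positive examples of good initial states,
    states from a failed episode become negative examples."""
    labels = np.full(len(states), float(reached_goal), dtype=np.float32)
    return states.astype(np.float32), labels


def train_discriminator(disc, optimizer, states, labels, epochs: int = 5) -> float:
    """Binary cross-entropy training on relabeled (state, success) pairs."""
    x = torch.as_tensor(states)
    y = torch.as_tensor(labels).unsqueeze(-1)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(disc(x), y)
        loss.backward()
        optimizer.step()
    return float(loss.item())


def pick_reset_state(disc, candidates: np.ndarray, target: float = 0.5) -> int:
    """Curriculum heuristic (an assumption, not the paper's rule): reset to the
    candidate whose predicted success probability is closest to an intermediate
    value, i.e. neither trivially easy nor hopeless for the current policy."""
    with torch.no_grad():
        probs = disc(torch.as_tensor(candidates, dtype=torch.float32)).squeeze(-1)
    return int(torch.argmin((probs - target).abs()).item())
```

In an ARL loop, such a discriminator would be retrained periodically from the agent's own relabeled experience, so the set of initial states it steers the reset policy toward adapts as the forward policy improves.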
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- Reward-Machine-Guided, Self-Paced Reinforcement Learning [30.42334205249944]
We develop a self-paced reinforcement learning algorithm guided by reward machines.
The proposed algorithm achieves optimal behavior reliably even in cases in which existing baselines cannot make any meaningful progress.
It also reduces the curriculum length by up to one-fourth and the variance in the curriculum generation process by up to four orders of magnitude.
arXiv Detail & Related papers (2023-05-25T22:13:37Z)
- Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provides the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z)
- Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z)
- Automating Reinforcement Learning with Example-based Resets [19.86233948960312]
Existing reinforcement learning algorithms assume an episodic setting in which the agent resets to a fixed initial state distribution at the end of each episode.
We propose an extension to conventional reinforcement learning towards greater autonomy by introducing an additional agent that learns to reset in a self-supervised manner.
We apply our method to learn from scratch on a suite of simulated and real-world continuous control tasks and demonstrate that the reset agent successfully learns to reduce manual resets.
arXiv Detail & Related papers (2022-04-05T08:12:42Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Self-Paced Deep Reinforcement Learning [42.467323141301826]
Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning.
Despite empirical successes, an open question in CRL is how to automatically generate a curriculum for a given reinforcement learning (RL) agent, avoiding manual design.
We propose an answer by interpreting the curriculum generation as an inference problem, where distributions over tasks are progressively learned to approach the target task.
This approach leads to automatic curriculum generation whose pace is controlled by the agent, has solid theoretical motivation, and is easily integrated with deep RL algorithms.
arXiv Detail & Related papers (2020-04-24T15:48:07Z)
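The self-paced entry above frames curriculum generation as progressively shifting a distribution over tasks toward the target task at a pace set by the agent's performance. The snippet below is a toy, assumption-laden illustration of that general idea only (a Gaussian over task parameters that moves toward the target once returns clear a threshold); it is not the algorithm derived in that paper.

```python
# Toy illustration (assumptions only): a task distribution that shifts toward
# the target task as the agent's average return improves.
import numpy as np

class SelfPacedCurriculum:
    def __init__(self, init_mean, target_mean, init_std: float = 1.0,
                 return_threshold: float = 0.6, step: float = 0.1):
        self.mean = np.asarray(init_mean, dtype=float)
        self.target = np.asarray(target_mean, dtype=float)
        self.std = init_std
        self.return_threshold = return_threshold
        self.step = step

    def sample_task(self) -> np.ndarray:
        """Draw a task (e.g. a goal position) from the current distribution."""
        return np.random.normal(self.mean, self.std)

    def update(self, avg_return: float) -> None:
        """Once the agent does well enough on the current distribution, move it
        toward the target task and tighten it slightly."""
        if avg_return >= self.return_threshold:
            self.mean += self.step * (self.target - self.mean)
            self.std = max(0.1, self.std * 0.95)
```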
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.