Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum
- URL: http://arxiv.org/abs/2305.09943v2
- Date: Fri, 9 Jun 2023 01:36:18 GMT
- Title: Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum
- Authors: Jigang Kim, Daesol Cho, H. Jin Kim
- Abstract summary: We propose a demonstration-free autonomous reinforcement learning (ARL) algorithm via Implicit and Bi-directional Curriculum (IBC).
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
- Score: 22.32327908453603
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While reinforcement learning (RL) has achieved great success in acquiring
complex skills solely from environmental interactions, it assumes that resets
to the initial state are readily available at the end of each episode. Such an
assumption hinders the autonomous learning of embodied agents due to the
time-consuming and cumbersome workarounds for resetting in the physical world.
Hence, there has been a growing interest in autonomous RL (ARL) methods that
are capable of learning from non-episodic interactions. However, existing works
on ARL are limited by their reliance on prior data and are unable to learn in
environments where task-relevant interactions are sparse. In contrast, we
propose a demonstration-free ARL algorithm via Implicit and Bi-directional
Curriculum (IBC). With an auxiliary agent that is conditionally activated upon
learning progress and a bidirectional goal curriculum based on optimal
transport, our method outperforms previous methods, even the ones that leverage
demonstrations.
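
The two components named in the abstract, an auxiliary agent gated by learning progress and a bidirectional goal curriculum derived from optimal transport, can be illustrated with a minimal sketch. All function and variable names below are hypothetical, and the optimal-transport coupling is stood in for by a minimum-cost assignment over Euclidean distances (discrete OT with uniform weights); this is an approximation of the idea, not the authors' implementation.

```python
# Minimal, hypothetical sketch of the two ideas named in the abstract.
# Names and thresholds are illustrative only.
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_goal_curriculum(candidate_goals, anchor_goals):
    """Match each anchor goal to a distinct candidate goal by solving a
    minimum-cost assignment over pairwise Euclidean costs -- a discrete
    optimal-transport coupling with uniform weights."""
    cost = np.linalg.norm(
        anchor_goals[:, None, :] - candidate_goals[None, :, :], axis=-1
    )
    _, cols = linear_sum_assignment(cost)  # rows index anchors, cols index candidates
    return candidate_goals[cols]


def auxiliary_agent_active(task_success_rate, threshold=0.3):
    """Gate the auxiliary agent on a crude proxy for learning progress:
    activate it only while the main task agent's success rate is low."""
    return task_success_rate < threshold


# Toy usage with 2-D goals: the curriculum is "bidirectional" in that anchors
# alternate between task goals (forward direction) and initial states
# (backward direction) during non-episodic training.
rng = np.random.default_rng(0)
visited_states = rng.normal(size=(64, 2))   # states reached so far
task_goals = rng.normal(size=(8, 2))
initial_states = rng.normal(size=(8, 2))
forward_goals = ot_goal_curriculum(visited_states, task_goals)
backward_goals = ot_goal_curriculum(visited_states, initial_states)
use_auxiliary = auxiliary_agent_active(task_success_rate=0.1)
```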
Related papers
- Social Interpretable Reinforcement Learning [4.242435932138821]
Social Interpretable RL (SIRL) is inspired by social learning principles to improve learning efficiency.
Our results on six well-known benchmarks show that SIRL reaches state-of-the-art performance.
arXiv Detail & Related papers (2024-01-27T19:05:21Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is not yet fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- Human AI interaction loop training: New approach for interactive reinforcement learning [0.0]
Reinforcement Learning (RL) provides effective results in various machine-learning decision-making tasks, with an agent learning from a stand-alone reward function.
RL presents unique challenges with large environment state and action spaces, as well as in determining rewards.
Imitation Learning (IL) offers a promising solution for those challenges using a teacher.
arXiv Detail & Related papers (2020-03-09T15:27:48Z)