Outcome-directed Reinforcement Learning by Uncertainty & Temporal
Distance-Aware Curriculum Goal Generation
- URL: http://arxiv.org/abs/2301.11741v1
- Date: Fri, 27 Jan 2023 14:25:04 GMT
- Title: Outcome-directed Reinforcement Learning by Uncertainty & Temporal
Distance-Aware Curriculum Goal Generation
- Authors: Daesol Cho, Seungjae Lee, H. Jin Kim
- Abstract summary: Current reinforcement learning (RL) often struggles with challenging exploration problems in which the desired outcomes or high rewards are rarely observed.
We propose an uncertainty & temporal distance-aware curriculum goal generation method for outcome-directed RL via solving a bipartite matching problem.
It not only provides precisely calibrated guidance of the curriculum toward the desired outcome states but also achieves much better sample efficiency and geometry-agnostic curriculum goal proposal capability compared to previous curriculum RL methods.
- Score: 29.155620517531656
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Current reinforcement learning (RL) often suffers when solving challenging
exploration problems in which the desired outcomes or high rewards are rarely
observed. Even though curriculum RL, a framework that solves complex tasks by
proposing a sequence of surrogate tasks, shows reasonable results, most
previous works still have difficulty proposing a curriculum due to the absence
of a mechanism for obtaining calibrated guidance to the desired outcome state
without any prior domain knowledge. To alleviate this, we propose an
uncertainty & temporal distance-aware curriculum goal generation method for
outcome-directed RL via solving a bipartite matching problem. It not only
provides precisely calibrated guidance of the curriculum toward the desired
outcome states but also achieves much better sample efficiency and
geometry-agnostic curriculum goal proposal capability compared to previous
curriculum RL methods. We demonstrate that our algorithm significantly
outperforms these prior methods in a variety of challenging navigation and
robotic manipulation tasks, both quantitatively and qualitatively.
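To make the bipartite-matching formulation concrete, here is a minimal sketch (not the authors' implementation): candidate goals drawn from the achieved-state buffer are matched to the desired outcome states by minimizing a cost that combines a temporal-distance estimate with an epistemic-uncertainty bonus, solved with scipy's linear_sum_assignment. The temporal_dist and uncertainty callables are hypothetical placeholders for the learned components described in the paper; in the toy usage below, Euclidean distance and distance-from-data stand in so the sketch runs on its own.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def propose_curriculum_goals(candidates, outcomes, temporal_dist, uncertainty, lam=1.0):
    """Match candidate goals (from the achieved-state buffer) to desired
    outcome states by solving a bipartite matching problem.

    candidates:    (N, d) array of candidate curriculum goals
    outcomes:      (M, d) array of desired outcome states, M <= N
    temporal_dist: callable (x, y) -> estimated steps from x to y
    uncertainty:   callable (x,)   -> epistemic uncertainty at x
    lam:           trade-off between outcome guidance and exploration
    """
    n, m = len(candidates), len(outcomes)
    cost = np.zeros((n, m))
    for i, g in enumerate(candidates):
        for j, o in enumerate(outcomes):
            # Low cost: close (in timesteps) to a desired outcome and
            # epistemically uncertain (i.e., at the exploration frontier).
            cost[i, j] = temporal_dist(g, o) - lam * uncertainty(g)
    rows, cols = linear_sum_assignment(cost)  # minimizes total matching cost
    return candidates[rows]                   # one curriculum goal per outcome

# Toy usage with hypothetical stand-ins for the learned quantities.
rng = np.random.default_rng(0)
cands = rng.uniform(0, 10, size=(50, 2))
outs = np.array([[9.0, 9.0], [1.0, 9.0]])
goals = propose_curriculum_goals(
    cands, outs,
    temporal_dist=lambda x, y: np.linalg.norm(x - y),
    uncertainty=lambda x: np.linalg.norm(x - cands.mean(axis=0)),
)
print(goals)
```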
Related papers
- Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning [17.092640837991883]
Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction.
One direction augments RL with offline data demonstrating desired tasks, but past work often requires large amounts of high-quality demonstration data.
We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency.
arXiv Detail & Related papers (2024-05-06T11:33:12Z)
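A minimal sketch of the reverse-curriculum half of RFCL above, under the assumption that episodes are reset to demonstration states that move earlier in the trajectory as success improves; the scheduling rule here is illustrative, not the paper's exact one.

```python
# Reverse curriculum over demonstration states (illustrative sketch):
# episodes start from a demonstration state near the goal, and the start
# point moves earlier in the demonstration as the success rate improves.

def reverse_curriculum_start(demo_states, success_rate, min_offset=1):
    """Pick an episode start state from a recorded demonstration.

    demo_states:  list of states along one successful demonstration
    success_rate: recent success rate in [0, 1] at the current stage
    """
    T = len(demo_states)
    # Higher success -> start farther from the goal (earlier in the demo).
    offset = max(min_offset, int(success_rate * (T - 1)))
    return demo_states[T - 1 - offset]

# Toy usage: a 1-D "demonstration" from state 0 to the goal at state 9.
demo = list(range(10))
print(reverse_curriculum_start(demo, success_rate=0.1))  # near the goal
print(reverse_curriculum_start(demo, success_rate=0.9))  # near the start
```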
- CQM: Curriculum Reinforcement Learning with a Quantized World Model [30.21954044028645]
We propose a novel curriculum method that automatically defines the semantic goal space which contains vital information for the curriculum process.
Our method proposes uncertainty- and temporal-distance-aware curriculum goals that converge to the final goals over the automatically composed goal space.
It also outperforms state-of-the-art curriculum RL methods in data efficiency and performance on various goal-reaching tasks, even with ego-centric visual inputs.
arXiv Detail & Related papers (2023-10-26T11:50:58Z)
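A rough sketch of the quantized goal space idea in CQM above; the paper learns the discrete codes (with a world model), whereas here a fixed random codebook and visit counts stand in for the learned quantizer and its uncertainty signal.

```python
import numpy as np

def quantize(states, codebook):
    """Assign each state to its nearest code (a discrete goal)."""
    d = np.linalg.norm(states[:, None, :] - codebook[None, :, :], axis=-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
visited = rng.normal(size=(500, 2))   # replayed observations
codebook = rng.normal(size=(16, 2))   # hypothetical learned codes

# Rarely visited codes act as uncertainty-aware frontier goals.
codes = quantize(visited, codebook)
counts = np.bincount(codes, minlength=len(codebook))
frontier = counts.argmin()            # least-visited code
print("curriculum goal (code center):", codebook[frontier])
```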
- Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z)
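One plausible instantiation of the success-induced prioritization above, sketched below; the exact SITP scoring rule is in the paper. Here tasks are sampled in proportion to their recent improvement in success rate.

```python
import numpy as np

def task_probabilities(prev_success, curr_success, temperature=0.1):
    """Sample tasks in proportion to recent success-rate improvement."""
    progress = np.maximum(curr_success - prev_success, 0.0)
    logits = progress / temperature
    p = np.exp(logits - logits.max())  # stable softmax over tasks
    return p / p.sum()

prev = np.array([0.2, 0.50, 0.9])
curr = np.array([0.5, 0.55, 0.9])
p = task_probabilities(prev, curr)
print(p)  # task 0 (fastest learning) gets the highest probability
```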
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is conflicting gradients between tasks.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss while explicitly controlling the worst-case conflict with individual task gradients.
CAGrad balances the objectives automatically and still provably converges to a minimum of the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
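A compact sketch of the CAGrad update above, following the dual formulation reported in the paper (minimize over simplex weights, then correct the average gradient); the optimizer choice and the toy gradients are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def cagrad_update(grads, c=0.5):
    """Sketch of the CAGrad update direction.

    grads: (K, d) array of per-task gradients.
    Returns a direction that follows the average-loss gradient while
    limiting conflict with the worst-affected task.
    """
    g0 = grads.mean(axis=0)                    # average-loss gradient
    phi = (c ** 2) * float(g0 @ g0)

    def dual(w):                               # minimized over the simplex
        gw = w @ grads
        return float(gw @ g0) + np.sqrt(phi) * np.linalg.norm(gw)

    k = len(grads)
    res = minimize(
        dual, np.ones(k) / k,
        bounds=[(0.0, 1.0)] * k,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    )
    gw = res.x @ grads
    norm = np.linalg.norm(gw)
    return g0 + (np.sqrt(phi) / norm) * gw if norm > 1e-8 else g0

# Two conflicting task gradients: the plain average nearly cancels,
# while the conflict-averse direction keeps progress on both tasks.
g = np.array([[1.0, 0.1], [-1.0, 0.1]])
print(cagrad_update(g, c=0.5))
```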
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
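The E-step of C-Planning above is amenable to a short sketch: build a graph over replay-buffer states and run graph search from start to goal. Euclidean edge weights stand in for learned temporal distances, and the M-step (training the goal-conditioned policy on the returned waypoints) is omitted.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def plan_waypoints(states, start_idx, goal_idx, edge_radius=1.5):
    """Graph-search a sequence of waypoints over replay-buffer states."""
    d = np.linalg.norm(states[:, None] - states[None, :], axis=-1)
    adj = np.where(d <= edge_radius, d, 0.0)   # 0 = no edge for csgraph
    dist, pred = shortest_path(adj, directed=False, return_predecessors=True)
    if np.isinf(dist[start_idx, goal_idx]):
        return None                            # goal not yet reachable
    path, node = [], goal_idx                  # walk predecessors backward
    while node != start_idx:
        path.append(node)
        node = pred[start_idx, node]
    path.append(start_idx)
    return states[path[::-1]]                  # start -> ... -> goal

# Toy usage: a meandering chain of visited states in 2-D.
rng = np.random.default_rng(2)
pts = np.cumsum(rng.uniform(0.2, 1.0, size=(20, 2)), axis=0)
print(plan_waypoints(pts, start_idx=0, goal_idx=19))
```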
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in reinforcement learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
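A loose sketch of a value-based initial-state curriculum in the spirit of VaPRL above; the selection rule (pick a visited state whose value toward the task goal sits near a target competence level, i.e., hard but not hopeless) is an assumption for illustration, and value_fn is a hypothetical critic.

```python
import numpy as np

def pick_initial_state(visited, value_fn, target=0.5):
    """Start the next episode from a visited state of intermediate value."""
    values = np.array([value_fn(s) for s in visited])
    return visited[np.abs(values - target).argmin()]

# Toy usage on a 1-D corridor with the task goal at x = 5.
visited = [np.array([x, 0.0]) for x in np.linspace(0.0, 5.0, 11)]
value_fn = lambda s: np.exp(-np.linalg.norm(s - np.array([5.0, 0.0])))
print(pick_initial_state(visited, value_fn, target=0.5))
```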
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
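The conditional NML computation behind MURAL above can be sketched directly, though the paper meta-learns an amortized approximation rather than refitting per query: for each candidate label, refit the success classifier with the query appended under that label, then normalize the resulting probabilities across labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_probs(X, y, query):
    """Conditional NML over binary labels via per-label refitting."""
    probs = []
    for label in (0, 1):
        Xa = np.vstack([X, query])             # append query with this label
        ya = np.append(y, label)
        clf = LogisticRegression().fit(Xa, ya)
        probs.append(clf.predict_proba(query[None])[0, label])
    probs = np.array(probs)
    return probs / probs.sum()                 # normalize across labels

# Toy usage: two well-separated classes in 2-D.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(cnml_probs(X, y, np.array([0.0, 0.0])))  # near-uniform: uncertain
print(cnml_probs(X, y, np.array([2.0, 2.0])))  # confident: label 1
```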
- A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning [30.69129405392038]
We present an approach for automated curriculum generation in reinforcement learning.
We formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks.
Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms.
arXiv Detail & Related papers (2021-02-25T21:06:56Z)
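A toy instantiation of a self-paced distribution over training tasks in the spirit of the paper above; the exponential reweighting and the annealing coefficient are illustrative assumptions, not the paper's derived inference update.

```python
import numpy as np

def self_paced_distribution(returns, target_probs, eta=1.0, alpha=0.0):
    """Up-weight currently easy tasks, annealing toward the target tasks."""
    w = np.exp(np.asarray(returns) / eta)
    easy_first = w / w.sum()                  # favors currently easy tasks
    return (1 - alpha) * easy_first + alpha * np.asarray(target_probs)

returns = [5.0, 1.0, -3.0]                    # per-task average returns
target = [0.0, 0.0, 1.0]                      # we ultimately care about task 2
print(self_paced_distribution(returns, target, alpha=0.1))  # early training
print(self_paced_distribution(returns, target, alpha=0.9))  # late training
```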
This list is automatically generated from the titles and abstracts of the papers on this site.