Task Phasing: Automated Curriculum Learning from Demonstrations
- URL: http://arxiv.org/abs/2210.10999v2
- Date: Tue, 28 Mar 2023 01:22:54 GMT
- Title: Task Phasing: Automated Curriculum Learning from Demonstrations
- Authors: Vaibhav Bajaj, Guni Sharon, Peter Stone
- Abstract summary: Applying reinforcement learning to sparse reward domains is notoriously challenging due to insufficient guiding signals.
This paper introduces a principled task phasing approach that uses demonstrations to automatically generate a curriculum sequence.
Experimental results on 3 sparse reward domains demonstrate that our task phasing approaches outperform state-of-the-art approaches with respect to asymptotic performance.
- Score: 46.1680279122598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applying reinforcement learning (RL) to sparse reward domains is notoriously
challenging due to insufficient guiding signals. Common RL techniques for
addressing such domains include (1) learning from demonstrations and (2)
curriculum learning. While these two approaches have been studied in detail,
they have rarely been considered together. This paper aims to do so by
introducing a principled task phasing approach that uses demonstrations to
automatically generate a curriculum sequence. Using inverse RL from
(suboptimal) demonstrations we define a simple initial task. Our task phasing
approach then provides a framework to gradually increase the complexity of the
task all the way to the target task, while retuning the RL agent in each
phasing iteration. Two approaches for phasing are considered: (1) gradually
increasing the proportion of time steps an RL agent is in control, and (2)
phasing out a guiding informative reward function. We present conditions that
guarantee the convergence of these approaches to an optimal policy.
Experimental results on 3 sparse reward domains demonstrate that our task
phasing approaches outperform state-of-the-art approaches with respect to
asymptotic performance.
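The abstract describes the two phasing mechanisms only in prose, so a minimal Python sketch of how they could be wired together is given below. This is an illustration, not the authors' released implementation: the Gym-style environment interface, the guide policy and guiding reward (assumed to be derived via inverse RL from the demonstrations), and the linear phasing schedule are all assumptions made for exposition.
```python
# Illustrative sketch only; not the authors' code. The env, guide_policy,
# and guide_reward_fn interfaces below are assumed for exposition.
import random


def phased_action(rl_policy, guide_policy, state, alpha):
    """Approach (1): with probability alpha the RL agent chooses the action;
    otherwise the (IRL-derived) guide policy does."""
    if random.random() < alpha:
        return rl_policy(state)
    return guide_policy(state)


def phased_reward(sparse_reward, guide_reward, beta):
    """Approach (2): blend the informative guiding reward with the target
    sparse reward, phasing the guide out as beta approaches 1."""
    return beta * sparse_reward + (1.0 - beta) * guide_reward


def collect_phase(env, rl_policy, guide_policy, guide_reward_fn,
                  alpha, beta, num_steps):
    """One phasing iteration: gather experience under the current mix;
    a real implementation would then retune the RL agent on it."""
    state = env.reset()
    transitions = []
    for _ in range(num_steps):
        action = phased_action(rl_policy, guide_policy, state, alpha)
        next_state, sparse_r, done, _info = env.step(action)  # Gym-style step
        reward = phased_reward(sparse_r, guide_reward_fn(state, action), beta)
        transitions.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state
    return transitions


def linear_curriculum(num_phases):
    """A simple linear schedule: alpha and beta rise toward 1, so the final
    phase is the original sparse-reward target task."""
    return [(k / num_phases, k / num_phases) for k in range(1, num_phases + 1)]
```
In the paper's framework, the RL agent is retuned at each phasing iteration before the phasing coefficients are advanced, and convergence to an optimal policy on the target task is guaranteed under the conditions stated in the paper.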
Related papers
- Optimal Task Order for Continual Learning of Multiple Tasks [3.591122855617648]
Continual learning of multiple tasks remains a major challenge for neural networks.
Here, we investigate how task order influences continual learning and propose a strategy for optimizing it.
Our work thus presents a generalizable framework for task-order optimization in task-incremental continual learning.
arXiv Detail & Related papers (2025-02-05T16:43:58Z) - Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks [2.3031174164121127]
We propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by formulas.
We develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process.
Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms.
arXiv Detail & Related papers (2024-12-14T18:04:18Z) - Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations [24.041217922654738]
Continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks.
Online RL methods can automatically explore the state space to solve each new task.
However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases.
We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations.
arXiv Detail & Related papers (2024-12-02T04:37:12Z) - Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z) - CLUTR: Curriculum Learning via Unsupervised Task Representation Learning [130.79246770546413]
CLUTR is a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization.
We show CLUTR outperforms PAIRED, a principled and popular unsupervised environment design (UED) method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments.
arXiv Detail & Related papers (2022-10-19T01:45:29Z) - Large Language Models can Implement Policy Iteration [18.424558160071808]
In-Context Policy Iteration (ICPI) is an algorithm for performing reinforcement learning (RL) in-context using foundation models.
ICPI learns to perform RL tasks without expert demonstrations or gradients.
ICPI iteratively updates the contents of the prompt from which it derives its policy through trial-and-error interaction with an RL environment.
arXiv Detail & Related papers (2022-10-07T21:18:22Z) - Provable Benefit of Multitask Representation Learning in Reinforcement Learning [46.11628795660159]
This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
arXiv Detail & Related papers (2022-06-13T04:29:02Z) - Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z) - Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
In this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.