Reinforcement Learning with Success Induced Task Prioritization
- URL: http://arxiv.org/abs/2301.00691v1
- Date: Fri, 30 Dec 2022 12:32:43 GMT
- Title: Reinforcement Learning with Success Induced Task Prioritization
- Authors: Maria Nesterova, Alexey Skrynnik, Aleksandr Panov
- Abstract summary: We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many challenging reinforcement learning (RL) problems require designing a
distribution of tasks that can be applied to train effective policies. This
distribution of tasks can be specified by the curriculum. A curriculum is meant
to improve the results of learning and accelerate it. We introduce Success
Induced Task Prioritization (SITP), a framework for automatic curriculum
learning, where a task sequence is created based on the success rate of each
task. In this setting, each task is an algorithmically created environment
instance with a unique configuration. The algorithm selects the order of tasks
that provide the fastest learning for agents. The probability of selecting any
of the tasks for the next stage of learning is determined by evaluating its
performance score in previous stages. Experiments were carried out in the
Partially Observable Grid Environment for Multiple Agents (POGEMA) and Procgen
benchmark. We demonstrate that SITP matches or surpasses the results of other
curriculum design methods. Our method can be implemented with handful of minor
modifications to any standard RL framework and provides useful prioritization
with minimal computational overhead.
Related papers
- Proximal Curriculum with Task Correlations for Deep Reinforcement Learning [25.10619062353793]
We consider curriculum design in contextual multi-task settings where the agent's final performance is measured w.r.t. a target distribution over complex tasks.
We propose a novel curriculum, ProCuRL-Target, that effectively balances the need for selecting tasks that are not too difficult for the agent while progressing the agent's learning toward the target distribution via leveraging task correlations.
arXiv Detail & Related papers (2024-05-03T21:07:54Z) - Sample Efficient Reinforcement Learning by Automatically Learning to
Compose Subtasks [3.1594865504808944]
We propose an RL algorithm that automatically structure the reward function for sample efficiency, given a set of labels that signify subtasks.
We evaluate our algorithm in a variety of sparse-reward environments.
arXiv Detail & Related papers (2024-01-25T15:06:40Z) - Data-CUBE: Data Curriculum for Instruction-based Sentence Representation
Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z) - Active Instruction Tuning: Improving Cross-Task Generalization by
Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z) - Active Task Randomization: Learning Robust Skills via Unsupervised
Generation of Diverse and Feasible Tasks [37.73239471412444]
We introduce Active Task Randomization (ATR), an approach that learns robust skills through the unsupervised generation of training tasks.
ATR selects suitable tasks, which consist of an initial environment state and manipulation goal, for learning robust skills by balancing the diversity and feasibility of the tasks.
We demonstrate that the learned skills can be composed by a task planner to solve unseen sequential manipulation problems based on visual inputs.
arXiv Detail & Related papers (2022-11-11T11:24:55Z) - Fast Inference and Transfer of Compositional Task Structures for
Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z) - Selecting task with optimal transport self-supervised learning for
few-shot classification [15.088213168796772]
Few-Shot classification aims at solving problems that only a few samples are available in the training process.
We propose a novel task selecting algorithm, named Optimal Transport Task Selecting (OTTS), to construct a training set by selecting similar tasks for Few-Shot learning.
OTTS measures the task similarity by calculating the optimal transport distance and completes the model training via a self-supervised strategy.
arXiv Detail & Related papers (2022-04-01T08:45:29Z) - Efficiently Identifying Task Groupings for Multi-Task Learning [55.80489920205404]
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
We suggest an approach to select which tasks should train together in multi-task learning models.
Our method determines task groupings in a single training run by co-training all tasks together and quantifying the effect to which one task's gradient would affect another task's loss.
arXiv Detail & Related papers (2021-09-10T02:01:43Z) - Adaptive Procedural Task Generation for Hard-Exploration Problems [78.20918366839399]
We introduce Adaptive Procedural Task Generation (APT-Gen) to facilitate reinforcement learning in hard-exploration problems.
At the heart of our approach is a task generator that learns to create tasks from a parameterized task space via a black-box procedural generation module.
To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.
arXiv Detail & Related papers (2020-07-01T09:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.