Continual Task Allocation in Meta-Policy Network via Sparse Prompting
- URL: http://arxiv.org/abs/2305.18444v2
- Date: Sat, 3 Jun 2023 16:49:24 GMT
- Title: Continual Task Allocation in Meta-Policy Network via Sparse Prompting
- Authors: Yijun Yang, Tianyi Zhou, Jing Jiang, Guodong Long, Yuhui Shi
- Abstract summary: We show how to train a generalizable meta-policy by continually learning a sequence of tasks.
We address this with "Continual Task Allocation via Sparse Prompting" (CoTASP).
In experiments, CoTASP achieves a promising plasticity-stability trade-off without storing or replaying any past tasks' experiences.
- Score: 42.386912478509814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How can we train a generalizable meta-policy by continually learning a
sequence of tasks? This is a natural human skill, yet challenging for current
reinforcement learning: the agent is expected to quickly adapt to new tasks
(plasticity) while retaining the common knowledge from previous tasks
(stability). We address this with "Continual Task Allocation via Sparse Prompting
(CoTASP)", which learns over-complete dictionaries to produce sparse masks as
prompts, each extracting a sub-network for a task from a meta-policy network.
CoTASP trains a policy for each task by alternately optimizing the prompts and
the sub-network weights. The dictionary is then updated to align the
optimized prompts with the tasks' embeddings, thereby capturing the tasks' semantic
correlations. Hence, related tasks share more neurons in the meta-policy
network due to similar prompts, while cross-task interference, which causes
forgetting, is effectively restrained. Given a meta-policy and dictionaries trained on
previous tasks, adapting to a new task reduces to highly efficient sparse
prompting and sub-network finetuning. In experiments, CoTASP achieves a
promising plasticity-stability trade-off without storing or replaying any past
tasks' experiences. It outperforms existing continual and multi-task RL methods
on all seen tasks, in forgetting reduction, and in generalization to unseen tasks.
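The core mechanism (a dictionary mapping a task embedding to a sparse code whose support selects a sub-network) can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the shapes, the ISTA-style sparse-coding solver, and the hard support mask are all assumptions made for the sketch.

```python
import numpy as np

def sparse_prompt(task_emb, dictionary, lam=0.5, n_iter=50):
    """Compute a sparse code for a task via ISTA-style sparse coding.

    Approximately solves: min_a 0.5*||task_emb - D @ a||^2 + lam*||a||_1,
    where D has shape (emb_dim, n_neurons) and is over-complete when
    n_neurons > emb_dim. The nonzero entries of `a` act as the prompt.
    """
    D = dictionary
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = D.T @ (D @ a - task_emb)
        a = a - grad / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

def extract_subnetwork(layer_weights, sparse_code):
    """Mask a layer's output neurons by the support of the sparse code."""
    mask = (np.abs(sparse_code) > 0).astype(layer_weights.dtype)
    return layer_weights * mask[np.newaxis, :]

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))   # over-complete dictionary (64 > 16)
emb = rng.standard_normal(16)       # hypothetical task embedding
code = sparse_prompt(emb, D)
W = rng.standard_normal((16, 64))   # one meta-policy layer
W_task = extract_subnetwork(W, code)
print("active neurons:", int((np.abs(code) > 0).sum()), "of", code.size)
```

In CoTASP the prompts and the selected sub-network weights are then optimized alternately per task; only the masking step is shown here.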
Related papers
- Active Instruction Tuning: Improving Cross-Task Generalization by
Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
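One way to operationalize "prompt uncertainty" is to score each task by how much the model's output varies across perturbed prompts and tune on the most sensitive tasks. The sketch below assumes a precomputed score matrix; the function name, shapes, and the use of standard deviation as the uncertainty measure are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def select_tasks_by_prompt_uncertainty(scores_per_perturbation, k=2):
    """Pick the k tasks whose model scores vary most across perturbed prompts.

    scores_per_perturbation: array of shape (n_tasks, n_perturbed_prompts).
    Returns task indices sorted by descending uncertainty.
    """
    uncertainty = scores_per_perturbation.std(axis=1)
    return np.argsort(uncertainty)[::-1][:k]

# Hypothetical scores for 3 tasks under 3 prompt perturbations.
scores = np.array([
    [0.1, 0.1, 0.1],   # stable: low uncertainty
    [0.9, 0.1, 0.5],   # highly prompt-sensitive
    [0.4, 0.5, 0.4],   # mildly sensitive
])
print(select_tasks_by_prompt_uncertainty(scores, k=2))
```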
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text
Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z)
- Robust Subtask Learning for Compositional Generalization [20.54144051436337]
We focus on training subtask policies so that they can be composed to perform any task.
We aim to maximize the worst-case performance over all tasks as opposed to the average-case performance.
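The distinction between the two objectives is simple to state in code: the average can mask a catastrophic failure on one task, while the worst case cannot. A minimal sketch, with hypothetical per-task returns:

```python
def objective(task_returns, worst_case=True):
    """Aggregate per-task returns into a single training objective.

    worst_case=True maximizes the minimum over tasks (robust);
    worst_case=False maximizes the mean (average-case).
    """
    if worst_case:
        return min(task_returns)
    return sum(task_returns) / len(task_returns)

# Hypothetical returns of one candidate policy set on three tasks.
returns = [0.9, 0.4, 0.7]
print(objective(returns, worst_case=True))   # 0.4: the failure mode dominates
print(objective(returns, worst_case=False))  # the average hides the weak task
```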
arXiv Detail & Related papers (2023-02-06T18:19:25Z)
- TaskMix: Data Augmentation for Meta-Learning of Spoken Intent
Understanding [0.0]
We show that a state-of-the-art data augmentation method worsens overfitting when task diversity is low.
We propose a simple method, TaskMix, which synthesizes new tasks by linearly interpolating existing tasks.
We show that TaskMix outperforms baselines, alleviates overfitting when task diversity is low, and does not degrade performance even when it is high.
arXiv Detail & Related papers (2022-09-26T00:37:40Z)
- Improving Task Generalization via Unified Schema Prompt [87.31158568180514]
Unified Prompt is a flexible prompting method that automatically customizes the learnable prompts for each task according to the task's input schema.
It models the shared knowledge between tasks, while keeping the characteristics of different task schema.
The framework achieves strong zero-shot and few-shot performance on 16 unseen downstream tasks drawn from 8 task types.
arXiv Detail & Related papers (2022-08-05T15:26:36Z)
- Learning Action Translator for Meta Reinforcement Learning on
Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Fast Inference and Transfer of Compositional Task Structures for
Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- CoMPS: Continual Meta Policy Search [113.33157585319906]
We develop a new continual meta-learning method to address challenges in sequential multi-task learning.
We find that CoMPS outperforms prior continual learning and off-policy meta-reinforcement learning methods on several sequences of challenging continuous control tasks.
arXiv Detail & Related papers (2021-12-08T18:53:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.