Task Aware Dreamer for Task Generalization in Reinforcement Learning
- URL: http://arxiv.org/abs/2303.05092v3
- Date: Fri, 2 Feb 2024 16:18:10 GMT
- Title: Task Aware Dreamer for Task Generalization in Reinforcement Learning
- Authors: Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Songming Liu,
Dong Yan, Jun Zhu
- Abstract summary: We show that training a general world model can utilize similar structures in tasks and help train more generalizable agents.
We introduce a novel method named Task Aware Dreamer (TAD), which integrates reward-informed features to identify latent characteristics across tasks.
Experiments in both image-based and state-based tasks show that TAD can significantly improve the performance of handling different tasks simultaneously.
- Score: 32.93706056123124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A long-standing goal of reinforcement learning is to acquire agents that can
learn on training tasks and generalize well on unseen tasks that may share a
similar dynamic but with different reward functions. The ability to generalize
across tasks is important as it determines an agent's adaptability to
real-world scenarios where reward mechanisms might vary. In this work, we first
show that training a general world model can utilize similar structures in
these tasks and help train more generalizable agents. Extending world models
into the task generalization setting, we introduce a novel method named Task
Aware Dreamer (TAD), which integrates reward-informed features to identify
consistent latent characteristics across tasks. Within TAD, we compute the
variational lower bound of sample data log-likelihood, which introduces a new
term designed to differentiate tasks using their states, as the optimization
objective of our reward-informed world models. To demonstrate the advantages of
the reward-informed policy in TAD, we introduce a new metric called Task
Distribution Relevance (TDR) which quantitatively measures the relevance of
different tasks. For tasks exhibiting a high TDR, i.e., tasks that differ
significantly, we illustrate that Markovian policies struggle to distinguish
them, making it necessary to utilize reward-informed policies in TAD.
Extensive experiments in both image-based and state-based tasks show that TAD
can significantly improve the performance of handling different tasks
simultaneously, especially for those with high TDR, and display a strong
generalization ability to unseen tasks.
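As a rough illustration of the objective described in the abstract, the variational lower bound of a reward-informed world model combines an observation reconstruction term, a reward log-likelihood term (the component that lets the latent state separate tasks sharing the same dynamics but differing in rewards), and a KL regularizer between the posterior and prior over latents. The sketch below is a minimal single-step version using diagonal Gaussians and hypothetical shapes; it is not the authors' implementation, only an assumed form of this kind of objective.

```python
import numpy as np

def gaussian_log_prob(x, mean, std):
    # Log-density of a diagonal Gaussian, summed over dimensions.
    return np.sum(-0.5 * np.log(2 * np.pi * std**2)
                  - (x - mean)**2 / (2 * std**2))

def kl_diag_gaussian(mu_q, std_q, mu_p, std_p):
    # KL(q || p) between diagonal Gaussians, summed over dimensions.
    return np.sum(np.log(std_p / std_q)
                  + (std_q**2 + (mu_q - mu_p)**2) / (2 * std_p**2)
                  - 0.5)

def reward_informed_elbo(obs, obs_mean, obs_std,
                         reward, reward_mean, reward_std,
                         mu_q, std_q, mu_p, std_p):
    """Single-step lower bound (sketch): reconstruction + reward
    log-likelihood - KL. The reward term is what encourages the
    latent state to discriminate between tasks."""
    recon = gaussian_log_prob(obs, obs_mean, obs_std)
    reward_ll = gaussian_log_prob(reward, reward_mean, reward_std)
    kl = kl_diag_gaussian(mu_q, std_q, mu_p, std_p)
    return recon + reward_ll - kl
```

In practice such a bound would be summed over trajectory steps and the reward head conditioned on the inferred latent, so that tasks with identical dynamics but different reward functions map to distinguishable latents.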
Related papers
- Active Instruction Tuning: Improving Cross-Task Generalization by
Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z) - Towards Task Sampler Learning for Meta-Learning [37.02030832662183]
Meta-learning aims to learn general knowledge from diverse training tasks constructed from limited data, and then transfer it to new tasks.
It is commonly believed that increasing task diversity will enhance the generalization ability of meta-learning models.
This paper challenges this view through empirical and theoretical analysis.
arXiv Detail & Related papers (2023-07-18T01:53:18Z) - Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts [75.75548749888029]
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks.
With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
arXiv Detail & Related papers (2023-05-11T17:57:49Z) - Meta-Reinforcement Learning Based on Self-Supervised Task Representation
Learning [23.45043290237396]
MoSS is a context-based meta-reinforcement learning algorithm built on self-supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior methods in terms of performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z) - Leveraging sparse and shared feature activations for disentangled
representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real world distribution shift benchmarks, and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z) - CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code
Models [33.78307982736911]
Cross-task generalization has strong research and application value.
We propose a large-scale benchmark that includes 216 existing code-related tasks.
arXiv Detail & Related papers (2023-02-08T13:04:52Z) - Improving Task Generalization via Unified Schema Prompt [87.31158568180514]
Unified Prompt is a flexible prompting method that automatically customizes the learnable prompts for each task according to the task input schema.
It models the shared knowledge between tasks, while keeping the characteristics of different task schema.
The framework achieves strong zero-shot and few-shot performance on 16 unseen tasks downstream from 8 task types.
arXiv Detail & Related papers (2022-08-05T15:26:36Z) - LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent
Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z) - Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.