Task Relabelling for Multi-task Transfer using Successor Features
- URL: http://arxiv.org/abs/2205.10175v1
- Date: Fri, 20 May 2022 13:29:29 GMT
- Title: Task Relabelling for Multi-task Transfer using Successor Features
- Authors: Martin Balla and Diego Perez-Liebana
- Abstract summary: Successor Features (SFs) provide a mechanism for learning policies that are not tied to any particular reward function.
In this work we investigate how SFs may be pre-trained without observing any reward in a custom environment that features resource collection, traps and crafting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Reinforcement Learning has recently been very successful across a range of
complex domains. Most works are concerned with learning a single policy that solves the
target task, but this policy is fixed: if the environment changes, the agent is unable to
adapt. Successor Features (SFs) provide a mechanism for learning policies that are not tied
to any particular reward function. In this work we investigate how SFs may be pre-trained
without observing any reward in a custom environment that features resource collection,
traps and crafting. After pre-training, we expose the SF agents to various target tasks and
evaluate how well they transfer to these new tasks. Transfer requires no further training of
the SF agents; instead, it is achieved simply by providing a task vector. For training the
SFs we propose a task relabelling method which greatly improves the agents' performance.
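To make the transfer mechanism concrete, the sketch below shows the standard successor-feature recipe the abstract alludes to: each pre-trained policy i carries a successor-feature estimate psi_i(s, a), the value of a new task described by a task vector w is Q_i(s, a) = psi_i(s, a) · w, and generalised policy improvement (GPI) acts greedily over all pre-trained policies. The array shapes, function name and random features are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gpi_action(sf_per_policy: np.ndarray, task_vector: np.ndarray) -> int:
    """Generalised Policy Improvement (GPI) over pre-trained SF agents.

    sf_per_policy: array of shape (n_policies, n_actions, d) holding
        successor features psi_i(s, a) for the current state s.
    task_vector:   array of shape (d,) with the reward weights w such that
        r(s, a) ~= phi(s, a) . w for the target task.
    """
    # Q_i(s, a) = psi_i(s, a) . w for every pre-trained policy i
    q_values = sf_per_policy @ task_vector            # (n_policies, n_actions)
    # GPI: act greedily w.r.t. the best pre-trained policy for each action
    return int(q_values.max(axis=0).argmax())

# Toy usage with made-up numbers: 3 pre-trained policies, 4 actions,
# 5-dimensional feature space (e.g. one dimension per resource or trap type).
rng = np.random.default_rng(0)
sfs = rng.normal(size=(3, 4, 5))
w = np.array([1.0, 0.0, -1.0, 0.5, 0.0])  # e.g. collect feature 0, avoid feature 2
print(gpi_action(sfs, w))
```

Under this scheme, transferring to a new task amounts to supplying a different task vector w, with no further gradient updates, which is what the abstract means by transfer "just by providing a task vector".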
Related papers
- Task Adaptation of Reinforcement Learning-based NAS Agents through Transfer Learning [0.0]
We assess the abilities of reinforcement learning agents to transfer between different tasks.
We find that pretraining an agent on one task benefits its performance on another task in all but one case.
We also show that the training procedure for an agent can be shortened significantly by pretraining it on another task.
arXiv Detail & Related papers (2024-12-02T12:00:27Z)
- Combining Behaviors with the Successor Features Keyboard [55.983751286962985]
"Successor Features Keyboard" (SFK) enables transfer with discovered state-features and task encodings.
We achieve the first demonstration of transfer with SFs in a challenging 3D environment.
arXiv Detail & Related papers (2023-10-24T15:35:54Z)
- Reinforcement Learning with Demonstrations from Mismatched Task under Sparse Reward [7.51772160511614]
Reinforcement learning often suffers from the sparse reward issue in real-world robotics problems.
Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task.
In this paper, we consider the case where the target task is mismatched from, but similar to, that of the expert.
Existing learning-from-demonstration (LfD) methods cannot effectively guide learning in mismatched new tasks with sparse rewards.
arXiv Detail & Related papers (2022-12-03T02:24:59Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
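As a rough, hypothetical illustration of an ability-based subtask selection strategy (not LDSA's exact architecture), the sketch below scores each agent's learned ability embedding against learned subtask representations and samples an assignment from the resulting per-agent distribution; all names, shapes and the softmax sampling are assumptions.

```python
import numpy as np

def assign_subtasks(agent_abilities: np.ndarray,
                    subtask_reprs: np.ndarray,
                    rng: np.random.Generator) -> np.ndarray:
    """Sample one subtask per agent from ability/subtask similarity scores.

    agent_abilities: (n_agents, d) learned ability embeddings.
    subtask_reprs:   (n_subtasks, d) learned subtask representations.
    Returns an array of shape (n_agents,) with the chosen subtask indices.
    """
    scores = agent_abilities @ subtask_reprs.T           # (n_agents, n_subtasks)
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)            # softmax per agent
    return np.array([rng.choice(len(subtask_reprs), p=p) for p in probs])

rng = np.random.default_rng(1)
print(assign_subtasks(rng.normal(size=(4, 8)), rng.normal(size=(3, 8)), rng))
```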
- Multi-Agent Policy Transfer via Task Relationship Modeling [28.421365805638953]
We try to discover and exploit common structures among tasks for more efficient transfer.
We propose to learn effect-based task representations as a common space of tasks, using an alternatively fixed training scheme.
As a result, the proposed method can help transfer learned cooperation knowledge to new tasks after training on a few source tasks.
arXiv Detail & Related papers (2022-03-09T01:49:21Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Latent Skill Planning for Exploration and Transfer [49.25525932162891]
In this paper, we investigate how these two approaches can be integrated into a single reinforcement learning agent.
We leverage the idea of partial amortization for fast adaptation at test time.
We demonstrate the benefits of our design decisions across a suite of challenging locomotion tasks.
arXiv Detail & Related papers (2020-11-27T18:40:03Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
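A minimal sketch of a disagreement-driven goal curriculum, under the assumption that an ensemble of value estimates is available for each candidate goal: goals are sampled in proportion to the ensemble's standard deviation, which is one simple way to realise the idea rather than the paper's exact procedure.

```python
import numpy as np

def sample_goal(value_ensemble: np.ndarray, rng: np.random.Generator) -> int:
    """Pick a training goal with probability proportional to value disagreement.

    value_ensemble: (n_members, n_goals) value estimates for each candidate
        goal from an ensemble of value functions.
    """
    disagreement = value_ensemble.std(axis=0)   # (n_goals,) spread across members
    probs = disagreement / disagreement.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(2)
ensemble = rng.uniform(size=(5, 10))   # 5 ensemble members, 10 candidate goals
print(sample_goal(ensemble, rng))
```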
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)