Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance
- URL: http://arxiv.org/abs/2507.06615v1
- Date: Wed, 09 Jul 2025 07:36:28 GMT
- Title: Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance
- Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
- Abstract summary: We present a novel framework called Cross-Task Policy Guidance (CTPG). CTPG trains a guide policy for each task to select the behavior policy that interacts with the environment from among all tasks' control policies. In addition, we propose two gating mechanisms to improve the learning efficiency of CTPG: one gate filters out control policies that are not beneficial for guidance, while the other gate blocks tasks that do not necessitate guidance.
- Score: 25.18006424626525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task reinforcement learning endeavors to efficiently leverage shared information across various tasks, facilitating the simultaneous learning of multiple tasks. Existing approaches primarily focus on parameter sharing with carefully designed network structures or tailored optimization procedures. However, they overlook a direct and complementary way to exploit cross-task similarities: the control policies of tasks already proficient in some skills can provide explicit guidance for unmastered tasks to accelerate skill acquisition. To this end, we present a novel framework called Cross-Task Policy Guidance (CTPG), which trains a guide policy for each task to select, from all tasks' control policies, the behavior policy that interacts with the environment, generating better training trajectories. In addition, we propose two gating mechanisms to improve the learning efficiency of CTPG: one gate filters out control policies that are not beneficial for guidance, while the other gate blocks tasks that do not necessitate guidance. CTPG is a general framework adaptable to existing parameter sharing approaches. Empirical evaluations demonstrate that incorporating CTPG with these approaches significantly enhances performance in manipulation and locomotion benchmarks.
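The abstract outlines the mechanism at a high level: a per-task guide policy chooses which task's control policy acts in the environment, with one gate pruning unhelpful candidate policies and another disabling guidance for tasks that no longer need it. Below is a minimal sketch of that selection step, assuming guide networks that emit one logit per candidate control policy and Boolean gate tensors; every name here is hypothetical and not taken from the authors' code.

```python
import torch

def collect_step(task_id, obs, guide_policies, control_policies,
                 policy_gate, task_gate):
    """Choose which task's control policy acts in the environment for task_id.

    guide_policies[task_id] scores every candidate control policy;
    policy_gate[task_id] masks out candidates judged unhelpful for guidance,
    and task_gate[task_id] skips guidance for tasks that do not need it.
    """
    if not task_gate[task_id]:
        # Guidance blocked for this task: act with its own control policy.
        return control_policies[task_id].act(obs)

    logits = guide_policies[task_id](obs)  # one logit per task's control policy
    # Mask out filtered candidates; the task's own policy is assumed to stay unmasked.
    logits = logits.masked_fill(~policy_gate[task_id], float("-inf"))
    behavior_id = torch.distributions.Categorical(logits=logits).sample().item()
    return control_policies[behavior_id].act(obs)  # possibly another task's policy
```

How the guide policy and the gates themselves are trained is not specified in the abstract, so this sketch deliberately stops at action selection.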
Related papers
- Multi-Task Reinforcement Learning for Quadrotors [18.71563817810032]
This paper presents a novel multi-task reinforcement learning (MTRL) framework tailored for quadrotor control. By employing a multi-critic architecture and shared task encoders, our framework facilitates knowledge transfer across tasks, enabling a single policy to execute diverse maneuvers.
arXiv Detail & Related papers (2024-12-17T01:10:18Z)
- Task-Aware Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [70.96345405979179]
The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction.
However, variations in task content and complexity pose significant challenges in policy formulation.
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
arXiv Detail & Related papers (2024-11-02T05:49:14Z)
- Active Fine-Tuning of Multi-Task Policies [54.65568433408307]
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget. We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
arXiv Detail & Related papers (2024-10-07T13:26:36Z)
- HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z)
- Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction [10.388605128396678]
Task-Specific Action Correction (TSAC) is designed for the simultaneous learning of multiple tasks. Its action correction policy (ACP) incorporates goal-oriented sparse rewards, enabling an agent to adopt a long-term perspective. The additional rewards transform the original problem into a multi-objective MTRL problem.
arXiv Detail & Related papers (2024-04-09T02:11:35Z)
- Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing [26.44273671379482]
Multi-task reinforcement learning endeavors to accomplish a set of different tasks with a single policy.
This work presents a Dynamic Depth Routing (D2R) framework, which learns strategic skipping of certain intermediate modules, thereby flexibly choosing different numbers of modules for each task.
In addition, we design an automatic route-balancing mechanism to encourage continued routing exploration for unmastered tasks without disturbing the routing of mastered ones.
arXiv Detail & Related papers (2023-12-22T06:51:30Z)
- QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing [18.127823952220123]
Multi-task reinforcement learning (MTRL) aims to learn several tasks simultaneously for better sample efficiency than learning them separately. We introduce a new framework for sharing behavioral policies across tasks, which can be used in addition to existing MTRL methods; a sketch of this Q-switch idea appears after this list.
arXiv Detail & Related papers (2023-02-01T18:58:20Z)
- Continual Vision-based Reinforcement Learning with Group Symmetries [18.7526848176769]
We introduce a unique Continual Vision-based Reinforcement Learning method that recognizes Group Symmetries, called COVERS.
Our results show that COVERS accurately assigns tasks to their respective groups and significantly outperforms existing methods in terms of generalization capability.
arXiv Detail & Related papers (2022-10-21T23:41:02Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate reinforcement learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
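The QMP entry above describes behavior sharing via a Q-switch mixture of policies. As a rough illustration reconstructed from that abstract alone (function names and signatures are our own assumptions, not the paper's code), the current task's Q-function can arbitrate among action proposals from every task's policy during data collection:

```python
import torch

def q_switch_act(task_id, obs, policies, q_functions):
    """Pick a behavior action for task_id from all tasks' policy proposals.

    Every task's policy proposes an action; the current task's Q-function
    scores the proposals, and the highest-valued one is executed, so useful
    behaviors from other tasks are reused while unhelpful ones are ignored.
    """
    candidates = [pi.act(obs) for pi in policies]
    scores = torch.stack([q_functions[task_id](obs, a) for a in candidates])
    return candidates[int(scores.argmax())]
```

This is the same behavior-sharing intuition CTPG builds on, except that CTPG learns an explicit guide policy (plus gating mechanisms) to make the selection rather than relying on Q-value arbitration.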