QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
- URL: http://arxiv.org/abs/2302.00671v2
- Date: Mon, 07 Oct 2024 10:04:28 GMT
- Title: QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
- Authors: Grace Zhang, Ayush Jain, Injune Hwang, Shao-Hua Sun, Joseph J. Lim
- Abstract summary: Multi-task reinforcement learning (MTRL) aims to learn several tasks simultaneously for better sample efficiency than learning them separately.
We introduce a new framework for sharing behavioral policies across tasks, which can be used in addition to existing MTRL methods.
- Score: 18.127823952220123
- Abstract: Multi-task reinforcement learning (MTRL) aims to learn several tasks simultaneously for better sample efficiency than learning them separately. Traditional methods achieve this by sharing parameters or relabeled data between tasks. In this work, we introduce a new framework for sharing behavioral policies across tasks, which can be used in addition to existing MTRL methods. The key idea is to improve each task's off-policy data collection by employing behaviors from other task policies. Selectively sharing helpful behaviors acquired in one task to collect training data for another task can lead to higher-quality trajectories and thus more sample-efficient MTRL. Thus, we introduce a simple and principled framework called Q-switch mixture of policies (QMP) that selectively shares behavior between different task policies by using the task's Q-function to evaluate and select useful shareable behaviors. We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm. Our experiments show that QMP's behavioral policy sharing provides complementary gains over many popular MTRL algorithms and outperforms alternative ways to share behaviors in various manipulation, locomotion, and navigation environments. Videos are available at https://qmp-mtrl.github.io.
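Reading the abstract operationally: each task keeps its own policy and Q-function, and when collecting data for one task, every task's policy proposes an action, but the current task's own Q-function decides which proposal to execute. Below is a minimal numpy sketch of that Q-switch selection; the Gaussian policies, linear Q-functions, and all names and dimensions are toy stand-ins for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: one Gaussian policy and one linear Q-function per task.
num_tasks, obs_dim, act_dim = 3, 4, 2
policy_means = rng.normal(size=(num_tasks, act_dim))
q_weights = rng.normal(size=(num_tasks, obs_dim + act_dim))

def policy_action(task, obs):
    """Sample an action from task `task`'s toy Gaussian policy."""
    return policy_means[task] + 0.1 * rng.normal(size=act_dim)

def q_value(task, obs, act):
    """Evaluate task `task`'s toy linear Q-function."""
    return float(q_weights[task] @ np.concatenate([obs, act]))

def qmp_behavior_action(task, obs):
    """Q-switch mixture: every task policy proposes an action, and the
    *current* task's Q-function picks the most promising proposal."""
    proposals = [policy_action(j, obs) for j in range(num_tasks)]
    scores = [q_value(task, obs, a) for a in proposals]
    return proposals[int(np.argmax(scores))]

obs = rng.normal(size=obs_dim)
print(qmp_behavior_action(task=0, obs=obs))
```

The selected action only shapes data collection; per the abstract, each policy is still trained on its own task, which is why the scheme can be layered on top of other MTRL methods.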
Related papers
- Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant sample-efficiency benefits over single-task RL.
We investigate whether such benefits extend to more general sequential decision-making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs).
We propose a provably efficient algorithm, UMT-PSR, for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests when the joint model class of the PSRs has smaller complexity than learning each task separately.
arXiv Detail & Related papers (2023-10-20T14:50:28Z)
- DynaShare: Task and Instance Conditioned Parameter Sharing for Multi-Task Learning [11.955637263520492]
We present a novel parameter sharing method for multi-task learning.
We propose to dynamically decide which parts of the network to activate based on both the task and the input instance.
Our approach learns a hierarchical gating policy consisting of a task-specific policy for coarse layer selection and gating units for individual input instances.
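A minimal PyTorch sketch of task- and instance-conditioned gating in the spirit of that summary: a coarse task-level gate and a fine instance-level gating unit jointly decide how much each shared layer contributes. The residual structure, the soft sigmoid relaxation, and all names (GatedBlock, task_gate) are assumptions for illustration; the paper's gating policy is trained to make discrete layer-selection decisions.

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """One shared layer whose contribution is gated per task and per instance."""
    def __init__(self, dim, num_tasks):
        super().__init__()
        self.layer = nn.Linear(dim, dim)
        self.task_gate = nn.Parameter(torch.zeros(num_tasks))  # coarse task-level gate logits
        self.instance_gate = nn.Linear(dim, 1)                 # fine instance-level gating unit

    def forward(self, x, task_id):
        gate = torch.sigmoid(self.task_gate[task_id]) * torch.sigmoid(self.instance_gate(x))
        return x + gate * torch.relu(self.layer(x))            # layer is skipped as gate -> 0

num_tasks, dim = 3, 8
blocks = nn.ModuleList([GatedBlock(dim, num_tasks) for _ in range(4)])
x = torch.randn(2, dim)
for block in blocks:
    x = block(x, task_id=1)
print(x.shape)
```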
arXiv Detail & Related papers (2023-05-26T23:43:21Z)
- Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots [11.533449955841968]
We propose Q-Pensieve, a policy improvement scheme that stores a collection of Q-snapshots to jointly determine the policy update direction.
We show that Q-Pensieve can be naturally integrated with soft policy iteration with a convergence guarantee.
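A rough sketch of the Q-snapshot memory under stated assumptions: a single scalar Q-network and a max over frozen snapshots stand in for the paper's multi-objective, preference-weighted formulation, and all names here are illustrative.

```python
import copy
import torch
import torch.nn as nn

q_net = nn.Linear(6, 1)  # toy Q(s, a) over a concatenated [state, action] input
snapshots = []           # the "Pensieve": frozen copies of past Q-networks

def take_snapshot(q, max_keep=5):
    """Freeze a copy of the current Q-network, keeping only the most recent few."""
    snapshots.append(copy.deepcopy(q).requires_grad_(False))
    del snapshots[:-max_keep]

def pensieve_q(state, action):
    """Back up with the best value over the live Q and all stored snapshots,
    so policy improvement can reuse value estimates from earlier in training."""
    sa = torch.cat([state, action], dim=-1)
    values = [q_net(sa)] + [snap(sa) for snap in snapshots]
    return torch.max(torch.stack(values), dim=0).values

take_snapshot(q_net)
state, action = torch.randn(4, 4), torch.randn(4, 2)
print(pensieve_q(state, action).shape)
```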
arXiv Detail & Related papers (2022-12-06T16:29:47Z)
- Hypernetworks for Zero-shot Transfer in Reinforcement Learning [21.994654567458017]
Hypernetworks are trained to generate behaviors across a range of unseen task conditions.
This work relates to meta RL, contextual RL, and transfer learning.
Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
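A minimal sketch of the hypernetwork pattern the summary refers to: a network maps a task-conditioning vector to the weights of a small policy, so a new condition yields a new behavior without any gradient updates. The architecture and dimensions are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, task_dim, hidden = 8, 2, 3, 32

# Hypernetwork: maps a task-conditioning vector to the weights of a small
# linear policy, so an unseen condition yields a new behavior zero-shot.
hyper = nn.Sequential(
    nn.Linear(task_dim, hidden),
    nn.ReLU(),
    nn.Linear(hidden, act_dim * obs_dim + act_dim),
)

def policy(obs, task_cond):
    """Run the policy whose weights the hypernetwork generates for task_cond."""
    params = hyper(task_cond)
    W = params[: act_dim * obs_dim].view(act_dim, obs_dim)
    b = params[act_dim * obs_dim :]
    return torch.tanh(obs @ W.T + b)

obs, task_cond = torch.randn(obs_dim), torch.randn(task_dim)
print(policy(obs, task_cond))
```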
arXiv Detail & Related papers (2022-11-28T15:48:35Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but also adapts quickly to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
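A minimal sketch of what an action translator could look like under the summary's description: a network maps a state and a source-task action to a target-task action meant to reproduce the source transition. The MSE objective on next states is a simplification (the paper's objective is constructed so the transferred policy's value provably stays close to the source policy's), and all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 6, 2

# Hypothetical translator: maps (state, source-task action) to a target-task
# action intended to reproduce the source task's transition.
translator = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 64),
    nn.ReLU(),
    nn.Linear(64, act_dim),
)

def translate(obs, src_action):
    return translator(torch.cat([obs, src_action], dim=-1))

def translation_loss(next_obs_src, next_obs_tgt):
    """Match the target-task transition to the source one (simplified stand-in
    for the paper's value-difference-bounding objective)."""
    return ((next_obs_src - next_obs_tgt) ** 2).mean()

obs, a_src = torch.randn(4, obs_dim), torch.randn(4, act_dim)
print(translate(obs, a_src).shape)
```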
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
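A minimal PyTorch sketch of the adaptive layer-modification idea: each layer pairs frozen base weights with a task-specific residual behind a learned gate, and a sparsity penalty on the gates keeps the modified subset of layers small. The soft sigmoid gate is a simplification of the paper's relaxed discrete choice, and all names here are assumptions.

```python
import torch
import torch.nn as nn

class TAPSLinear(nn.Module):
    """Frozen base layer plus a task-specific residual behind a learned gate;
    penalizing the gates keeps most layers unmodified."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)       # pretrained, frozen
        self.delta = nn.Linear(base.in_features, base.out_features)
        nn.init.zeros_(self.delta.weight)
        nn.init.zeros_(self.delta.bias)
        self.gate = nn.Parameter(torch.tensor(0.0))  # relaxed on/off switch

    def forward(self, x):
        return self.base(x) + torch.sigmoid(self.gate) * self.delta(x)

layers = nn.ModuleList([TAPSLinear(nn.Linear(16, 16)) for _ in range(3)])
sparsity_penalty = sum(torch.sigmoid(l.gate) for l in layers)  # add to task loss
x = torch.randn(2, 16)
for l in layers:
    x = torch.relu(l(x))
print(x.shape, float(sparsity_penalty))
```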
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction [3.2979460528864926]
We consider approximating the probability of post-click conversion events (installs) for mobile app advertising on a large-scale advertising platform.
We show that several different approaches result in similar levels of positive transfer from the data-abundant CTR task to the CVR task.
Our findings add to the growing body of evidence suggesting that standard multi-task learning is a sensible approach to modelling related events in real-world large-scale applications.
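A minimal sketch of an entire-space shared-bottom model of the kind such studies compare (ESMM-style): CTR and CVR heads share a bottom network, and the product p(click) * p(conversion | click) is supervised on all impressions rather than on clicks alone. Layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

feat_dim = 16
shared = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU())  # shared bottom
ctr_head = nn.Linear(32, 1)  # click-through rate head (data-abundant)
cvr_head = nn.Linear(32, 1)  # post-click conversion head (data-scarce)

def forward(x):
    h = shared(x)
    p_ctr = torch.sigmoid(ctr_head(h))
    p_cvr = torch.sigmoid(cvr_head(h))
    # Entire-space trick: supervise p_ctr and p_ctr * p_cvr over all
    # impressions, so the CVR head never trains on a click-biased sample alone.
    p_ctcvr = p_ctr * p_cvr
    return p_ctr, p_cvr, p_ctcvr

x = torch.randn(4, feat_dim)
print([t.shape for t in forward(x)])
```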
arXiv Detail & Related papers (2021-08-18T13:39:50Z)
- Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot.
We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control.
We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
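A minimal sketch of the multi-gait imitation setup as the summary describes it: one policy conditioned on a gait selector is regressed toward an MPC teacher's actions. The plain MSE loss is a stand-in (MPC-Net itself minimizes a control-Hamiltonian loss), the teacher is omitted, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

state_dim, act_dim, num_gaits = 12, 4, 3

# Single policy conditioned on a one-hot gait selector, trained to imitate
# an MPC teacher's actions (the teacher itself is omitted here).
policy = nn.Sequential(
    nn.Linear(state_dim + num_gaits, 64),
    nn.ReLU(),
    nn.Linear(64, act_dim),
)

def imitation_step(states, gait_ids, teacher_actions, opt):
    """One behavior-cloning step toward the teacher's actions.
    (MPC-Net minimizes a control-Hamiltonian loss; MSE is a stand-in.)"""
    gaits = torch.nn.functional.one_hot(gait_ids, num_gaits).float()
    pred = policy(torch.cat([states, gaits], dim=-1))
    loss = ((pred - teacher_actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
batch = (torch.randn(8, state_dim), torch.randint(0, num_gaits, (8,)), torch.randn(8, act_dim))
print(imitation_step(*batch, opt))
```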
arXiv Detail & Related papers (2021-03-26T08:48:53Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
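The paper's headline metric, policy information capacity, is the mutual information I(theta; R) between randomly drawn policy parameters and episodic returns: a task is easier when return depends strongly on which policy you draw. A hedged numpy sketch of a histogram-based estimate follows; the binning scheme and toy return data are assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def discrete_entropy(samples, edges):
    """Entropy of samples discretized onto fixed bin edges."""
    hist, _ = np.histogram(samples, bins=edges)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def policy_information_capacity(returns, bins=10):
    """Estimate I(theta; R) ~= H(R) - E_theta[H(R | theta)], where
    returns[i, j] is the return of rollout j under random parameters theta_i."""
    edges = np.histogram_bin_edges(returns.ravel(), bins=bins)
    marginal = discrete_entropy(returns.ravel(), edges)
    conditional = np.mean([discrete_entropy(row, edges) for row in returns])
    return marginal - conditional

# Toy data: 50 randomly drawn policies x 20 rollouts each on a hypothetical task.
returns = rng.normal(loc=rng.normal(size=(50, 1)), scale=0.5, size=(50, 20))
print(policy_information_capacity(returns))
```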
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL [18.242904106537654]
TempLe is the first PAC-MDP method for multi-task reinforcement learning.
We present two algorithms for an "online" and a "finite-model" setting respectively.
We prove that our proposed TempLe algorithms achieve much lower sample complexity than single-task learners or state-of-the-art multi-task methods.
arXiv Detail & Related papers (2020-02-16T19:46:49Z)