Knowledge Transfer in Multi-Task Deep Reinforcement Learning for
Continuous Control
- URL: http://arxiv.org/abs/2010.07494v2
- Date: Fri, 16 Oct 2020 14:34:32 GMT
- Title: Knowledge Transfer in Multi-Task Deep Reinforcement Learning for
Continuous Control
- Authors: Zhiyuan Xu, Kun Wu, Zhengping Che, Jian Tang, Jieping Ye
- Abstract summary: We present a Knowledge Transfer based Multi-task Deep Reinforcement Learning framework (KTM-DRL) for continuous control.
In KTM-DRL, the multi-task agent first leverages an offline knowledge transfer algorithm to quickly learn a control policy from the experience of task-specific teachers.
The experimental results demonstrate the effectiveness of KTM-DRL and its knowledge transfer and online learning algorithms, as well as its superiority over the state of the art by a large margin.
- Score: 65.00425082663146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Deep Reinforcement Learning (DRL) has emerged as a promising approach
to many complex tasks, it remains challenging to train a single DRL agent that
is capable of undertaking multiple different continuous control tasks. In this
paper, we present a Knowledge Transfer based Multi-task Deep Reinforcement
Learning framework (KTM-DRL) for continuous control, which enables a single DRL
agent to achieve expert-level performance in multiple different tasks by
learning from task-specific teachers. In KTM-DRL, the multi-task agent first
leverages an offline knowledge transfer algorithm designed particularly for the
actor-critic architecture to quickly learn a control policy from the experience
of task-specific teachers, and then it employs an online learning algorithm to
further improve itself by learning from new online transition samples under the
guidance of those teachers. We perform a comprehensive empirical study with two
commonly-used benchmarks in the MuJoCo continuous control task suite. The
experimental results demonstrate the effectiveness of KTM-DRL and its
knowledge transfer and online learning algorithms, as well as its superiority
over the state of the art by a large margin.
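The abstract describes the two training stages only at a high level. The following is a minimal PyTorch sketch of one way such a pipeline could be wired up: the task-ID conditioning, the behavior-cloning and Q-regression distillation losses, and the `bc_weight` guidance term are illustrative assumptions for this sketch, not the authors' actual KTM-DRL algorithm.

```python
# Sketch of a two-stage multi-task actor-critic training loop in the spirit of KTM-DRL.
# Assumptions (not from the paper): teachers are DDPG/TD3-style actor-critic agents,
# the multi-task agent is conditioned on a one-hot task ID, and offline transfer is
# realized as behavior cloning (actor) plus Q-value regression (critic).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class MultiTaskAgent:
    """Single actor-critic agent shared across tasks, conditioned on a task ID."""

    def __init__(self, obs_dim, act_dim, num_tasks, lr=3e-4):
        in_dim = obs_dim + num_tasks                      # observation + one-hot task ID
        self.actor = MLP(in_dim, act_dim)
        self.critic = MLP(in_dim + act_dim, 1)
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=lr)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=lr)
        self.num_tasks = num_tasks

    def _augment(self, obs, task_id):
        # task_id: LongTensor of shape (batch,), obs: (batch, obs_dim)
        one_hot = F.one_hot(task_id, self.num_tasks).float()
        return torch.cat([obs, one_hot], dim=-1)

    def offline_transfer_step(self, batch, teacher_actor, teacher_critic, task_id):
        """Stage 1: distill a task-specific teacher from its replay experience."""
        obs, act = batch["obs"], batch["act"]
        x = self._augment(obs, task_id)

        with torch.no_grad():
            teacher_act = teacher_actor(obs)
            teacher_q = teacher_critic(torch.cat([obs, act], dim=-1))

        # Actor imitates the teacher's action; critic regresses the teacher's Q-value.
        actor_loss = F.mse_loss(self.actor(x), teacher_act)
        critic_loss = F.mse_loss(self.critic(torch.cat([x, act], dim=-1)), teacher_q)

        self.actor_opt.zero_grad(); actor_loss.backward(); self.actor_opt.step()
        self.critic_opt.zero_grad(); critic_loss.backward(); self.critic_opt.step()

    def online_step(self, batch, teacher_actor, task_id, gamma=0.99, bc_weight=0.1):
        """Stage 2: actor-critic update on fresh transitions, with teacher guidance
        kept as an auxiliary imitation term (one simple way to realize 'guidance')."""
        obs, act, rew, next_obs, done = (batch[k] for k in
                                         ("obs", "act", "rew", "next_obs", "done"))
        x = self._augment(obs, task_id)          # rew, done: shape (batch, 1)
        next_x = self._augment(next_obs, task_id)

        with torch.no_grad():
            next_act = self.actor(next_x)
            target_q = rew + gamma * (1.0 - done) * \
                self.critic(torch.cat([next_x, next_act], dim=-1))
        critic_loss = F.mse_loss(self.critic(torch.cat([x, act], dim=-1)), target_q)
        self.critic_opt.zero_grad(); critic_loss.backward(); self.critic_opt.step()

        new_act = self.actor(x)
        with torch.no_grad():
            teacher_act = teacher_actor(obs)
        actor_loss = -self.critic(torch.cat([x, new_act], dim=-1)).mean() \
            + bc_weight * F.mse_loss(new_act, teacher_act)
        self.actor_opt.zero_grad(); actor_loss.backward(); self.actor_opt.step()
```

In this reading, `offline_transfer_step` would be iterated over each teacher's replay buffer until the multi-task agent roughly matches its teachers, after which `online_step` continues training on newly collected transitions while the teachers only contribute an auxiliary imitation signal.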
Related papers
- Continuous Control with Coarse-to-fine Reinforcement Learning [15.585706638252441]
We present a framework that trains RL agents to zoom-into a continuous action space in a coarse-to-fine manner.
We introduce a concrete, value-based algorithm within the framework called the Coarse-to-fine Q-Network (CQN).
CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
arXiv Detail & Related papers (2024-07-10T16:04:08Z) - Sample Efficient Myopic Exploration Through Multitask Reinforcement
Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z) - Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z) - Granger Causal Interaction Skill Chains [35.143372688036685]
Reinforcement Learning (RL) has demonstrated promising results in learning policies for complex tasks, but it often suffers from low sample efficiency and limited transferability.
We introduce the Chain of Interaction Skills (COInS) algorithm, which focuses on controllability in factored domains to identify a small number of task-agnostic skills that still permit a high degree of control.
We also demonstrate the transferability of skills learned by COInS, using variants of Breakout, a common RL benchmark, and show 2-3x improvement in both sample efficiency and final performance compared to standard RL baselines.
arXiv Detail & Related papers (2023-06-15T21:06:54Z) - Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z) - Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical
Multi-Step Approach for Policy Training [4.982806898121435]
We propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method.
This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration.
The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
arXiv Detail & Related papers (2022-09-29T00:42:44Z) - DL-DRL: A double-level deep reinforcement learning approach for
large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z) - URLB: Unsupervised Reinforcement Learning Benchmark [82.36060735454647]
We introduce the Unsupervised Reinforcement Learning Benchmark (URLB).
URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards.
We provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods.
arXiv Detail & Related papers (2021-10-28T15:07:01Z) - Hierarchical Program-Triggered Reinforcement Learning Agents For
Automated Driving [5.404179497338455]
Recent advances in Reinforcement Learning (RL) combined with Deep Learning (DL) have demonstrated impressive performance in complex tasks, including autonomous driving.
We propose HPRL - Hierarchical Program-triggered Reinforcement Learning, which uses a hierarchy consisting of a structured program along with multiple RL agents, each trained to perform a relatively simple task.
The focus of verification shifts to the master program under simple guarantees from the RL agents, leading to a significantly more interpretable and verifiable implementation as compared to a complex RL agent.
arXiv Detail & Related papers (2021-03-25T14:19:54Z) - Improved Context-Based Offline Meta-RL with Attention and Contrastive
Learning [1.3106063755117399]
We improve upon one of the SOTA OMRL algorithms, FOCAL, by incorporating an intra-task attention mechanism and inter-task contrastive learning objectives.
Theoretical analysis and experiments are presented to demonstrate the superior performance, efficiency, and robustness of our end-to-end, model-free method.
arXiv Detail & Related papers (2021-02-22T05:05:16Z)