Transferred Q-learning
- URL: http://arxiv.org/abs/2202.04709v1
- Date: Wed, 9 Feb 2022 20:08:19 GMT
- Title: Transferred Q-learning
- Authors: Elynn Y. Chen, Michael I. Jordan, Sai Li
- Abstract summary: We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks.
We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies.
- Score: 79.79659145328856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider $Q$-learning with knowledge transfer, using samples from a target
reinforcement learning (RL) task as well as source samples from different but
related RL tasks. We propose transfer learning algorithms for both batch and
online $Q$-learning with offline source studies. The proposed transferred
$Q$-learning algorithm contains a novel re-targeting step that enables vertical
information-cascading along multiple steps in an RL task, in addition to the usual
horizontal information-gathering of transfer learning (TL) for supervised learning.
We establish the first theoretical justifications of TL in RL tasks by showing a
faster rate of convergence of the $Q$-function estimate in offline RL transfer, and
a lower regret bound in offline-to-online RL transfer, under certain similarity
assumptions. Empirical evidence from both synthetic and real datasets is presented
to support the proposed algorithm and our theoretical results.
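The abstract names two ingredients: horizontal pooling of source and target samples, and a vertical "re-targeting" step that relabels source data with the target task's Bellman targets. Below is a minimal, speculative Python sketch of one way these pieces could fit together in batch (fitted-$Q$) form. The linear features `phi`, the ridge solver, and the pool-then-correct structure are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def bellman_targets(batch, w, phi, actions, gamma=0.99):
    """Pseudo-outcomes y = r + gamma * max_a' Q(s', a') under the CURRENT
    target-task Q estimate w. Applying this to *source* transitions is the
    flavor of the re-targeting step: source data is relabeled so that it
    approximates the target task's Bellman update."""
    s, a, r, s_next = batch
    q_next = np.stack([phi(s_next, b) @ w for b in actions], axis=1)
    return r + gamma * q_next.max(axis=1)

def ridge(X, y, lam=1e-3):
    """One least-squares fitted-Q step (ridge-regularized)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def transferred_fitted_q(target, sources, phi, actions, n_iters=50,
                         gamma=0.99, lam=1e-3):
    """target, sources: (s, a, r, s_next) arrays; phi(s, a) -> (n, d) features."""
    s, a, _, _ = target
    w = np.zeros(phi(s, a).shape[1])
    for _ in range(n_iters):
        # Horizontal step: pool target samples with re-targeted source samples.
        Xs = [phi(s, a)] + [phi(src[0], src[1]) for src in sources]
        ys = [bellman_targets(target, w, phi, actions, gamma)]
        ys += [bellman_targets(src, w, phi, actions, gamma) for src in sources]
        w_pool = ridge(np.vstack(Xs), np.concatenate(ys), lam)
        # Correction step: re-fit the residual on target data only, so the
        # estimate stays consistent even if the sources are biased.
        delta = ridge(Xs[0], ys[0] - Xs[0] @ w_pool, lam)
        w = w_pool + delta
    return w
```

The re-targeting call inside the loop is the step the abstract highlights: because source pseudo-outcomes are recomputed with the target task's current $Q$ estimate, information cascades vertically across Bellman steps while samples are pooled horizontally across tasks.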
Related papers
- Offline Multitask Representation Learning for Reinforcement Learning [86.26066704016056]
We study offline multitask representation learning in reinforcement learning (RL).
We propose a new algorithm called MORL for offline multitask representation learning.
Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.
arXiv Detail & Related papers (2024-03-18T08:50:30Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
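The LINVIT summary describes LLM guidance entering as a regularization factor in value-based RL. A minimal sketch of one standard way to realize that idea is a KL-regularized backup toward an LLM-suggested action distribution; the closed form below is textbook soft-backup math, and all names are illustrative rather than LINVIT's actual procedure.

```python
import numpy as np

def llm_regularized_backup(q_values, llm_probs, lam=1.0):
    """Solve max_pi <pi, q> - lam * KL(pi || pi_LLM) over a finite action
    set. The maximizer is pi(a) proportional to pi_LLM(a) * exp(q(a)/lam),
    and the state value is lam * logsumexp(log pi_LLM + q / lam)."""
    logits = np.log(llm_probs + 1e-12) + q_values / lam
    m = logits.max()
    v = lam * (m + np.log(np.exp(logits - m).sum()))
    pi = np.exp(logits - m)
    return v, pi / pi.sum()

# Usage: strong LLM prior on action 0, but action 2 has the highest
# estimated return; lam trades the two sources of information off.
v, pi = llm_regularized_backup(np.array([0.1, 0.2, 1.0]),
                               np.array([0.8, 0.1, 0.1]), lam=0.5)
```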
- An advantage based policy transfer algorithm for reinforcement learning with metrics of transferability [6.660458629649826]
Reinforcement learning (RL) can enable sequential decision-making in complex and high-dimensional environments.
Transfer RL algorithms can be used for the transfer of knowledge from one or multiple source environments to a target environment.
This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments.
arXiv Detail & Related papers (2023-11-12T04:25:53Z)
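The APT-RL summary names two ingredients, advantages and transferability metrics. As a purely illustrative sketch (not the paper's actual rule), one can gate the source policy's suggestion by its estimated advantage in the target task:

```python
import numpy as np

def advantage(q_values, action):
    """Advantage of `action` under the baseline V(s) ~ mean_a Q(s, a)."""
    return q_values[action] - q_values.mean()

def select_action(q_values, source_action, rng, tau=0.0, eps=0.05):
    """Follow the source policy's suggestion only while its estimated
    advantage in the *target* task exceeds a threshold tau; otherwise act
    greedily on the target critic, with eps-greedy exploration. Averaging
    this advantage over visited states would give a crude per-source
    transferability score."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    if advantage(q_values, source_action) > tau:
        return int(source_action)
    return int(np.argmax(q_values))
```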
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term, compared to RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
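The RL$^3$ mechanism in the summary is concrete: run ordinary RL per task and feed the resulting action-values into the meta-learner's input. A tiny sketch of that interface (names and shapes are assumptions, not the paper's code):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Ordinary tabular Q-learning on the current task; RL$^2$ alone
    would not maintain such explicit task-level estimates."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

def meta_input(obs, prev_action_onehot, prev_reward, Q, s):
    """RL$^2$ feeds (obs, a_{t-1}, r_{t-1}) to its recurrent policy; the
    RL$^3$ idea is to append the per-task action-values Q[s], so the
    meta-learner can lean on them as the task estimate sharpens."""
    return np.concatenate([obs, prev_action_onehot, [prev_reward], Q[s]])
```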
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
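The Contrastive UCB entry combines two standard components: a contrastive loss for learning the low-rank representation, and UCB-style exploration bonuses computed in that representation. Below is a hedged sketch of both components in isolation; pairing InfoNCE with an elliptical bonus follows the summary's description, not the paper's exact estimator.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: row i of `anchors` (an embedded
    (s, a) pair) should match row i of `positives` (the embedded next
    state); all other rows in the batch act as negatives."""
    logits = anchors @ positives.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))

def ucb_bonus(phi, cov_inv, beta=1.0):
    """Elliptical exploration bonus beta * sqrt(phi^T Sigma^{-1} phi),
    computed in the learned representation phi."""
    return beta * float(np.sqrt(phi @ cov_inv @ phi))
```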
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning [46.11628795660159]
This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
arXiv Detail & Related papers (2022-06-13T04:29:02Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
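The two ingredients named in the Q-Rex title, a target network and reverse experience replay, compose as below. This is a tabular caricature of the idea (Q-Rex itself is analyzed for linear MDPs); replaying an episode backward lets reward information propagate through the whole trajectory in a single pass.

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
Q_target = Q.copy()  # sync to a copy of Q only periodically

def reverse_replay_update(Q, Q_target, episode, alpha=0.5, gamma=0.99):
    """Replay one episode's (s, a, r, s_next, done) transitions in
    *reverse* time order, bootstrapping from the frozen target table."""
    for (s, a, r, s_next, done) in reversed(episode):
        boot = 0.0 if done else gamma * Q_target[s_next].max()
        Q[s, a] += alpha * (r + boot - Q[s, a])
    return Q
```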
- Fractional Transfer Learning for Deep Model-Based Reinforcement Learning [0.966840768820136]
Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks.
Recent progress in model-based RL allows agents to be much more data-efficient.
We present a simple alternative approach: fractional transfer learning.
arXiv Detail & Related papers (2021-08-14T12:44:42Z)
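One plausible reading of "fractional transfer" is that, instead of the all-or-nothing choice between copying source weights and reinitializing, a fraction of the source parameters is blended into a fresh initialization. That reading is an assumption; the sketch below only illustrates it, and both knobs are made-up values.

```python
import numpy as np

def fractional_transfer(w_source, rng, fraction=0.2, init_scale=0.05):
    """Blend a fraction of the source task's weights into a fresh random
    initialization: fraction=0 recovers learning from scratch, while
    fraction=1 approaches full weight transfer."""
    w_random = rng.normal(0.0, init_scale, size=w_source.shape)
    return w_random + fraction * w_source

# Usage: per-layer blending of a source network's parameters.
rng = np.random.default_rng(0)
w_new = fractional_transfer(np.ones((64, 64)), rng, fraction=0.2)
```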
- Interpretable performance analysis towards offline reinforcement learning: A dataset perspective [6.526790418943535]
We propose a two-fold taxonomy for existing offline RL algorithms.
We explore the correlation between the performance of different types of algorithms and the distribution of actions under states.
We create a benchmark platform on the Atari domain, entitled easy go (RLEG), at an estimated cost of more than 0.3 million dollars.
arXiv Detail & Related papers (2021-05-12T07:17:06Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves over standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
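TRED's summary says the target-relevant part of the source representation acts as a regularizer during fine-tuning. A toy stand-in for that objective is sketched below, with the disentanglement step replaced by a precomputed relevance mask, an assumption made purely for illustration.

```python
import numpy as np

def tred_style_loss(task_loss, feats, source_feats, relevance_mask, lam=0.1):
    """Fine-tuning objective: the usual task loss plus a penalty that keeps
    only the target-relevant coordinates of the representation (selected by
    a 0/1 mask standing in for TRED's disentanglement step) close to the
    frozen source model's features."""
    keep = relevance_mask.astype(feats.dtype)
    reg = np.mean(keep * (feats - source_feats) ** 2)
    return task_loss + lam * reg
```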
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.