On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement
Learning
- URL: http://arxiv.org/abs/2210.10763v2
- Date: Thu, 15 Jun 2023 17:57:18 GMT
- Title: On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement
Learning
- Authors: Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen
Tu
- Abstract summary: We investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster.
We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models.
- Score: 45.73223325256312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) algorithms can solve challenging control problems
directly from image observations, but they often require millions of
environment interactions to do so. Recently, model-based RL algorithms have
greatly improved sample-efficiency by concurrently learning an internal model
of the world, and supplementing real environment interactions with imagined
rollouts for policy improvement. However, learning an effective model of the
world from scratch is challenging, and in stark contrast to humans that rely
heavily on world understanding and visual cues for learning new skills. In this
work, we investigate whether internal models learned by modern model-based RL
algorithms can be leveraged to solve new, distinctly different tasks faster. We
propose Model-Based Cross-Task Transfer (XTRA), a framework for
sample-efficient online RL with scalable pretraining and finetuning of learned
world models. By offline multi-task pretraining and online cross-task
finetuning, we achieve substantial improvements over a baseline trained from
scratch; we improve mean performance of model-based algorithm EfficientZero by
23%, and by as much as 71% in some instances.
Related papers
- Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling [0.0]
One promising approach is to train an RL agent as an improvement, starting with a suboptimal solution that is iteratively improved by applying small changes.
We apply this approach to a real-world multiobjective production scheduling problem.
We benchmarked our approach against other approaches using real data from our industry partner, demonstrating its superior performance.
arXiv Detail & Related papers (2024-09-18T12:48:56Z) - ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z) - Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult.
offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction.
In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
arXiv Detail & Related papers (2023-10-24T17:46:12Z) - HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
arXiv Detail & Related papers (2023-09-30T11:38:13Z) - Simplified Temporal Consistency Reinforcement Learning [19.814047499837084]
We show that a simple representation learning approach relying on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL.
Our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
arXiv Detail & Related papers (2023-06-15T19:37:43Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - Offline Q-Learning on Diverse Multi-Task Data Both Scales And
Generalizes [100.69714600180895]
offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games with near-human performance using up-to 80 million parameter networks.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
arXiv Detail & Related papers (2022-11-28T08:56:42Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Improved Context-Based Offline Meta-RL with Attention and Contrastive
Learning [1.3106063755117399]
We improve upon one of the SOTA OMRL algorithms, FOCAL, by incorporating intra-task attention mechanism and inter-task contrastive learning objectives.
Theoretical analysis and experiments are presented to demonstrate the superior performance, efficiency and robustness of our end-to-end and model free method.
arXiv Detail & Related papers (2021-02-22T05:05:16Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.