Model-Based Reinforcement Learning with Multi-Task Offline Pretraining
- URL: http://arxiv.org/abs/2306.03360v3
- Date: Wed, 5 Jun 2024 10:41:25 GMT
- Title: Model-Based Reinforcement Learning with Multi-Task Offline Pretraining
- Authors: Minting Pan, Yitao Zheng, Yunbo Wang, Xiaokang Yang
- Abstract summary: We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
- Score: 59.82457030180094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretraining reinforcement learning (RL) models on offline datasets is a promising way to improve their training efficiency in online tasks, but challenging due to the inherent mismatch in dynamics and behaviors across various tasks. We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task. The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance for both dynamics representation transfer and policy transfer. We build a time-varying, domain-selective distillation loss to generate a set of offline-to-online similarity weights. These weights serve two purposes: (i) adaptively transferring the task-agnostic knowledge of physical dynamics to facilitate world model training, and (ii) learning to replay relevant source actions to guide the target policy. We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
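As a concrete illustration of how such similarity weights might be computed, here is a minimal PyTorch sketch. It is not the authors' implementation: the names (source_encoders, target_encoder, temperature) are hypothetical, and the actual method uses a time-varying, domain-selective distillation loss rather than this plain per-batch softmax.

```python
import torch
import torch.nn.functional as F

def similarity_weights(source_encoders, target_encoder, obs, temperature=1.0):
    # Score each pretrained source world model by how cheaply its features
    # distill into the target world model: lower loss => more relevant task.
    with torch.no_grad():
        target_feat = target_encoder(obs)
    losses = torch.stack([F.mse_loss(enc(obs), target_feat)
                          for enc in source_encoders])
    # Softmax over negative losses turns per-task losses into weights.
    return F.softmax(-losses / temperature, dim=0)

def weighted_distillation_loss(source_encoders, target_encoder, obs, weights):
    # Transfer dynamics knowledge: pull target features toward the features
    # of relevant source models, scaled by the similarity weights.
    target_feat = target_encoder(obs)
    per_task = torch.stack([F.mse_loss(enc(obs).detach(), target_feat)
                            for enc in source_encoders])
    return (weights * per_task).sum()
```

The same weights could then reweight replayed source actions when guiding the target policy, mirroring purpose (ii) in the abstract.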
Related papers
- Model-Based Transfer Learning for Contextual Reinforcement Learning [5.5597941107270215]
We show how to systematically select good source tasks to train on, maximizing overall performance across a range of tasks.
The key idea behind our approach is to explicitly model the performance loss incurred by transferring a trained model.
We experimentally validate our methods using urban traffic and standard control benchmarks.
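A hedged sketch of what selection under such a model could look like, assuming a learned predictor predicted_performance(source, target) as a hypothetical stand-in for the paper's explicit model of transfer loss:

```python
# Greedy source-task selection: repeatedly add the candidate task that most
# improves the best achievable predicted performance across all targets.
def select_training_tasks(candidates, targets, predicted_performance, k=3):
    chosen = []
    for _ in range(k):
        def coverage(extra):
            # Each target is served by its best already-chosen source task.
            return sum(
                max(predicted_performance(s, t) for s in chosen + [extra])
                for t in targets
            )
        best = max((c for c in candidates if c not in chosen), key=coverage)
        chosen.append(best)
    return chosen
```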
arXiv Detail & Related papers (2024-08-08T14:46:01Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
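Model-based value expansion can be sketched in a few lines; model.step, policy, and value below are assumed interfaces, not MOTO's actual API:

```python
import torch

def mve_target(model, policy, value, state, horizon=5, gamma=0.99):
    # Roll the learned model forward H steps from a real state, accumulate
    # predicted rewards, and bootstrap with the value function at the horizon.
    returns = torch.zeros(state.shape[0], device=state.device)
    discount, s = 1.0, state
    for _ in range(horizon):
        a = policy(s)
        s, r = model.step(s, a)        # learned dynamics and reward heads
        returns = returns + discount * r
        discount *= gamma
    return returns + discount * value(s)  # bootstrapped value target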
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult.
Offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction.
In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
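The overall recipe can be summarized in a schematic loop; every interface here (world_model.fit, planner.plan, a gym-style env) is an assumption for illustration, not the paper's code:

```python
# Offline-to-online world-model training: pretrain on robot logs, then
# alternate planning-based data collection with further model updates.
def offline_to_online(world_model, planner, offline_data, env, episodes):
    world_model.fit(offline_data)                    # offline pretraining
    replay = list(offline_data)
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = planner.plan(world_model, obs)  # e.g. MPC with the model
            next_obs, reward, done, _ = env.step(action)
            replay.append((obs, action, reward, next_obs))
            obs = next_obs
        world_model.fit(replay)                      # online finetuning
    return world_model
```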
arXiv Detail & Related papers (2023-10-24T17:46:12Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
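An inverse dynamics objective of the kind ALP combines with policy optimization can be sketched as follows; the architecture and names are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InverseDynamics(nn.Module):
    # Auxiliary head that predicts the action linking two consecutive
    # observations, forcing the encoder to keep action-relevant features.
    def __init__(self, encoder, feat_dim, action_dim):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(2 * feat_dim, action_dim)

    def loss(self, obs, next_obs, action):
        z = torch.cat([self.encoder(obs), self.encoder(next_obs)], dim=-1)
        return F.mse_loss(self.head(z), action)
```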
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Simplified Temporal Consistency Reinforcement Learning [19.814047499837084]
We show that a simple representation learning approach relying on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL.
Our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
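A minimal sketch of a latent temporal-consistency loss of this kind (names are illustrative; the actual method may use a momentum/EMA target encoder):

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(encoder, dynamics, obs, action, next_obs):
    # Predict the next latent from the current latent and action, and match
    # the (stop-gradient) encoding of the observed next state.
    z = encoder(obs)
    z_pred = dynamics(torch.cat([z, action], dim=-1))
    with torch.no_grad():               # stop gradient on the target branch
        z_target = encoder(next_obs)
    return F.mse_loss(z_pred, z_target)
```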
arXiv Detail & Related papers (2023-06-15T19:37:43Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model substantially outperforms naive combinations of existing continual learning and visual RL algorithms on the DeepMind Control and Meta-World benchmarks with continual visual control tasks.
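One illustrative reading of "task-specific dynamics priors with a mixture of Gaussians" is a latent prior with one Gaussian head per task; this sketch is an assumption, not the paper's architecture:

```python
import torch
import torch.nn as nn
from torch.distributions import (Categorical, Independent,
                                 MixtureSameFamily, Normal)

class MixturePrior(nn.Module):
    # One Gaussian component of latent dynamics per task, combined by
    # learnable mixture weights over tasks.
    def __init__(self, num_tasks, hidden_dim, latent_dim):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, 2 * latent_dim) for _ in range(num_tasks)
        )
        self.logits = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, h):
        stats = torch.stack([head(h) for head in self.heads], dim=-2)
        mean, log_std = stats.chunk(2, dim=-1)
        comp = Independent(Normal(mean, log_std.exp()), 1)
        mix = Categorical(logits=self.logits.expand(h.shape[0], -1))
        return MixtureSameFamily(mix, comp)
```

Keeping per-task components separate is one natural way to limit interference between tasks, which is the forgetting problem the training strategy targets.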
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
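One common way to instantiate a conservative lower bound on the return is to penalize model rollouts by ensemble disagreement; the paper derives its bound differently, so treat this purely as an illustrative sketch with assumed interfaces:

```python
import torch

def lower_bound_return(ensemble, policy, reward_fn, state,
                       horizon=5, lam=1.0, gamma=0.99):
    # Pessimistic rollout: subtract a disagreement penalty from each
    # predicted reward so the estimate underapproximates the true return.
    ret = torch.zeros(state.shape[0], device=state.device)
    disc, s = 1.0, state
    for _ in range(horizon):
        a = policy(s)
        preds = torch.stack([m(s, a) for m in ensemble])  # next-state ensemble
        penalty = preds.std(dim=0).mean(dim=-1)           # model uncertainty
        ret = ret + disc * (reward_fn(s, a) - lam * penalty)
        s = preds.mean(dim=0)
        disc *= gamma
    return ret
```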
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information [5.604859261995801]
We propose a unified offline-to-online RL framework that resolves the performance drop at the transition from offline to online training.
We introduce goal-aware state information to the RL agent, which can greatly reduce task complexity and accelerate policy learning.
Our framework achieves high training efficiency and strong performance compared with state-of-the-art methods on multiple robotic manipulation tasks.
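Goal-aware state information can be as simple as appending goal-relative features to the observation; the specific features below are an illustrative assumption:

```python
import numpy as np

def goal_aware_state(raw_obs, ee_pos, goal_pos):
    # Append goal-relative features (offset vector and distance) so the
    # policy observes task progress directly instead of inferring it.
    delta = goal_pos - ee_pos
    dist = np.array([np.linalg.norm(delta)])
    return np.concatenate([raw_obs, delta, dist])
```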
arXiv Detail & Related papers (2021-10-21T05:34:25Z)