Data-Efficient Task Generalization via Probabilistic Model-based Meta
Reinforcement Learning
- URL: http://arxiv.org/abs/2311.07558v2
- Date: Tue, 6 Feb 2024 20:41:01 GMT
- Title: Data-Efficient Task Generalization via Probabilistic Model-based Meta
Reinforcement Learning
- Authors: Arjun Bhardwaj, Jonas Rothfuss, Bhavya Sukhija, Yarden As, Marco
Hutter, Stelian Coros, Andreas Krause
- Abstract summary: PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
- Score: 58.575939354953526
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning
(Meta-RL) algorithm designed to efficiently adapt control policies to changing
dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift
adaptation to new dynamics with minimal interaction data. Existing Meta-RL
methods require abundant meta-learning data, limiting their applicability in
settings such as robotics, where data is costly to obtain. To address this,
PACOH-RL incorporates regularization and epistemic uncertainty quantification
in both the meta-learning and task adaptation stages. When facing new dynamics,
we use these uncertainty estimates to effectively guide exploration and data
collection. Overall, this enables positive transfer, even when access to data
from prior tasks or dynamic settings is severely limited. Our experimental
results demonstrate that PACOH-RL outperforms model-based RL and model-based
Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real
robotic car, we showcase the potential for efficient RL policy adaptation in
diverse, data-scarce conditions.
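To make the adaptation loop described above concrete, the following sketch illustrates its three ingredients on a deliberately tiny example: a prior over dynamics parameters meta-learned from previous tasks, a Bayesian posterior update from a handful of new-task transitions, and epistemic-uncertainty estimates that steer data collection. The one-dimensional linear system, the least-squares prior fit, and all names here are hypothetical choices made for illustration only; PACOH-RL itself meta-learns priors for learned dynamics models with regularization and uncertainty quantification, as described in the abstract.

```python
# Illustrative sketch only (not the authors' code): meta-learned Gaussian prior
# over toy linear dynamics, posterior adaptation on a new task, and
# uncertainty-guided action selection.
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta_true, n_steps, policy, noise=0.05):
    """Collect (state, action, next_state) transitions from a toy 1-D linear system."""
    a_true, b_true = theta_true
    s, data = 0.0, []
    for _ in range(n_steps):
        u = policy(s)
        s_next = a_true * s + b_true * u + noise * rng.standard_normal()
        data.append((s, u, s_next))
        s = s_next
    return np.array(data)

# --- Meta-learning stage: summarize previous tasks as a Gaussian prior -------
# Here: per-task least-squares estimates, pooled into a prior mean/covariance.
# This is only a crude stand-in for PACOH-RL's regularized meta-learned prior.
previous_tasks = [(0.90, 0.40), (0.80, 0.50), (0.95, 0.35)]   # hypothetical params
estimates = []
for theta in previous_tasks:
    D = rollout(theta, 50, policy=lambda s: rng.uniform(-1, 1))
    estimates.append(np.linalg.lstsq(D[:, :2], D[:, 2], rcond=None)[0])
estimates = np.array(estimates)
prior_mean = estimates.mean(axis=0)
prior_cov = np.cov(estimates.T) + 1e-3 * np.eye(2)

# --- Task adaptation stage: conjugate Bayesian update from minimal data ------
def posterior(X, y, mu0, S0, obs_noise=0.05):
    """Bayesian linear-regression posterior over the dynamics parameters."""
    S0_inv = np.linalg.inv(S0)
    S = np.linalg.inv(S0_inv + X.T @ X / obs_noise**2)
    mu = S @ (S0_inv @ mu0 + X.T @ y / obs_noise**2)
    return mu, S

D_new = rollout((0.85, 0.45), 5, policy=lambda s: rng.uniform(-1, 1))  # 5 samples
mu, S = posterior(D_new[:, :2], D_new[:, 2], prior_mean, prior_cov)

# --- Exploration: pick the action with the largest epistemic uncertainty -----
def epistemic_std(s, u, S):
    x = np.array([s, u])
    return np.sqrt(x @ S @ x)

candidates = np.linspace(-1.0, 1.0, 21)
state = 0.5
best = candidates[int(np.argmax([epistemic_std(state, u, S) for u in candidates]))]
print("posterior mean:", mu, "| exploratory action:", best)
```

The design point mirrored here is that a well-chosen prior lets the posterior contract after only a few transitions, while the remaining posterior covariance indicates where further data collection would be most informative.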
Related papers
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- On Task-Relevant Loss Functions in Meta-Reinforcement Learning and Online LQR [9.355903533901023]
We propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner.
In contrast to standard model-based approaches to meta-RL, our method exploits value information to rapidly capture the decision-critical part of the environment.
arXiv Detail & Related papers (2023-12-09T04:52:28Z)
- Task Aware Modulation using Representation Learning: An Approach for Few Shot Learning in Environmental Systems [15.40286222692196]
TAM-RL is a novel framework for few-shot learning in heterogeneous systems.
We evaluate TAM-RL on two real-world environmental datasets.
arXiv Detail & Related papers (2023-10-07T07:55:22Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Meta Reinforcement Learning for Adaptive Control: An Offline Approach [3.131740922192114]
We formulate a meta reinforcement learning (meta-RL) control strategy that takes advantage of known, offline information for training.
Our meta-RL agent has a recurrent structure that accumulates "context" for its current dynamics through a hidden state variable.
In tests reported here, the meta-RL agent was trained entirely offline, yet produced excellent results in novel settings.
arXiv Detail & Related papers (2022-03-17T23:58:52Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline meta-RL is emerging as a promising approach to address the challenges of learning policies from limited offline data.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
- Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces [14.029933823101084]
We propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE).
ELUE learns a belief model over the embedding space and a belief-conditional policy and Q-function.
We demonstrate that ELUE outperforms state-of-the-art meta-RL methods through experiments on meta-RL benchmarks.
arXiv Detail & Related papers (2021-01-06T05:51:38Z)
- Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments [12.45281856559346]
We are interested in learning models of non-stationary environments, which can be framed as a multi-task learning problem.
Model-free reinforcement learning algorithms can achieve good performance in multi-task learning at the cost of extensive sampling.
While model-based approaches are among the most data efficient learning algorithms, they still struggle with complex tasks and model uncertainties.
arXiv Detail & Related papers (2020-11-21T03:19:35Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
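For readers unfamiliar with information-theoretic MPC, the connection to entropy-regularized RL mentioned in the last entry is easiest to see in the sampling-based control update itself. The sketch below is a generic MPPI-style update on a toy double integrator, not the paper's algorithm; the dynamics, cost, and parameter values are assumptions, and the temperature `lam` plays the role of the entropy-regularization coefficient.

```python
# Generic MPPI-style (information-theoretic MPC) update on a toy double integrator.
# Illustrative only; dynamics, cost, and parameters are assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

def dynamics(state, action, dt=0.1):
    """Double integrator: state = [position, velocity], action = acceleration."""
    pos, vel = state
    return np.array([pos + vel * dt, vel + action * dt])

def trajectory_cost(state, actions, target=1.0):
    """Quadratic cost on distance to the target plus a small control penalty."""
    cost = 0.0
    for a in actions:
        state = dynamics(state, a)
        cost += (state[0] - target) ** 2 + 0.01 * a ** 2
    return cost

def mppi_update(state, nominal, n_samples=256, sigma=0.5, lam=1.0):
    """One sampling-based update of the nominal action sequence."""
    noise = sigma * rng.standard_normal((n_samples, len(nominal)))
    costs = np.array([trajectory_cost(state, nominal + eps) for eps in noise])
    # Exponentiated-cost weights: a softmin over sampled trajectories, with lam
    # acting like an entropy-regularization temperature.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return nominal + weights @ noise

state, nominal = np.array([0.0, 0.0]), np.zeros(10)   # horizon of 10 steps
for _ in range(30):                                   # receding-horizon loop
    nominal = mppi_update(state, nominal)
    state = dynamics(state, nominal[0])               # apply the first action
    nominal = np.append(nominal[1:], 0.0)             # shift the plan forward
print("final position:", round(float(state[0]), 3))
```

Seen this way, the weighted average over sampled action sequences is a soft (entropy-regularized) minimum over trajectory costs, which is roughly the correspondence the paper develops and then exploits to learn with biased models.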
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.