Simplified Temporal Consistency Reinforcement Learning
- URL: http://arxiv.org/abs/2306.09466v1
- Date: Thu, 15 Jun 2023 19:37:43 GMT
- Title: Simplified Temporal Consistency Reinforcement Learning
- Authors: Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen
- Abstract summary: We show that a simple representation learning approach relying on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL.
Our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
- Score: 19.814047499837084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1 times faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
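The abstract describes the mechanism only at a high level, so the following is a minimal, hedged PyTorch-style sketch of what "a latent dynamics model trained by latent temporal consistency" can look like: an encoder maps observations to latents, a transition model rolls latents forward under actions, and each prediction is pulled toward the output of a slowly updated (EMA) target encoder with a cosine-similarity loss. Module sizes, the loss form, and all hyperparameters below are illustrative assumptions, not the authors' released implementation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ELU(),
                         nn.Linear(hidden, out_dim))

class LatentConsistencyModel(nn.Module):
    """Encoder + latent dynamics trained only with a temporal-consistency loss."""
    def __init__(self, obs_dim, act_dim, latent_dim=50, tau=0.005):
        super().__init__()
        self.encoder = mlp(obs_dim, latent_dim)                # o_t -> z_t
        self.dynamics = mlp(latent_dim + act_dim, latent_dim)  # (z_t, a_t) -> z_{t+1}
        self.target_encoder = copy.deepcopy(self.encoder)      # EMA copy, no gradients
        for p in self.target_encoder.parameters():
            p.requires_grad_(False)
        self.tau = tau

    def consistency_loss(self, obs_seq, act_seq):
        """obs_seq: (H+1, B, obs_dim), act_seq: (H, B, act_dim)."""
        z = self.encoder(obs_seq[0])
        loss = 0.0
        for t in range(act_seq.shape[0]):
            z = self.dynamics(torch.cat([z, act_seq[t]], dim=-1))  # roll forward in latent space
            with torch.no_grad():
                z_target = self.target_encoder(obs_seq[t + 1])     # target latent for step t+1
            # negative cosine similarity between predicted and target latents
            loss = loss - F.cosine_similarity(z, z_target, dim=-1).mean()
        return loss / act_seq.shape[0]

    @torch.no_grad()
    def update_target(self):
        # Exponential-moving-average update of the target encoder.
        for p, tp in zip(self.encoder.parameters(), self.target_encoder.parameters()):
            tp.data.lerp_(p.data, self.tau)

# Minimal usage on random data (shapes only; a real agent would sample replay-buffer sequences).
model = LatentConsistencyModel(obs_dim=24, act_dim=6)
opt = torch.optim.Adam(list(model.encoder.parameters()) + list(model.dynamics.parameters()), lr=1e-3)
obs = torch.randn(4, 32, 24)   # horizon H=3 (+1 observation), batch 32
act = torch.randn(3, 32, 6)
loss = model.consistency_loss(obs, act)
opt.zero_grad()
loss.backward()
opt.step()
model.update_target()
```

Per the abstract, the learned representation would then be consumed either by an online planner or as policy and value function features in model-free RL; that part is omitted from the sketch.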
Related papers
- Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control [1.5361702135159845]
This paper introduces a knowledge-informed model-based residual reinforcement learning framework.
It integrates traffic expert knowledge into a virtual environment model, employing the Intelligent Driver Model (IDM) for basic dynamics and neural networks for residual dynamics.
We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without learning from scratch (a hedged sketch of this residual-dynamics pattern appears after the list).
arXiv Detail & Related papers (2024-08-30T16:16:57Z)
- Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning [20.938465516348177]
Applying reinforcement learning to real-world applications requires addressing a trade-off between performance, sample efficiency, and inference time.
In this work, we demonstrate how to address this triple challenge by leveraging partial physical knowledge about the system dynamics.
arXiv Detail & Related papers (2024-07-02T12:32:57Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Temporal Difference Learning for Model Predictive Control [29.217382374051347]
Data-driven model predictive control has two key advantages over model-free methods.
TD-MPC achieves superior sample efficiency and performance over prior work on both state and image-based continuous control tasks.
arXiv Detail & Related papers (2022-03-09T18:58:28Z)
- On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
We propose a framework named AutoMBPO to automatically schedule the real data ratio (a generic sketch of such a schedule appears after the list).
arXiv Detail & Related papers (2021-11-16T15:24:59Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
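For the "Traffic expertise meets residual RL" entry above, the summary only states that the Intelligent Driver Model (IDM) supplies the basic dynamics and a neural network learns the residual dynamics; the paper's actual architecture is not reproduced here. A generic sketch of that residual pattern, using the standard IDM acceleration formula with illustrative parameter values and state layout, might look as follows.

```python
import torch
import torch.nn as nn

def idm_acceleration(v, gap, dv, v0=33.3, T=1.6, a_max=1.5, b=2.0, s0=2.0, delta=4.0):
    """Intelligent Driver Model: acceleration from ego speed v, gap, and closing speed dv."""
    s_star = s0 + v * T + v * dv / (2.0 * (a_max * b) ** 0.5)
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap.clamp(min=1e-3)) ** 2)

class ResidualDynamics(nn.Module):
    """IDM supplies the base next-state prediction; a small NN corrects what IDM misses."""
    def __init__(self, state_dim=3, action_dim=1, hidden=64, dt=0.1):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))
        self.dt = dt

    def forward(self, state, action):
        # state = (v, gap, dv); the action could be, e.g., a commanded acceleration offset.
        v, gap, dv = state.unbind(dim=-1)
        a = idm_acceleration(v, gap, dv)
        base_next = torch.stack([v + a * self.dt,          # ego speed under IDM acceleration
                                 gap - dv * self.dt,       # gap shrinks with closing speed
                                 dv], dim=-1)              # leader behavior left to the residual
        return base_next + self.residual(torch.cat([state, action], dim=-1))

# Usage: the residual term is all the learner has to fit, which is the point of the pattern.
model = ResidualDynamics()
s = torch.tensor([[15.0, 30.0, 1.0]])   # speed, gap, closing speed (illustrative units: m/s, m, m/s)
a = torch.tensor([[0.0]])
print(model(s, a).shape)                 # torch.Size([1, 3])
```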
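Similarly, for the AutoMBPO entry, the summary only says that the real-data ratio should grow during training; the paper's actual scheduling algorithm is not described here. A minimal sketch of the generic idea, mixing real and model-generated transitions in each policy-update batch with a linearly increasing real fraction (all numbers illustrative), could look like this.

```python
import random

def real_data_ratio(step, total_steps, start=0.05, end=0.5):
    """Linearly anneal the fraction of real transitions in each policy-update batch."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac

def sample_mixed_batch(real_buffer, model_buffer, batch_size, step, total_steps):
    """Draw a batch whose real/synthetic mix follows the schedule above."""
    ratio = real_data_ratio(step, total_steps)
    n_real = int(round(batch_size * ratio))
    batch = random.sample(real_buffer, min(n_real, len(real_buffer)))
    batch += random.sample(model_buffer, batch_size - len(batch))
    return batch

# Illustrative usage with dummy transitions.
real_buffer = [("real", i) for i in range(1000)]
model_buffer = [("model", i) for i in range(10000)]
batch = sample_mixed_batch(real_buffer, model_buffer, batch_size=256, step=30_000, total_steps=100_000)
print(sum(1 for src, _ in batch if src == "real"))  # roughly 0.185 * 256 ≈ 47 real samples
```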
This list is automatically generated from the titles and abstracts of the papers in this site.