Model-based Reinforcement Learning: A Survey
- URL: http://arxiv.org/abs/2006.16712v4
- Date: Thu, 31 Mar 2022 07:59:04 GMT
- Title: Model-based Reinforcement Learning: A Survey
- Authors: Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker
- Abstract summary: Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence.
Two key approaches to this problem are reinforcement learning (RL) and planning.
This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning.
- Score: 2.564530030795554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential decision making, commonly formalized as Markov Decision Process
(MDP) optimization, is an important challenge in artificial intelligence. Two
key approaches to this problem are reinforcement learning (RL) and planning.
This paper presents a survey of the integration of both fields, better known as
model-based reinforcement learning. Model-based RL has two main steps. First,
we systematically cover approaches to dynamics model learning, including
challenges like dealing with stochasticity, uncertainty, partial observability,
and temporal abstraction. Second, we present a systematic categorization of
planning-learning integration, including aspects like: where to start planning,
what budgets to allocate to planning and real data collection, how to plan, and
how to integrate planning in the learning and acting loop. After these two
sections, we also discuss implicit model-based RL as an end-to-end alternative
for model learning and planning, and we cover the potential benefits of
model-based RL. Along the way, the survey also draws connections to several
related RL fields, like hierarchical RL and transfer learning. Altogether, the
survey presents a broad conceptual overview of the combination of planning and
learning for MDP optimization.
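To make the survey's two steps concrete, the following is a minimal illustrative sketch (not taken from the survey; every interface and name here is an assumption): a dynamics model is fit to observed transitions by least-squares regression, and actions are then chosen by random-shooting planning over rollouts of that learned model.

```python
import numpy as np

class LinearDynamicsModel:
    """Toy dynamics model: predicts s' from [s, a] by least squares."""

    def fit(self, states, actions, next_states):
        # Stack features [s, a, 1] and solve X @ W ~= next_states.
        X = np.hstack([states, actions, np.ones((len(states), 1))])
        self.W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

    def predict(self, state, action):
        x = np.concatenate([state, action, [1.0]])
        return x @ self.W


def random_shooting_plan(model, state, reward_fn, action_dim,
                         horizon=10, n_candidates=256):
    """Sample random action sequences, roll out the learned model,
    and return the first action of the highest-return sequence."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:
            s = model.predict(s, a)   # imagined next state
            total += reward_fn(s, a)  # reward_fn is an assumed task-specific function
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action
```

A full model-based agent would alternate between acting with the planner, appending the observed transitions to the dataset, and refitting the model; the second part of the survey is precisely about how to schedule and integrate those steps.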
Related papers
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient technique for empowering models in the machine learning community.
There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z) - Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms [34.593772931446125]
This monograph focuses on exploring various model-based and model-free approaches for constrained RL within the context of average-reward Markov Decision Processes (MDPs).
The primal-dual policy gradient-based algorithm is explored as a solution for constrained MDPs.
arXiv Detail & Related papers (2024-06-17T12:46:02Z) - Guiding Language Model Reasoning with Planning Tokens [122.43639723387516]
Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks.
We propose a hierarchical generation scheme to encourage a more structured generation of chain-of-thought steps.
Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme.
arXiv Detail & Related papers (2023-10-09T13:29:37Z) - Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
arXiv Detail & Related papers (2022-05-20T07:02:03Z) - Learning to Execute: Efficient Learning of Universal Plan-Conditioned
Policies in Robotics [20.148408520475655]
We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans.
In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
arXiv Detail & Related papers (2021-11-15T16:58:50Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance (a minimal distillation sketch appears after this list).
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Model Complexity of Deep Learning: A Survey [79.20117679251766]
We conduct a systematic overview of the latest studies on model complexity in deep learning.
We review the existing studies along four important factors, including model framework, model size, optimization process, and data complexity.
arXiv Detail & Related papers (2021-03-08T22:39:32Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z) - A Unifying Framework for Reinforcement Learning and Planning [2.564530030795554]
This paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP).
At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions.
arXiv Detail & Related papers (2020-06-26T14:30:41Z) - PAC Bounds for Imitation and Model-based Batch Learning of Contextual
Markov Decision Processes [31.83144400718369]
We consider the problem of batch multi-task reinforcement learning with observed context descriptors, motivated by its application to personalized medical treatment.
We study two general classes of learning algorithms: direct policy learning (DPL), an imitation-learning based approach which learns from expert trajectories, and model-based learning.
arXiv Detail & Related papers (2020-06-11T11:57:08Z) - Policy-Aware Model Learning for Policy Gradient Methods [29.129883702165774]
This paper considers the problem of learning a model in model-based reinforcement learning (MBRL).
We propose that the model learning module should incorporate the way the planner is going to use the model.
We call this approach Policy-Aware Model Learning (PAML).
arXiv Detail & Related papers (2020-02-28T19:18:18Z)
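The planner-amortization result from "Evaluating model-based planning and planner amortization for continuous control" above can be illustrated with a small, hedged sketch that is not the paper's implementation: states are sampled, an assumed MPC planner labels them with actions, and a simple policy is fit by ridge regression to imitate the planner, so that acting no longer requires online planning.

```python
import numpy as np

def distill_planner_into_policy(planner_act, sample_state,
                                n_samples=5000, ridge=1e-3):
    """Amortize a planner into a policy by regression (illustrative sketch).

    planner_act(state) -> action and sample_state() -> state are assumed
    callables, e.g. an MPC planner and a state-visitation sampler.
    """
    states = np.stack([sample_state() for _ in range(n_samples)])
    actions = np.stack([planner_act(s) for s in states])
    X = np.hstack([states, np.ones((n_samples, 1))])  # add a bias feature
    # Ridge-regularized least squares: W = (X^T X + ridge*I)^{-1} X^T A
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ actions)

    def policy(state):
        # Cheap feed-forward evaluation replaces the online planning loop.
        return np.concatenate([state, [1.0]]) @ W

    return policy
```

A real implementation would typically use a neural network policy and iterative data collection; the regression above only conveys the core idea that the cost of planning is shifted to training time rather than paid at every control step.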