A Survey on Model-based Reinforcement Learning
- URL: http://arxiv.org/abs/2206.09328v1
- Date: Sun, 19 Jun 2022 05:28:03 GMT
- Title: A Survey on Model-based Reinforcement Learning
- Authors: Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu
- Abstract summary: Reinforcement learning (RL) solves sequential decision-making problems via a trial-and-error process of interacting with the environment.
Model-based reinforcement learning (MBRL) is believed to be a promising direction; it builds environment models in which trial and error can take place without real cost.
- Score: 21.85904195671014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) solves sequential decision-making problems through a trial-and-error process of interacting with the environment. While RL achieves outstanding success in complex video games that allow huge amounts of trial and error, making errors is always undesirable in the real world. To improve sample efficiency and thus reduce errors, model-based reinforcement learning (MBRL) is considered a promising direction: it builds environment models in which trial and error can take place without real cost. In this survey, we review MBRL with a focus on recent progress in deep RL. For non-tabular environments, there is always a generalization error between the learned environment model and the real environment. It is therefore important to analyze the discrepancy between policy training in the environment model and in the real environment, which in turn guides algorithm design for better model learning, model usage, and policy training. We also discuss recent advances in model-based techniques for other forms of RL, including offline RL, goal-conditioned RL, multi-agent RL, and meta-RL, as well as the applicability and advantages of MBRL in real-world tasks. Finally, we close the survey by discussing promising prospects for the future development of MBRL. We believe that MBRL has great, and so far overlooked, potential and advantages in real-world applications, and we hope this survey attracts more research on MBRL.
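To make the core MBRL idea concrete, here is a minimal Dyna-style sketch (illustrative only, not an algorithm from the survey): each real transition updates a tabular value function and fits a simple environment model, and additional trial-and-error updates are then performed on transitions replayed from that model at no real-world cost. The Gymnasium-style `env` interface, the `TabularModel` class, and all hyperparameters are assumptions made for this sketch.

```python
import numpy as np


class TabularModel:
    """Deterministic one-step model of a small discrete MDP, fit from observed transitions."""

    def __init__(self, n_states, n_actions):
        self.next_state = np.zeros((n_states, n_actions), dtype=int)
        self.reward = np.zeros((n_states, n_actions))
        self.seen = []  # (state, action) pairs observed in the real environment

    def update(self, s, a, r, s_next):
        self.next_state[s, a] = s_next
        self.reward[s, a] = r
        if (s, a) not in self.seen:
            self.seen.append((s, a))

    def sample(self, rng):
        # Replay a previously observed (state, action) pair from the learned model.
        s, a = self.seen[rng.integers(len(self.seen))]
        return s, a, self.reward[s, a], self.next_state[s, a]


def dyna_q(env, n_states, n_actions, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Q-learning on real transitions plus extra updates on model-generated transitions."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, n_actions))
    model = TabularModel(n_states, n_actions)
    for _ in range(episodes):
        s, _ = env.reset()  # Gymnasium-style API assumed
        done = False
        while not done:
            # Epsilon-greedy action selection in the real environment.
            a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(q[s].argmax())
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Learn from the real transition.
            q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
            model.update(s, a, r, s_next)
            # "Trial and error" inside the learned model, at no real-world cost.
            for _ in range(planning_steps):
                ms, ma, mr, ms_next = model.sample(rng)
                q[ms, ma] += alpha * (mr + gamma * q[ms_next].max() - q[ms, ma])
            s = s_next
    return q
```

In deep MBRL the tabular model is replaced by a learned neural dynamics model, which is exactly where the generalization error discussed in the abstract enters: imagined transitions are only as reliable as the model that produced them.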
Related papers
- A Benchmark Environment for Offline Reinforcement Learning in Racing Games [54.83171948184851]
Offline Reinforcement Learning (ORL) is a promising approach for reducing the high sample complexity of traditional Reinforcement Learning (RL).
This paper introduces OfflineMania, a novel environment for ORL research.
It is inspired by the iconic TrackMania series and developed using the Unity 3D game engine.
arXiv Detail & Related papers (2024-07-12T16:44:03Z)
- A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning [10.154341066746975]
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient, adaptive, and explainable.
How to best learn the model is still an unresolved question.
arXiv Detail & Related papers (2023-10-10T01:58:38Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- A Validation Tool for Designing Reinforcement Learning Environments [0.0]
This study proposes a Markov-based feature analysis method to validate whether an MDP is well formulated.
We believe an MDP suitable for applying RL should contain a set of state features that are both sensitive to actions and predictive of rewards (a minimal illustrative check is sketched after this entry).
arXiv Detail & Related papers (2021-12-10T13:28:08Z)
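The criterion stated above can be illustrated with a simple, hypothetical diagnostic; this is not the paper's method, just one way to operationalize "sensitive to actions and predictive of rewards" on a logged batch of transitions. The `feature_diagnostics` helper, the scikit-learn models, and the scoring choices are assumptions made for the sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score


def feature_diagnostics(states, actions, rewards, next_states, cv=5):
    """Score each state feature on action sensitivity and reward predictiveness.

    states, next_states: float arrays of shape (N, d); actions: int array (N,);
    rewards: float array (N,). Returns one dict of scores per feature.
    """
    report = []
    for j in range(states.shape[1]):
        delta = (next_states[:, j] - states[:, j]).reshape(-1, 1)
        # Action sensitivity: can the feature's one-step change distinguish the action taken?
        action_acc = cross_val_score(
            LogisticRegression(max_iter=1000), delta, actions, cv=cv).mean()
        # Reward predictiveness: does the feature value explain the observed reward?
        reward_r2 = cross_val_score(
            LinearRegression(), states[:, [j]], rewards, cv=cv, scoring="r2").mean()
        report.append({"feature": j,
                       "action_sensitivity": float(action_acc),
                       "reward_r2": float(reward_r2)})
    return report
```

Features scoring near chance on action sensitivity and near zero on reward prediction would, under this reading of the criterion, be candidates for removal from the state design.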
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model, called RAMa, that provides the MBRL agent with training samples taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solve prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training.
To ensure reliable performance, RL agents need to be robust to worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- MOReL : Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results on widely studied offline RL benchmarks; an illustrative sketch of the pessimistic-MDP idea it builds on follows this entry.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)
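For context, the entry above is built around a pessimistic MDP: an ensemble of dynamics models is learned from the logged dataset, and transitions on which the ensemble members disagree strongly are treated as "unknown" and redirected to a low-reward absorbing state, so the policy optimized inside the model is discouraged from leaving the data-covered region. The sketch below illustrates that construction only; the ensemble interface, the disagreement measure, the threshold, and the penalty value are assumptions, not MOReL's actual implementation.

```python
import numpy as np


class PessimisticModel:
    """Wrap an ensemble of learned dynamics models into a pessimistic MDP.

    `ensemble` is any list of models exposing predict(state, action) -> (next_state, reward),
    an interface assumed for this sketch. Transitions on which ensemble members disagree by
    more than `threshold` are redirected to an absorbing halt with a large negative reward,
    so policies trained inside the model stay near the offline data.
    """

    def __init__(self, ensemble, threshold=0.1, halt_penalty=-100.0):
        self.ensemble = ensemble
        self.threshold = threshold
        self.halt_penalty = halt_penalty

    def step(self, state, action):
        predictions = [m.predict(state, action) for m in self.ensemble]
        next_states = np.stack([p[0] for p in predictions])
        rewards = np.array([p[1] for p in predictions])
        # Disagreement: maximum pairwise distance between predicted next states.
        disagreement = max(
            np.linalg.norm(a - b) for a in next_states for b in next_states)
        if disagreement > self.threshold:
            # "Unknown" region: terminate with a pessimistic penalty.
            return state, self.halt_penalty, True
        # "Known" region: use the ensemble mean prediction.
        return next_states.mean(axis=0), float(rewards.mean()), False
```

A planner or policy-optimization routine can then interact with `PessimisticModel.step` exactly as it would with a simulator; the halting penalty makes it unprofitable to exploit regions where the learned model is likely wrong.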
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.