On Rollouts in Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2501.16918v2
- Date: Tue, 08 Apr 2025 08:24:38 GMT
- Title: On Rollouts in Model-Based Reinforcement Learning
- Authors: Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe
- Abstract summary: Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. Accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution.
- Score: 5.004576576202551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality.
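The abstract describes three mechanisms: separating aleatoric from epistemic uncertainty, tracking accumulated model error along a rollout, and terminating rollouts before data corruption dominates. A minimal Python sketch of such a rollout is below, assuming a probabilistic ensemble that returns per-member (mean, variance) predictions; the uncertainty estimators and the budget-based termination rule are plausible stand-ins, not the paper's exact Infoprop propagation.

```python
import numpy as np

def infoprop_style_rollout(ensemble, policy, s0, info_budget=1.0, max_steps=50):
    """Roll out a learned model, injecting only aleatoric noise and stopping
    once accumulated epistemic uncertainty (a proxy for model-error
    corruption) exceeds a budget. `ensemble(s, a)` is assumed to return a
    list of (mean, variance) predictions, one per ensemble member."""
    s, used_budget, rollout = np.asarray(s0, dtype=float), 0.0, []
    for _ in range(max_steps):
        a = policy(s)
        means, variances = zip(*ensemble(s, a))
        mu = np.mean(means, axis=0)                # ensemble-mean next state
        aleatoric = np.mean(variances, axis=0)     # avg within-member variance
        epistemic = np.var(means, axis=0)          # disagreement between member means
        # Propagate aleatoric noise only; epistemic noise stays out of the data.
        s_next = mu + np.sqrt(aleatoric) * np.random.randn(*mu.shape)
        used_budget += float(np.sum(epistemic))    # track accumulated model error
        if used_budget > info_budget:              # termination criterion
            break
        rollout.append((s, a, s_next))
        s = s_next
    return rollout
```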
Related papers
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models.
We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z)
- Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation [36.9134885948595]
We introduce Model-based Offline Reinforcement learning with AdversariaL data augmentation (MORAL).
In MORAL, we replace the fixed-horizon rollout with adversarial data augmentation, executing alternating sampling with ensemble models.
Experiments on the D4RL benchmark demonstrate that MORAL outperforms other model-based offline RL methods in terms of policy learning and sample efficiency.
arXiv Detail & Related papers (2025-03-26T07:24:34Z)
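A rough illustration of alternating sampling with an ensemble, where an adversary alternates which member generates each transition; the pessimistic (lowest-reward) selection below is an assumption for the sketch, not necessarily MORAL's actual rule.

```python
import numpy as np

def adversarial_ensemble_rollout(models, policy, s0, max_steps=10):
    """Roll out while an adversary alternates which ensemble member generates
    each transition. Each model in `models` maps (s, a) -> (next_s, reward);
    the pessimistic selection is illustrative only."""
    s, transitions = s0, []
    for _ in range(max_steps):
        a = policy(s)
        preds = [m(s, a) for m in models]             # one prediction per member
        idx = int(np.argmin([r for _, r in preds]))   # adversary: lowest reward
        s_next, r = preds[idx]
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions
```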
- Generative Modeling and Data Augmentation for Power System Production Simulation [0.0]
This paper proposes a generative model-assisted approach for load forecasting under small-sample scenarios. The expanded dataset significantly reduces forecasting errors compared to the original dataset. The diffusion model outperforms the generative adversarial model, achieving errors roughly 200 times smaller.
arXiv Detail & Related papers (2024-12-10T12:38:47Z)
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling, yet their widespread adoption poses challenges regarding data attribution and interpretability. We develop an influence functions framework to address these challenges.
arXiv Detail & Related papers (2024-10-17T17:59:02Z)
- Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption [4.664767161598515]
Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with predictive transition models through model-based rollouts.
We propose an easy-to-tune rollout mechanism that yields substantial improvements in data efficiency and performance.
arXiv Detail & Related papers (2024-05-29T11:53:07Z)
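One way such an uncertainty-aware mechanism can work (a sketch under assumptions, not the paper's exact rule): cut a rollout short as soon as ensemble disagreement exceeds a quantile of recently observed disagreements, so rollouts stay long only where the model trusts itself.

```python
import numpy as np

def uncertainty_adaptive_rollout(ensemble_means, policy, s0,
                                 recent_disagreements, quantile=0.9,
                                 max_steps=20):
    """Terminate a model rollout adaptively: stop when ensemble disagreement
    exceeds a quantile of recent values. `ensemble_means(s, a)` is assumed
    to return an array of shape (n_members, state_dim)."""
    threshold = np.quantile(recent_disagreements, quantile)
    s, transitions = s0, []
    for _ in range(max_steps):
        a = policy(s)
        member_means = ensemble_means(s, a)
        disagreement = float(np.var(member_means, axis=0).sum())
        if disagreement > threshold:        # model no longer trusts itself here
            break
        s_next = member_means.mean(axis=0)  # use the ensemble-mean prediction
        transitions.append((s, a, s_next))
        s = s_next
    return transitions
```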
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
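A structural sketch of the idea, with illustrative names: the agent and the predictive model operate on a learned latent action space, and a decoder maps latent actions back to raw environment actions.

```python
class PredictableMDPWrapper:
    """Sketch of a PMA-style transformed MDP: the predictive model only ever
    sees (state, latent_action) pairs, while a learned decoder turns latent
    actions into raw environment actions. All names here are illustrative."""
    def __init__(self, env, decoder, model):
        self.env = env          # original MDP
        self.decoder = decoder  # learned map: (state, latent_action) -> raw action
        self.model = model      # predictive model trained on the transformed MDP

    def step(self, state, latent_action):
        raw_action = self.decoder(state, latent_action)
        return self.env.step(raw_action)

    def model_step(self, state, latent_action):
        # The model is trained on the transformed MDP, whose action space is
        # learned to make the dynamics easier to predict.
        return self.model(state, latent_action)
```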
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder (AE) based models.
We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
arXiv Detail & Related papers (2022-06-17T16:18:28Z)
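A compact sketch of that evaluation scheme, assuming a frozen feature extractor and scikit-learn for the linear head; the paper's exact protocol may differ.

```python
from sklearn.linear_model import LogisticRegression

def ood_linear_probe(encoder, X_ood, y_ood, X_test, y_test):
    """Fit only a linear head on out-of-distribution data on top of frozen
    pre-trained features, so the score reflects the representation rather
    than a head biased by in-distribution training."""
    head = LogisticRegression(max_iter=1000)
    head.fit(encoder(X_ood), y_ood)             # encoder is frozen: no fine-tuning
    return head.score(encoder(X_test), y_test)  # probe accuracy on test data
```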
- Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data, caused by inaccurate model estimation, for better policy optimization.
We propose AMPO, a novel model-based reinforcement learning framework that introduces unsupervised model adaptation.
Our approach achieves state-of-the-art sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z)
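The summary does not spell out the adaptation objective; one generic way to implement unsupervised alignment between real and model-generated data is a kernel maximum mean discrepancy (MMD) penalty, sketched below as an assumption rather than AMPO's actual loss.

```python
import numpy as np

def rbf_mmd2(X_real, X_sim, sigma=1.0):
    """Squared MMD with an RBF kernel between feature batches of real and
    model-generated transitions; minimizing it pulls the two distributions
    together (an illustrative stand-in for the adaptation term)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X_real, X_real).mean() + k(X_sim, X_sim).mean() \
        - 2.0 * k(X_real, X_sim).mean()
```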
- Model Embedding Model-Based Reinforcement Learning [4.566180616886624]
Model-based reinforcement learning (MBRL) has shown its advantage in sample efficiency over model-free reinforcement learning (MFRL).
Despite the impressive results it achieves, it still faces a trade-off between the ease of data generation and model bias.
We propose a simple and elegant model-embedding model-based reinforcement learning (MEMB) algorithm in the framework of probabilistic reinforcement learning.
arXiv Detail & Related papers (2020-06-16T15:10:28Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
- Differentially Private ERM Based on Data Perturbation [41.37436071802578]
We measure the contributions of individual training data instances to the final machine learning model.
Since the key of our method is to measure each data instance separately, we propose a new 'data perturbation' based (DB) paradigm for DP-ERM.
arXiv Detail & Related papers (2020-02-20T06:05:34Z)
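A generic illustration of the data-perturbation idea above: noise calibrated to a sensitivity bound is added to the training instances themselves, and ordinary ERM then runs on the noisy copy. The Laplace calibration is a standard choice assumed here, not necessarily the paper's mechanism.

```python
import numpy as np

def perturb_dataset(X, epsilon, sensitivity):
    """Add Laplace noise directly to each training instance before ERM,
    instead of perturbing the objective or the trained model's output."""
    scale = sensitivity / epsilon                      # standard Laplace calibration
    return X + np.random.laplace(0.0, scale, X.shape)  # train on the noisy copy
```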