Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow
- URL: http://arxiv.org/abs/2103.14407v1
- Date: Fri, 26 Mar 2021 11:32:27 GMT
- Title: Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow
- Authors: John McLeod, Hrvoje Stojic, Vincent Adam, Dongho Kim, Jordi Grau-Moya,
Peter Vrancx, Felix Leibfried
- Abstract summary: Bellman aims to fill this gap and introduces the first thoroughly designed and tested model-based RL toolbox.
Our modular approach enables combining a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
- Score: 14.422129911404472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past decade, model-free reinforcement learning (RL) has provided
solutions to challenging domains such as robotics. Model-based RL shows the
prospect of being more sample-efficient than model-free methods in terms of
agent-environment interactions, because the model enables extrapolation to
unseen situations. In the more recent past, model-based methods have shown
superior results compared to model-free methods in some challenging domains
with non-linear state transitions. At the same time, it has become apparent
that RL is not market-ready yet and that many real-world applications are going
to require model-based approaches, because model-free methods are too
sample-inefficient and show poor performance in early stages of training. The
latter is particularly important in industry, e.g. in production systems that
directly impact a company's revenue. This demonstrates the necessity for a
toolbox to push the boundaries of model-based RL. While there is a plethora of
toolboxes for model-free RL, model-based RL has received little attention in
terms of toolbox development. Bellman aims to fill this gap and introduces the
first thoroughly designed and tested model-based RL toolbox using
state-of-the-art software engineering practices. Our modular approach enables
combining a wide range of environment models with generic model-based agent
classes that recover state-of-the-art algorithms. We also provide an experiment
harness to compare both model-free and model-based agents in a systematic
fashion w.r.t. user-defined evaluation metrics (e.g. cumulative reward). This
paves the way for new research directions, e.g. investigating uncertainty-aware
environment models that are not necessarily neural-network-based, or developing
algorithms to solve industrially-motivated benchmarks that share
characteristics with real-world problems.
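As a concrete illustration of the experiment-harness idea described in the abstract, below is a minimal, toolbox-agnostic sketch in plain Python: it evaluates any agent exposing an act() method on any environment with a gym-style reset()/step() interface, using cumulative reward as the user-defined evaluation metric. All names here (ToyEnv, RandomAgent, evaluate) are illustrative placeholders, not part of Bellman's actual API.

```python
# Hedged sketch of a generic experiment harness: compare agents on a
# user-defined metric (here, mean cumulative reward). Names are hypothetical
# and do not reflect the Bellman toolbox's real interfaces.
import random


class ToyEnv:
    """A trivial 1-D chain environment with a gym-like reset()/step() interface."""

    def __init__(self, length=10):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: +1 moves right, -1 moves left; reward 1.0 on reaching the end.
        self.state = max(0, min(self.length, self.state + action))
        done = self.state == self.length
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class RandomAgent:
    """Placeholder agent; a model-based agent would expose the same act() method."""

    def act(self, state):
        return random.choice([-1, 1])


def evaluate(agent, env, episodes=100, max_steps=200):
    """Return the mean cumulative reward over a fixed number of episodes."""
    returns = []
    for _ in range(episodes):
        state, total, done, steps = env.reset(), 0.0, False, 0
        while not done and steps < max_steps:
            state, reward, done = env.step(agent.act(state))
            total += reward
            steps += 1
        returns.append(total)
    return sum(returns) / len(returns)


if __name__ == "__main__":
    print("mean return:", evaluate(RandomAgent(), ToyEnv()))
```

The same evaluate() loop can score a model-free and a model-based agent side by side, which is the kind of systematic comparison the abstract's experiment harness is meant to support.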
Related papers
- Offline Model-Based Reinforcement Learning with Anti-Exploration [0.0]
We present Morse Model-based offline RL (MoMo), which extends the anti-exploration paradigm found in offline model-free RL.
MoMo performs offline reinforcement learning using an anti-exploration bonus to counteract value overestimation.
The latter outperforms prior model-based and model-free baselines on the majority of D4RL datasets tested.
arXiv Detail & Related papers (2024-08-20T10:29:21Z) - Deep autoregressive density nets vs neural ensembles for model-based
offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to exploiting model errors which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z) - Modeling Choice via Self-Attention [8.394221523847325]
We show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit (Halo-MNL) model.
We also establish the first realistic-scale benchmark for choice estimation on real data, conducting an evaluation of existing models.
arXiv Detail & Related papers (2023-11-11T11:13:07Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce the Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [32.752633250862694]
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data.
We introduce a new framework, Reward rAnked FineTuning, designed to align generative models effectively.
arXiv Detail & Related papers (2023-04-13T18:22:40Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z) - Sample Efficient Reinforcement Learning via Model-Ensemble Exploration
and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z) - Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z) - VAE-LIME: Deep Generative Model Based Approach for Local Data-Driven
Model Interpretability Applied to the Ironmaking Industry [70.10343492784465]
It is necessary to expose to the process engineer not only the model predictions, but also their interpretability.
Model-agnostic local interpretability solutions based on LIME have recently emerged to improve the original method.
We present in this paper a novel approach, VAE-LIME, for local interpretability of data-driven models forecasting the temperature of the hot metal produced by a blast furnace.
arXiv Detail & Related papers (2020-07-15T07:07:07Z)