Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow
- URL: http://arxiv.org/abs/2103.14407v1
- Date: Fri, 26 Mar 2021 11:32:27 GMT
- Title: Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow
- Authors: John McLeod, Hrvoje Stojic, Vincent Adam, Dongho Kim, Jordi Grau-Moya,
Peter Vrancx, Felix Leibfried
- Abstract summary: Bellman aims to fill this gap and introduces the first thoroughly designed and tested model-based RL toolbox.
Our modular approach enables combining a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
- Score: 14.422129911404472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past decade, model-free reinforcement learning (RL) has provided
solutions to challenging domains such as robotics. Model-based RL shows the
prospect of being more sample-efficient than model-free methods in terms of
agent-environment interactions, because the model enables extrapolation to
unseen situations. In the more recent past, model-based methods have shown
superior results compared to model-free methods in some challenging domains
with non-linear state transitions. At the same time, it has become apparent
that RL is not market-ready yet and that many real-world applications are going
to require model-based approaches, because model-free methods are too
sample-inefficient and show poor performance in early stages of training. The
latter is particularly important in industry, e.g. in production systems that
directly impact a company's revenue. This demonstrates the necessity for a
toolbox to push the boundaries of model-based RL. While there is a plethora of
toolboxes for model-free RL, model-based RL has received little attention in
terms of toolbox development. Bellman aims to fill this gap and introduces the
first thoroughly designed and tested model-based RL toolbox using
state-of-the-art software engineering practices. Our modular approach enables
combining a wide range of environment models with generic model-based agent
classes that recover state-of-the-art algorithms. We also provide an experiment
harness to compare both model-free and model-based agents in a systematic
fashion w.r.t. user-defined evaluation metrics (e.g. cumulative reward). This
paves the way for new research directions, e.g. investigating uncertainty-aware
environment models that are not necessarily neural-network-based, or developing
algorithms to solve industrially-motivated benchmarks that share
characteristics with real-world problems.
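As a concrete illustration of the experiment-harness idea described in the abstract, below is a minimal, toolbox-agnostic sketch in plain Python: it evaluates any agent exposing an act() method on any environment with a gym-style reset()/step() interface, using cumulative reward as the user-defined evaluation metric. All names here (ToyEnv, RandomAgent, evaluate) are illustrative placeholders, not part of Bellman's actual API.

```python
# Hedged sketch of a generic experiment harness: compare agents on a
# user-defined metric (here, mean cumulative reward). Names are hypothetical
# and do not reflect the Bellman toolbox's real interfaces.
import random


class ToyEnv:
    """A trivial 1-D chain environment with a gym-like reset()/step() interface."""

    def __init__(self, length=10):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: +1 moves right, -1 moves left; reward 1.0 on reaching the end.
        self.state = max(0, min(self.length, self.state + action))
        done = self.state == self.length
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class RandomAgent:
    """Placeholder agent; a model-based agent would expose the same act() method."""

    def act(self, state):
        return random.choice([-1, 1])


def evaluate(agent, env, episodes=100, max_steps=200):
    """Return the mean cumulative reward over a fixed number of episodes."""
    returns = []
    for _ in range(episodes):
        state, total, done, steps = env.reset(), 0.0, False, 0
        while not done and steps < max_steps:
            state, reward, done = env.step(agent.act(state))
            total += reward
            steps += 1
        returns.append(total)
    return sum(returns) / len(returns)


if __name__ == "__main__":
    print("mean return:", evaluate(RandomAgent(), ToyEnv()))
```

The same evaluate() loop can score a model-free and a model-based agent side by side, which is the kind of systematic comparison the abstract's experiment harness is meant to support.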
Related papers
- Offline Model-Based Reinforcement Learning with Anti-Exploration [0.0]
We present Morse Model-based offline RL (MoMo), which extends the anti-exploration paradigm found in offline model-free RL.
MoMo performs offline reinforcement learning using an anti-exploration bonus to counteract value overestimation.
The latter outperforms prior model-based and model-free baselines on the majority of D4RL datasets tested.
arXiv Detail & Related papers (2024-08-20T10:29:21Z) - Deep autoregressive density nets vs neural ensembles for model-based
offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to exploiting model errors which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z) - Modeling Choice via Self-Attention [8.394221523847325]
We show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit (Halo-MNL) model.
We also establish the first realistic-scale benchmark for choice estimation on real data, conducting an evaluation of existing models.
arXiv Detail & Related papers (2023-11-11T11:13:07Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce the Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [32.752633250862694]
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data.
We introduce a new framework, Reward rAnked FineTuning, designed to align generative models effectively.
arXiv Detail & Related papers (2023-04-13T18:22:40Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z) - Sample Efficient Reinforcement Learning via Model-Ensemble Exploration
and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z) - Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z) - VAE-LIME: Deep Generative Model Based Approach for Local Data-Driven
Model Interpretability Applied to the Ironmaking Industry [70.10343492784465]
It is necessary to expose to the process engineer not only the model predictions, but also their interpretability.
Model-agnostic local interpretability solutions based on LIME have recently emerged to improve the original method.
We present in this paper a novel approach, VAE-LIME, for local interpretability of data-driven models forecasting the temperature of the hot metal produced by a blast furnace.
arXiv Detail & Related papers (2020-07-15T07:07:07Z)