Related papers: Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning

Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning

URL: http://arxiv.org/abs/2410.11234v1
Date: Tue, 15 Oct 2024 03:36:43 GMT
Title: Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Authors: Jiayu Chen, Wentse Chen, Jeff Schneider,
Abstract summary: offline model-based reinforcement learning (MBRL) is a powerful approach for data-driven decision-making and control. There could be various MDPs that behave identically on the offline dataset and so dealing with the uncertainty about the true MDP can be challenging. We introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces.
Score: 5.663006149337036
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Offline reinforcement learning (RL) is a powerful approach for data-driven decision-making and control. Compared to model-free methods, offline model-based reinforcement learning (MBRL) explicitly learns world models from a static dataset and uses them as surrogate simulators, improving the data efficiency and enabling the learned policy to potentially generalize beyond the dataset support. However, there could be various MDPs that behave identically on the offline dataset and so dealing with the uncertainty about the true MDP can be challenging. In this paper, we propose modeling offline MBRL as a Bayes Adaptive Markov Decision Process (BAMDP), which is a principled framework for addressing model uncertainty. We further introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces with stochastic transitions. This planning process is based on Monte Carlo Tree Search and can be integrated into offline MBRL as a policy improvement operator in policy iteration. Our ``RL + Search" framework follows in the footsteps of superhuman AIs like AlphaZero, improving on current offline MBRL methods by incorporating more computation input. The proposed algorithm significantly outperforms state-of-the-art model-based and model-free offline RL methods on twelve D4RL MuJoCo benchmark tasks and three target tracking tasks in a challenging, stochastic tokamak control simulator.

Related papers

Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning [6.189693079685375]
offline model-based RL (MBRL) explicitly learns a world model from a static dataset.<n>We propose a framework that dynamically adapts the world model alongside the policy.<n>We benchmark our algorithm on twelve noisy D4RL MuJoCo tasks and three Tokamak Control tasks, demonstrating its state-of-the-art performance.
arXiv Detail & Related papers (2025-05-19T20:14:33Z)
Enhancing Offline Model-Based RL via Active Model Selection: A Bayesian Optimization Perspective [11.20804263996665]
offline model-based reinforcement learning (MBRL) serves as a competitive framework that can learn well-performing policies solely from pre-collected data. We propose BOMS, an active model selection framework that enhances model selection in offline MBRL with only a small online interaction budget. We show that BOMS improves over the baseline methods with a small amount of online interaction comparable to only $1%$-$2.5%$ of offline training data.
arXiv Detail & Related papers (2025-02-17T06:34:58Z)
Offline Trajectory Optimization for Offline Reinforcement Learning [42.306438854850434]
offline reinforcement learning aims to learn policies without online explorations.<n>Existing data augmentation methods for offline RL suffer from (i) trivial improvement from short-horizon simulation.<n>We propose offline trajectory optimization for offline reinforcement learning (OTTO)
arXiv Detail & Related papers (2024-04-16T08:48:46Z)
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations. Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamic model from historically collected data, and utilize the learned model and fixed datasets for policy learning. We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return. With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
Online Policy Optimization for Robust MDP [17.995448897675068]
Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. In this work, we consider online robust Markov decision process (MDP) by interacting with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient.
arXiv Detail & Related papers (2022-09-28T05:18:20Z)
Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE) MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary. In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
Variational Model-based Policy Optimization [34.80171122943031]
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL. We propose an objective function as a variational lower-bound of a log-likelihood of a log-likelihood to jointly learn and improve model and policy. Our experiments on a number of continuous control tasks show that despite being more complex, our model-based (E-step) algorithm, called emactoral model-based policy optimization (VMBPO), is more sample-efficient and
arXiv Detail & Related papers (2020-06-09T18:30:15Z)
MOPO: Model-based Offline Policy Optimization [183.6449600580806]
offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. We show that an existing model-based RL algorithm already produces significant gains in the offline setting. We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
MOReL : Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. We present MOReL, an algorithmic framework for model-based offline RL. We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)
Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL. We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.