SOMBRL: Scalable and Optimistic Model-Based RL
- URL: http://arxiv.org/abs/2511.20066v1
- Date: Tue, 25 Nov 2025 08:39:21 GMT
- Title: SOMBRL: Scalable and Optimistic Model-Based RL
- Authors: Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Florian Dörfler, Pieter Abbeel, Andreas Krause
- Abstract summary: We propose an approach based on the principle of optimism in the face of uncertainty. We show that SOMBRL offers a flexible and scalable solution for principled exploration. We also evaluate SOMBRL on dynamic RC car hardware and show that SOMBRL outperforms the state of the art.
- Score: 78.3360288726531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose Scalable and Optimistic MBRL (SOMBRL), an approach based on the principle of optimism in the face of uncertainty. SOMBRL learns an uncertainty-aware dynamics model and greedily maximizes a weighted sum of the extrinsic reward and the agent's epistemic uncertainty. SOMBRL is compatible with any policy optimizer or planner, and under common regularity assumptions on the system, we show that SOMBRL has sublinear regret for nonlinear dynamics in the (i) finite-horizon, (ii) discounted infinite-horizon, and (iii) non-episodic settings. Additionally, SOMBRL offers a flexible and scalable solution for principled exploration. We evaluate SOMBRL on state-based and visual-control environments, where it displays strong performance across all tasks and baselines. We also evaluate SOMBRL on dynamic RC car hardware and show that SOMBRL outperforms the state of the art, illustrating the benefits of principled exploration for MBRL.
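The recipe in the abstract (learn an uncertainty-aware dynamics model, then greedily maximize a weighted sum of extrinsic reward and epistemic uncertainty) can be sketched with a probabilistic ensemble, where disagreement among members serves as the epistemic-uncertainty proxy. This is a minimal illustration under assumptions, not the paper's implementation: the ensemble form, the disagreement estimator, and the weight `beta` are all placeholders.

```python
import numpy as np

class EnsembleDynamics:
    """Toy ensemble of linear dynamics models; disagreement across
    members is a common proxy for epistemic uncertainty."""

    def __init__(self, state_dim, action_dim, n_members=5, seed=0):
        rng = np.random.default_rng(seed)
        # Each member holds its own parameters (here: random linear maps).
        self.weights = rng.normal(
            scale=0.1, size=(n_members, state_dim, state_dim + action_dim)
        )

    def predict(self, s, a):
        """Each member's next-state prediction, shape (n_members, state_dim)."""
        x = np.concatenate([s, a])
        return np.einsum("mij,j->mi", self.weights, x)

    def epistemic_uncertainty(self, s, a):
        """Std of member predictions, summed over state dimensions."""
        return self.predict(s, a).std(axis=0).sum()

def optimistic_reward(model, s, a, extrinsic_reward, beta=1.0):
    """Weighted sum of extrinsic reward and epistemic uncertainty:
    the quantity a SOMBRL-style agent greedily maximizes."""
    return extrinsic_reward + beta * model.epistemic_uncertainty(s, a)

# Usage: score candidate actions under the optimistic objective.
model = EnsembleDynamics(state_dim=3, action_dim=2)
s = np.zeros(3)
for a in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    r_ext = -np.linalg.norm(a)  # stand-in task reward
    print(a, optimistic_reward(model, s, a, r_ext, beta=0.5))
```

The appeal of this objective is that it reduces principled exploration to ordinary greedy maximization, which is why it composes with any policy optimizer or planner.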
Related papers
- Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning [12.864604506942294]
We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration. OWMs incorporate optimism directly into model learning by augmenting it with an optimistic dynamics loss. We instantiate OWMs within two state-of-the-art world model architectures, leading to Optimistic DreamerV3 and Optimistic STORM.
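The summary does not spell out what the optimistic dynamics loss looks like, so the sketch below is purely illustrative guesswork rather than the OWM objective: a standard prediction loss minus a value bonus on the model's own predictions, so that fitting the data and predicting high-value outcomes are traded off by a hypothetical weight `lam`. The `value_fn` and all sizes are assumptions.

```python
import torch

def optimistic_dynamics_loss(model, value_fn, s, a, s_next, lam=0.1):
    """Hypothetical sketch (NOT the OWM loss): MSE dynamics loss minus a
    value bonus on the model's predictions, biasing the learned model
    toward value-optimistic outcomes within its uncertainty."""
    pred = model(torch.cat([s, a], dim=-1))
    fit = ((pred - s_next) ** 2).mean()   # fit observed transitions
    optimism = value_fn(pred).mean()      # reward high-value predictions
    return fit - lam * optimism

# Usage with stand-in networks: (state=3) + (action=2) -> next state.
model = torch.nn.Linear(5, 3)
value_fn = torch.nn.Linear(3, 1)
s, a, s_next = torch.randn(8, 3), torch.randn(8, 2), torch.randn(8, 3)
optimistic_dynamics_loss(model, value_fn, s, a, s_next).backward()
```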
arXiv Detail & Related papers (2026-02-10T18:11:00Z)
- Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions [4.605026772972944]
We introduce a novel deep BRL method, Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL). On the challenging MetaWorld ML10/45 benchmarks, GLiBRL improves the success rate of one of the state-of-the-art deep BRL methods, VariBAD, by up to 2.7x.
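A generalised linear model on learnable basis functions suggests the familiar Bayesian-last-layer construction: a network supplies features, and exact conjugate Bayesian linear regression runs on top of them. The sketch below shows only that generic construction, with an assumed noise scale `sigma` and prior scale `tau` and a random projection standing in for the learned basis; the paper's actual model may differ.

```python
import numpy as np

def bayesian_linear_posterior(Phi, y, sigma=0.1, tau=1.0):
    """Conjugate Bayesian linear regression on fixed features Phi:
    y ~ N(Phi @ w, sigma^2 I) with prior w ~ N(0, tau^2 I)."""
    d = Phi.shape[1]
    precision = Phi.T @ Phi / sigma**2 + np.eye(d) / tau**2
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / sigma**2
    return mean, cov

def predictive(mean, cov, phi_new, sigma=0.1):
    """Predictive mean and variance at a new feature vector."""
    return phi_new @ mean, phi_new @ cov @ phi_new + sigma**2

# Usage: `features` stands in for a learned basis (e.g., a network's
# penultimate layer); here it is a fixed random projection + tanh.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))  # hypothetical basis parameters
features = lambda x: np.tanh(x @ W)
X = rng.normal(size=(64, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=64)
mean, cov = bayesian_linear_posterior(features(X), y)
mu, var = predictive(mean, cov, features(X[:1])[0])
```

The attraction of this construction is that the posterior and predictive variance are available in closed form, so uncertainty comes essentially for free once the basis is trained.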
arXiv Detail & Related papers (2025-12-24T06:00:51Z)
- Enhancing Offline Model-Based RL via Active Model Selection: A Bayesian Optimization Perspective [11.20804263996665]
Offline model-based reinforcement learning (MBRL) serves as a competitive framework that can learn well-performing policies solely from pre-collected data. We propose BOMS, an active model selection framework that enhances model selection in offline MBRL with only a small online interaction budget. We show that BOMS improves over the baseline methods with an amount of online interaction comparable to only 1%-2.5% of the offline training data.
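A rough sketch of active model selection under a small online budget, assuming noisy returns from short rollouts are the objective: maintain a Gaussian posterior over each candidate's true return and pick the next candidate to evaluate by an upper confidence bound. The acquisition rule, the independent-arm noise model, and the budget are all assumptions; BOMS's actual Bayesian-optimization machinery is richer than this.

```python
import numpy as np

def select_model(candidates, evaluate, budget=10, noise=1.0,
                 prior_var=4.0, kappa=2.0):
    """UCB-style active model selection: each candidate's unknown return
    gets an independent Gaussian posterior, refined by noisy evaluations."""
    n = len(candidates)
    mean = np.zeros(n)           # posterior means of rollout return
    var = np.full(n, prior_var)  # posterior variances
    for _ in range(budget):
        i = int(np.argmax(mean + kappa * np.sqrt(var)))  # optimistic pick
        r = evaluate(candidates[i])                      # one short online rollout
        # Conjugate Gaussian update with known observation noise.
        precision = 1.0 / var[i] + 1.0 / noise**2
        mean[i] = (mean[i] / var[i] + r / noise**2) / precision
        var[i] = 1.0 / precision
    return candidates[int(np.argmax(mean))]

# Usage with a toy evaluator: each "model" has a latent quality score.
rng = np.random.default_rng(1)
true_quality = rng.normal(size=5)
best = select_model(list(range(5)),
                    lambda m: true_quality[m] + 0.5 * rng.normal())
```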
arXiv Detail & Related papers (2025-02-17T06:34:58Z)
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning [5.663006149337036]
Offline model-based RL (MBRL) explicitly learns world models from a static dataset and uses them as surrogate simulators. Many different MDPs can behave identically on the offline dataset, and dealing with the uncertainty about the true MDP can be challenging. We propose modeling offline MBRL as a Bayes Adaptive Markov Decision Process (BAMDP), which is a principled framework for addressing model uncertainty.
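The BAMDP view augments the state with a posterior belief over which candidate model is the true MDP; acting well then means planning in belief space. A minimal sketch of the belief update under a discrete model set follows; the Gaussian transition likelihood and the candidate models are assumptions for illustration.

```python
import numpy as np

def belief_update(belief, models, s, a, s_next, noise=0.1):
    """Bayes rule over a discrete set of candidate dynamics models:
    b'(m) is proportional to b(m) * p(s_next | s, a; m),
    here with Gaussian transition likelihoods."""
    liks = np.array([
        np.exp(-np.sum((m(s, a) - s_next) ** 2) / (2 * noise**2))
        for m in models
    ])
    posterior = belief * liks
    return posterior / posterior.sum()

# Usage: two hypothetical linear models; the belief concentrates on
# whichever one explains the observed transition better.
models = [lambda s, a: s + a, lambda s, a: s - a]
b = np.array([0.5, 0.5])
s, a = np.zeros(2), np.ones(2)
b = belief_update(b, models, s, a, s_next=s + a + 0.05)
```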
arXiv Detail & Related papers (2024-10-15T03:36:43Z)
- Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm. Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z)
- NeoRL: Efficient Exploration for Nonepisodic RL [50.67294735645895]
We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2024-06-03T10:14:32Z)
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also yields comparatively low-dimensional abstract actions.
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
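From the title and summary, MEX couples how well a hypothesis fits the data with the value it promises, in a single objective. A hedged sketch: score each hypothesis by its estimated value minus eta times its negative log-likelihood on the replay data, then act greedily under the maximizer. The discrete hypothesis class and the Gaussian likelihood are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mex_style_select(hypotheses, data, value_of, eta=1.0, noise=0.1):
    """Pick the hypothesis maximizing value minus eta * NLL: one
    objective fusing estimation (fit the data) and planning
    (be optimistic about achievable value)."""
    def nll(model):
        return sum(
            np.sum((model(s, a) - s_next) ** 2) / (2 * noise**2)
            for s, a, s_next in data
        )
    scores = [value_of(m) - eta * nll(m) for m in hypotheses]
    return hypotheses[int(np.argmax(scores))]

# Usage with toy ingredients; in practice value_of would come from
# planning inside the candidate model.
hypotheses = [lambda s, a, k=k: k * s + a for k in (0.8, 1.0, 1.2)]
data = [(np.ones(2), np.ones(2), 2.0 * np.ones(2))]  # consistent with k=1.0
chosen = mex_style_select(hypotheses, data, value_of=lambda m: 0.0)
```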
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- A Survey on Model-based Reinforcement Learning [21.85904195671014]
Reinforcement learning (RL) solves sequential decision-making problems via a trial-and-error process of interacting with the environment.
Model-based reinforcement learning (MBRL) is believed to be a promising direction, as it builds environment models in which trial and error can take place without incurring real costs.
arXiv Detail & Related papers (2022-06-19T05:28:03Z)
- Non-Markovian Reinforcement Learning using Fractional Dynamics [3.000697999889031]
Reinforcement learning (RL) is a technique to learn the control policy for an agent that interacts with an environment.
In this paper, we propose a model-based RL technique for a system that has non-Markovian dynamics.
Such environments are common in many real-world applications such as in human physiology, biological systems, material science, and population dynamics.
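Fractional dynamics make a system non-Markovian by replacing the one-step difference with a fractional-order difference whose Grünwald-Letnikov weights decay slowly, so the next state depends on the entire history. The sketch below uses the standard recursive weight formula (w_0 = 1, w_j = w_{j-1}(j - 1 - alpha)/j); the linear drift it drives is a made-up example system, not one from the paper.

```python
import numpy as np

def gl_weights(alpha, n):
    """Grünwald-Letnikov weights for the fractional difference of
    order alpha: w_0 = 1, w_j = w_{j-1} * (j - 1 - alpha) / j."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w

def simulate_fractional(alpha, steps, a=-0.5, x0=1.0):
    """Long-memory system: the fractional difference of x equals a*x,
    so x[t+1] = a*x[t] - sum_{j>=1} w_j * x[t+1-j]."""
    x = [x0]
    w = gl_weights(alpha, steps + 1)
    for t in range(steps):
        history = sum(w[j] * x[t + 1 - j] for j in range(1, t + 2))
        x.append(a * x[t] - history)
    return np.array(x)

# alpha = 1 recovers the ordinary (Markovian) difference equation
# x[t+1] - x[t] = a * x[t]; smaller alpha injects long memory.
traj = simulate_fractional(alpha=0.7, steps=50)
```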
arXiv Detail & Related papers (2021-07-29T07:35:13Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
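Extending model-based offline RL to visual observations typically means learning a compact latent state with an encoder, a latent transition model, and a decoder, then planning in latent space. Below is a minimal deterministic sketch of those three pieces; the layer sizes and the plain MSE loss are simplifying assumptions, whereas the paper's approach optimizes a proper variational (ELBO) objective with stochastic latents.

```python
import torch
import torch.nn as nn

class LatentModel(nn.Module):
    """Deterministic stand-in for a latent-space world model:
    encode observation -> latent, step latent with action, decode back."""

    def __init__(self, img_dim=64, latent_dim=8, action_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(img_dim, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.dynamics = nn.Linear(latent_dim + action_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, img_dim)

    def loss(self, obs, act, obs_next):
        z = self.encoder(obs)
        z_next_pred = self.dynamics(torch.cat([z, act], dim=-1))
        recon = self.decoder(z)
        # Reconstruction + latent one-step prediction terms; a real model
        # would optimize an ELBO with stochastic latents instead.
        return (((recon - obs) ** 2).mean()
                + ((z_next_pred - self.encoder(obs_next).detach()) ** 2).mean())

# Usage on flattened dummy "images":
model = LatentModel()
obs, act, obs_next = torch.randn(16, 64), torch.randn(16, 2), torch.randn(16, 64)
model.loss(obs, act, obs_next).backward()
```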
arXiv Detail & Related papers (2020-12-21T18:28:17Z)