Sample-efficient and Scalable Exploration in Continuous-Time RL
- URL: http://arxiv.org/abs/2510.24482v1
- Date: Tue, 28 Oct 2025 14:54:12 GMT
- Title: Sample-efficient and Scalable Exploration in Continuous-Time RL
- Authors: Klemens Iten, Lenart Treven, Bhavya Sukhija, Florian Dörfler, Andreas Krause
- Abstract summary: We study the problem of continuous-time reinforcement learning, where the unknown system dynamics are represented using nonlinear ordinary differential equations. We leverage probabilistic models, such as Gaussian processes and Bayesian neural networks, to learn an uncertainty-aware model of the underlying ODE. This yields a scalable and sample-efficient approach to continuous-time model-based RL.
- Score: 39.99126118024949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we study the problem of continuous-time reinforcement learning, where the unknown system dynamics are represented using nonlinear ordinary differential equations (ODEs). We leverage probabilistic models, such as Gaussian processes and Bayesian neural networks, to learn an uncertainty-aware model of the underlying ODE. Our algorithm, COMBRL, greedily maximizes a weighted sum of the extrinsic reward and model epistemic uncertainty. This yields a scalable and sample-efficient approach to continuous-time model-based RL. We show that COMBRL achieves sublinear regret in the reward-driven setting, and in the unsupervised RL setting (i.e., without extrinsic rewards), we provide a sample complexity bound. In our experiments, we evaluate COMBRL in both standard and unsupervised RL settings and demonstrate that it scales better, is more sample-efficient than prior methods, and outperforms baselines across several deep RL tasks.
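The exploration objective in the abstract reduces to a compact rule: act greedily on the extrinsic reward plus an uncertainty bonus from the learned ODE model. Below is a minimal, illustrative Python sketch of that objective, not the authors' implementation; the ensemble-of-drift-functions model, the function names, and the bonus weight beta are all assumptions, with ensemble disagreement standing in for epistemic uncertainty.

    import numpy as np

    def epistemic_std(state, action, ensemble):
        # Disagreement across ensemble members' predicted drifts
        # f_i(s, a) = ds/dt acts as a proxy for epistemic uncertainty.
        preds = np.stack([f(state, action) for f in ensemble])
        return preds.std(axis=0)

    def combrl_objective(state, action, ensemble, reward_fn, beta=1.0):
        # Weighted sum of extrinsic reward and model epistemic uncertainty,
        # which the agent greedily maximizes (per the abstract).
        bonus = np.linalg.norm(epistemic_std(state, action, ensemble))
        return reward_fn(state, action) + beta * bonus

Setting beta = 0 recovers pure exploitation, while dropping reward_fn altogether corresponds to the unsupervised, reward-free setting the abstract also analyzes.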
Related papers
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
- Statistical and Algorithmic Foundations of Reinforcement Learning [45.707617428078585]
Reinforcement learning (RL) has received a flurry of attention in recent years. We aim to introduce several important developments in RL, highlighting the connections between new ideas and classical topics.
arXiv Detail & Related papers (2025-07-19T02:42:41Z)
- Normalizing Flows are Capable Models for RL [24.876149287707847]
We propose a single Normalizing Flow architecture which integrates seamlessly into reinforcement learning algorithms. Our approach leads to much simpler algorithms, and achieves higher performance in imitation learning, offline RL, goal-conditioned RL, and unsupervised RL.
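As a rough illustration of the machinery such a flow-based agent relies on, the snippet below shows generic change-of-variables mechanics for a single affine coupling layer in numpy; it is not the paper's architecture, and the layer layout, function names, and even dimensionality (assumed even) are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    def coupling_forward(z, shift, log_scale):
        # One affine coupling layer: transform half of z conditioned on the
        # other half; the Jacobian log-determinant is just sum(log_scale).
        z1, z2 = np.split(z, 2)          # requires even dimensionality
        s = log_scale(z1)
        x2 = z2 * np.exp(s) + shift(z1)
        return np.concatenate([z1, x2]), np.sum(s)

    def sample_with_logprob(dim, shift, log_scale):
        # Sample from the base Gaussian, push it through the flow, and get
        # an exact log-density via the change-of-variables formula.
        z = rng.standard_normal(dim)
        x, logdet = coupling_forward(z, shift, log_scale)
        base_logp = -0.5 * np.sum(z**2) - 0.5 * dim * np.log(2 * np.pi)
        return x, base_logp - logdet

The exact, tractable log-probability is what lets one architecture serve as a policy, a density model, or a goal-conditioned generator across the settings listed above.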
arXiv Detail & Related papers (2025-05-29T15:06:22Z)
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models. We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z)
- Towards General-Purpose Model-Free Reinforcement Learning [40.973429772093155]
Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, RL algorithms are often tailored to specific benchmarks. We propose a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings.
arXiv Detail & Related papers (2025-01-27T15:36:37Z)
- Efficient Exploration in Continuous-time Model-based Reinforcement Learning [37.14026153342745]
Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time.
We introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics.
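What distinguishes this setting is that the learned model is an ODE drift f(s, a) rather than a discrete transition map, so model rollouts come from numerical integration. A generic sketch follows; the RK4 integrator, the zero-order hold on actions, and the step size are illustrative choices, not details taken from the paper.

    import numpy as np

    def rk4_step(f, state, action, dt):
        # One fourth-order Runge-Kutta step of the learned drift
        # ds/dt = f(s, a), holding the action fixed over the interval.
        k1 = f(state, action)
        k2 = f(state + 0.5 * dt * k1, action)
        k3 = f(state + 0.5 * dt * k2, action)
        k4 = f(state + dt * k3, action)
        return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    def rollout(f, state, actions, dt=0.05):
        # Unlike a discrete-time model, the same learned ODE can be rolled
        # out at any time resolution by changing dt.
        traj = [state]
        for a in actions:
            traj.append(rk4_step(f, traj[-1], a, dt))
        return np.array(traj)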
arXiv Detail & Related papers (2023-10-30T15:04:40Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
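A hedged sketch of that framing: discretize each state, action, and reward into tokens and interleave them into one long sequence, which an autoregressive model can be trained on and then decoded (e.g., by beam search) toward high predicted reward. The uniform binning, shared value range, and token layout here are simplifying assumptions.

    import numpy as np

    def discretize(x, low, high, n_bins=100):
        # Map continuous values to integer tokens via uniform binning.
        idx = ((np.asarray(x) - low) / (high - low) * n_bins).astype(int)
        return np.clip(idx, 0, n_bins - 1)

    def flatten_trajectory(states, actions, rewards, low, high):
        # Interleave (s_t, a_t, r_t) tokens into a single flat sequence.
        tokens = []
        for s, a, r in zip(states, actions, rewards):
            tokens.extend(discretize(s, low, high))
            tokens.extend(discretize(a, low, high))
            tokens.append(int(discretize(r, low, high)))
        return np.array(tokens)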
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
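The blurb's "robustness against worst-case situations" has a clean max-min structure: the agent maximizes return while an adversary, confined to the learned model's epistemic confidence set, minimizes it. The sketch below illustrates that structure only; the hallucinated parameterization eta in [-1, 1]^d, the function names, and beta are assumptions, not the paper's code.

    import numpy as np

    def hallucinated_step(state, action, mean_fn, std_fn, eta, beta=1.0):
        # eta in [-1, 1]^d selects any next state inside the beta-confidence
        # set around the learned mean transition; an adversary picking eta
        # yields worst-case dynamics, the agent picking it yields optimism.
        return mean_fn(state, action) + beta * eta * std_fn(state, action)

    def rollout_return(state, actions, etas, mean_fn, std_fn, reward_fn):
        # One rollout of the agent's actions against the adversary's etas;
        # robust training takes a max over actions and a min over etas.
        total = 0.0
        for a, eta in zip(actions, etas):
            total += reward_fn(state, a)
            state = hallucinated_step(state, a, mean_fn, std_fn, eta)
        return total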
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Continuous-Time Model-Based Reinforcement Learning [4.427447378048202]
We propose a continuous-time MBRL framework based on a novel actor-critic method.
We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems.
arXiv Detail & Related papers (2021-02-09T11:30:19Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all information) and is not responsible for any consequences of its use.