Operator Models for Continuous-Time Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2511.10383v1
- Date: Fri, 14 Nov 2025 01:48:16 GMT
- Title: Operator Models for Continuous-Time Offline Reinforcement Learning
- Authors: Nicolas Hoischen, Petar Bevanda, Max Beier, Stefan Sosnowski, Boris Houska, Sandra Hirche,
- Abstract summary: Direct interaction with the environment is often unsafe or impractical, motivating offline reinforcement learning from historical data.<n>We address this by linking reinforcement learning to the Hamilton-Jacobi-Bellman equation and proposing an operator-theoretic algorithm.<n>Specifically, we represent our world model in terms of the infinitesimal generator of controlled diffusion processes learned in a reproducing kernel Hilbert space.
- Score: 4.808981008878068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often unsafe or impractical, motivating offline reinforcement learning from historical data. However, there is limited statistical understanding of the approximation errors inherent in learning policies from offline datasets. We address this by linking reinforcement learning to the Hamilton-Jacobi-Bellman equation and proposing an operator-theoretic algorithm based on a simple dynamic programming recursion. Specifically, we represent our world model in terms of the infinitesimal generator of controlled diffusion processes learned in a reproducing kernel Hilbert space. By integrating statistical learning methods and operator theory, we establish global convergence of the value function and derive finite-sample guarantees with bounds tied to system properties such as smoothness and stability. Our theoretical and numerical results indicate that operator-based approaches may hold promise in solving offline reinforcement learning using continuous-time optimal control.
Related papers
- Composable Model-Free RL for Navigation with Input-Affine Systems [3.2917282915992883]
As autonomous robots move into complex, dynamic real-world environments, they must learn to navigate safely in real time.<n>We propose a composable, model-free reinforcement learning method that learns a value function and an optimal policy for each individual environment element.
arXiv Detail & Related papers (2026-02-13T00:19:35Z) - Kernel-Based Optimal Control: An Infinitesimal Generator Approach [3.7411439050030393]
This paper presents a novel operator-theoretic approach for optimal control of nonlinear systems.<n>Our learning framework leverages data samples of system dynamics and stage cost functions.<n>We demonstrate that our approach seamlessly integrates with modern convex operator-theoretic Hamilton-Jacobi-Bellman recursions.
arXiv Detail & Related papers (2024-12-02T15:08:55Z) - Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for (L2), (Linfty), and risk metrics, with learning rates adaptive to coefficients' regularity.
Our method is available as an open-source Python library.
arXiv Detail & Related papers (2024-11-04T11:09:58Z) - A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time"
It is based on differential equations that: (i) can be integrated without the need of external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open to novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Near-optimal Offline Reinforcement Learning with Linear Representation:
Leveraging Variance Information with Pessimism [65.46524775457928]
offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z) - Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
arXiv Detail & Related papers (2021-10-27T04:27:28Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well or better as compared to prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - Overcoming Model Bias for Robust Offline Deep Reinforcement Learning [3.1325640909772403]
MOOSE is an algorithm which ensures low model bias by keeping the policy within the support of the data.
We compare MOOSE with state-of-the-art model-free, offline RL algorithms BRAC, BEAR and BCQ on the Industrial Benchmark and MuJoCo continuous control tasks in terms of robust performance.
arXiv Detail & Related papers (2020-08-12T19:08:55Z) - Online Reinforcement Learning Control by Direct Heuristic Dynamic
Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to the machine learning method that updates parameters in a prediction model continuously as new data arrives.
It is desirable to prevent the time-driven dHDP from updating due to insignificant system event such as noise.
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z) - Online Constrained Model-based Reinforcement Learning [13.362455603441552]
Key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.