Continuous-Time Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2102.04764v2
- Date: Wed, 10 Feb 2021 08:46:17 GMT
- Title: Continuous-Time Model-Based Reinforcement Learning
- Authors: Çağatay Yıldız, Markus Heinonen, and Harri Lähdesmäki
- Abstract summary: We propose a continuous-time MBRL framework based on a novel actor-critic method.
We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems.
- Score: 4.427447378048202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) approaches rely on discrete-time
state transition models whereas physical systems and the vast majority of
control tasks operate in continuous time. To avoid a time-discretization
approximation of the underlying process, we propose a continuous-time MBRL
framework based on a novel actor-critic method. Our approach also infers the
unknown state evolution differentials with Bayesian neural ordinary
differential equations (ODE) to account for epistemic uncertainty. We implement
and test our method on a new ODE-RL suite that explicitly solves
continuous-time control systems. Our experiments illustrate that the model is
robust against irregular and noisy data, is sample-efficient, and can solve
control problems which pose challenges to discrete-time MBRL methods.
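To make the setup concrete, here is a minimal sketch of a continuous-time dynamics model: the state differential is a small neural network f(s, a), and imagined trajectories are produced by integrating it with a fixed-step Runge-Kutta solver over an arbitrary, even irregular, time grid. The network sizes, the RK4 solver, and the random placeholder policy are illustrative assumptions; the paper's actual model additionally places Bayesian uncertainty over the ODE weights, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Random weights for a small MLP approximating ds/dt = f(s, a)."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def dynamics(params, s, a):
    # f_theta(s, a): the learned time differential of the state.
    return mlp(params, np.concatenate([s, a]))

def rk4_step(params, s, a, dt):
    # Classic fixed-step Runge-Kutta 4 integration of ds/dt = f(s, a),
    # holding the action constant over the step (zero-order hold).
    k1 = dynamics(params, s, a)
    k2 = dynamics(params, s + 0.5 * dt * k1, a)
    k3 = dynamics(params, s + 0.5 * dt * k2, a)
    k4 = dynamics(params, s + dt * k3, a)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Roll out an imagined trajectory at arbitrary (irregular) time points.
state_dim, action_dim = 4, 1
params = mlp_init([state_dim + action_dim, 64, state_dim])
s = rng.normal(size=state_dim)
times = np.cumsum(rng.uniform(0.01, 0.1, size=20))  # irregular grid
for t0, t1 in zip(times[:-1], times[1:]):
    a = rng.uniform(-1, 1, size=action_dim)          # placeholder policy
    s = rk4_step(params, s, a, t1 - t0)
```

Because the integrator takes its step size from the actual elapsed time, the same model handles regular and irregularly sampled data alike.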
Related papers
- One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference.
OMS improves image fidelity and bridges the gap between training and inference, while preserving the original model parameters.
Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z)
- Diffusion-Generative Multi-Fidelity Learning for Physical Simulation [24.723536390322582]
We develop a diffusion-generative multi-fidelity learning method based on stochastic differential equations (SDEs), where generation is a continuous denoising process.
By conditioning on additional inputs (temporal or spatial variables), our model can efficiently learn and predict multi-dimensional solution arrays.
arXiv Detail & Related papers (2023-11-09T18:59:05Z)
- Efficient Exploration in Continuous-time Model-based Reinforcement Learning [37.14026153342745]
Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time.
We introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics.
arXiv Detail & Related papers (2023-10-30T15:04:40Z)
- ODE-based Recurrent Model-free Reinforcement Learning for POMDPs [15.030970899252601]
We present a novel ODE-based recurrent model combined with a model-free reinforcement learning framework to solve POMDPs.
We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks.
Our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
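The robustness to irregular observations comes from the ODE-RNN-style pattern of evolving a hidden state continuously between observations and applying a discrete update whenever an observation actually arrives. The sketch below illustrates that generic pattern; the weights, Euler integration, and update rule are assumptions rather than this paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
H, X = 8, 3  # hidden and observation sizes (illustrative)
Wh = rng.normal(0, 0.3, (H, H))          # hidden dynamics weights
Wx = rng.normal(0, 0.3, (X + H, H))      # observation-update weights

def hidden_ode(h):
    # dh/dt = f(h): continuous evolution of the belief between observations.
    return np.tanh(h @ Wh)

def evolve(h, dt, n_steps=10):
    # Integrate the hidden ODE over the (possibly irregular) gap dt.
    step = dt / n_steps
    for _ in range(n_steps):
        h = h + step * hidden_ode(h)     # simple Euler for brevity
    return h

def update(h, x):
    # Discrete correction when an observation x actually arrives.
    return np.tanh(np.concatenate([x, h]) @ Wx)

# Irregularly timed observations: the gap between them varies freely.
obs_times = np.cumsum(rng.uniform(0.05, 0.5, size=6))
h, t_prev = np.zeros(H), 0.0
for t in obs_times:
    h = evolve(h, t - t_prev)            # carry belief across the gap
    h = update(h, rng.normal(size=X))    # fold in the new observation
    t_prev = t
```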
arXiv Detail & Related papers (2023-09-25T12:13:56Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme that gives a non-decreasing performance guarantee for model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
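One way to picture the "when to update" question is as an event-triggered loop: keep acting under the current model and retrain only when an estimated model shift grows large. The shift estimate and threshold below are purely illustrative stand-ins, not the paper's derived bounds.

```python
import numpy as np

def model_shift(old_pred, new_data):
    # Illustrative shift estimate: mean prediction error of the frozen
    # model on freshly collected transitions (a stand-in for the paper's
    # formal model-shift quantity).
    return float(np.mean(np.abs(old_pred - new_data)))

def should_update(shift, threshold=0.1):
    # Event trigger: retrain only when the measured shift is large enough
    # that the performance guarantee may no longer hold.
    return shift > threshold

# Toy loop: predictions drift away from reality as time passes.
rng = np.random.default_rng(2)
for step in range(5):
    truth = rng.normal(size=32)
    pred = truth + 0.03 * step * rng.normal(size=32)  # growing model error
    shift = model_shift(pred, truth)
    if should_update(shift):
        print(f"step {step}: shift {shift:.3f} -> retrain the model")
    else:
        print(f"step {step}: shift {shift:.3f} -> keep acting")
```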
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
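The mechanism named in the title, a neural controlled differential equation, drives a hidden state by the increments of the observed covariate path, dh = G(h) dX, which accommodates irregular observation and treatment times naturally. A minimal sketch of that update follows; the weights, the toy covariate path, and the Euler stepping are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
H, C = 6, 2  # hidden size and number of observed covariate channels
W = rng.normal(0, 0.3, (H, H * C))
b = rng.normal(0, 0.3, H * C)

def G(h):
    # Vector field of the CDE: maps the hidden state to an (H x C) matrix
    # that is contracted against the increment of the covariate path X.
    return np.tanh(h @ W + b).reshape(H, C)

# Irregularly observed covariates (e.g., measurements between treatments).
times = np.cumsum(rng.uniform(0.1, 1.0, size=8))
X = rng.normal(size=(8, C))

h = np.zeros(H)
for i in range(1, len(times)):
    dX = X[i] - X[i - 1]         # path increment over the irregular gap
    h = h + G(h) @ dX            # Euler step of dh = G(h) dX
print("final hidden state:", np.round(h, 3))
```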
arXiv Detail & Related papers (2022-06-16T17:15:15Z)
- Learning Unstable Dynamics with One Minute of Data: A Differentiation-based Gaussian Process Approach [47.045588297201434]
We show how to exploit the differentiability of Gaussian processes to create a state-dependent linearized approximation of the true continuous dynamics.
We validate our approach by iteratively learning the system dynamics of an unstable system such as a 9-D Segway.
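The key fact being exploited is that a GP posterior mean m(x) = k(x, X) α is differentiable in closed form for smooth kernels, so ∂m/∂x yields a state-dependent linearization of the learned dynamics. Below is a minimal one-dimensional sketch with an RBF kernel; the kernel choice, data, and hyperparameters are assumptions.

```python
import numpy as np

# Training data from an unknown 1-D dynamics f(x) (here f(x) = sin(3x)).
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=15)
y = np.sin(3 * X) + 0.01 * rng.normal(size=15)

ell, sigma_n = 0.3, 0.01  # RBF lengthscale and noise std (assumed)

def k(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

# Standard GP regression: alpha = (K + sigma_n^2 I)^{-1} y.
alpha = np.linalg.solve(k(X, X) + sigma_n**2 * np.eye(len(X)), y)

def gp_mean(x):
    return k(np.atleast_1d(x), X) @ alpha

def gp_mean_grad(x):
    # Closed-form derivative of the posterior mean: the RBF kernel is
    # differentiable, d/dx k(x, xi) = -(x - xi)/ell^2 * k(x, xi).
    x = np.atleast_1d(x)
    dk = -(x[:, None] - X[None, :]) / ell**2 * k(x, X)
    return dk @ alpha

# State-dependent linearization around x0: f(x) ~ f(x0) + A (x - x0).
x0 = 0.2
f0, A = gp_mean(x0)[0], gp_mean_grad(x0)[0]
print(f"f({x0}) ~ {f0:.3f}, local slope A ~ {A:.3f} "
      f"(true 3*cos(0.6) = {3 * np.cos(0.6):.3f})")
```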
arXiv Detail & Related papers (2021-03-08T05:08:47Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
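The integration-error hypothesis is easy to probe on the classic bilinear toy game min_x max_y xy, whose exact gradient flow orbits the equilibrium at constant radius: explicit Euler (plain simultaneous gradient descent) accumulates error and spirals outward, while a fourth-order Runge-Kutta step tracks the orbit. The self-contained sketch below uses that toy game and an assumed step size, not the paper's GAN experiments.

```python
import numpy as np

def flow(z):
    # Gradient flow of the bilinear game min_x max_y x*y:
    # dx/dt = -dL/dx = -y,  dy/dt = +dL/dy = x.
    x, y = z
    return np.array([-y, x])

def euler_step(z, h):
    return z + h * flow(z)               # = simultaneous gradient descent

def rk4_step(z, h):
    k1 = flow(z)
    k2 = flow(z + 0.5 * h * k1)
    k3 = flow(z + 0.5 * h * k2)
    k4 = flow(z + h * k3)
    return z + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

h, steps = 0.1, 500
ze = zr = np.array([1.0, 1.0])
for _ in range(steps):
    ze, zr = euler_step(ze, h), rk4_step(zr, h)

# The exact flow conserves the radius; Euler's integration error grows it.
print(f"Euler radius: {np.linalg.norm(ze):.3f}")   # >> sqrt(2): diverging
print(f"RK4   radius: {np.linalg.norm(zr):.3f}")   # ~ sqrt(2): stable orbit
```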
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
- Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs [30.36381338938319]
We present two solutions for modeling continuous-time dynamics using neural ordinary differential equations (ODEs).
Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data.
We experimentally demonstrate the efficacy of our methods across various continuous-time domains.
arXiv Detail & Related papers (2020-06-29T17:21:43Z)
- STEER: Simple Temporal Regularization For Neural ODEs [80.80350769936383]
We propose a new regularization technique: randomly sampling the end time of the ODE during training.
The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks.
We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models.
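The regularizer itself is a one-liner: rather than always integrating the ODE to a fixed end time T, each training iteration samples the end time uniformly from an interval around T. A minimal sketch of that sampling step follows; the toy dynamics, the Euler solver, and the interval half-width are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def odeint_euler(f, y0, t0, t1, n_steps=50):
    # Fixed-step Euler integration of dy/dt = f(y) from t0 to t1
    # (a stand-in for whatever solver the model actually uses).
    y, h = y0, (t1 - t0) / n_steps
    for _ in range(n_steps):
        y = y + h * f(y)
    return y

f = lambda y: -y                 # toy dynamics; a neural net in practice
y0, t0, T, b = 1.0, 0.0, 1.0, 0.5

for it in range(3):
    # STEER-style regularization: randomly sample the end time of the
    # ODE each training iteration instead of always integrating to T.
    t1 = rng.uniform(T - b, T + b)
    y_T = odeint_euler(f, y0, t0, t1)
    # ... compute the loss at the sampled end time and backpropagate ...
    print(f"iter {it}: integrated to t1={t1:.3f}, y(t1)={y_T:.4f}")
```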
arXiv Detail & Related papers (2020-06-18T17:44:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.