Temporal Difference Learning with Continuous Time and State in the
Stochastic Setting
- URL: http://arxiv.org/abs/2202.07960v3
- Date: Wed, 7 Jun 2023 12:18:33 GMT
- Title: Temporal Difference Learning with Continuous Time and State in the
Stochastic Setting
- Authors: Ziad Kobeissi (SIERRA), Francis Bach (SIERRA, DI-ENS, PSL)
- Abstract summary: We consider the problem of continuous-time policy evaluation.
This consists in learning, through observations, the value function associated with an uncontrolled continuous-time stochastic dynamic and a reward function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of continuous-time policy evaluation. This consists
in learning through observations the value function associated with an
uncontrolled continuous-time stochastic dynamic and a reward function. We
propose two original variants of the well-known TD(0) method using vanishing
time steps. One is model-free and the other is model-based. For both methods,
we prove theoretical convergence rates that we subsequently verify through
numerical simulations. Alternatively, those methods can be interpreted as novel
reinforcement learning approaches for approximating solutions of linear PDEs
(partial differential equations) or linear BSDEs (backward stochastic
differential equations).
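As a rough illustration of the policy-evaluation setting described in the abstract, the following sketch runs a standard discretized TD(0) update with linear function approximation on a simulated diffusion. It is not the paper's algorithm: the Ornstein-Uhlenbeck dynamics, the quadratic reward, the discount rate `rho`, the fixed (non-vanishing) time step `h`, the features (1, x^2), the iterate averaging, and all function names are assumptions chosen so that the exact value function is known in closed form for comparison.

```python
import math
import random


def closed_form_value(x, theta=1.0, sigma=1.0, rho=1.0):
    """Exact V(x) = E[ int_0^inf exp(-rho t) X_t^2 dt | X_0 = x ] for the
    (assumed) dynamics dX = -theta X dt + sigma dW."""
    return (x * x + sigma * sigma / rho) / (rho + 2.0 * theta)


def td0_ou(n_steps=400_000, h=0.01, alpha=0.02,
           theta=1.0, sigma=1.0, rho=1.0, seed=0):
    """Discretized TD(0) with V_w(x) = w0 + w1 * x^2 along one trajectory.

    Returns the weights (w0, w1) averaged over the second half of the run.
    """
    rng = random.Random(seed)
    gamma = math.exp(-rho * h)      # discount factor over one time step
    sqrt_h = math.sqrt(h)
    w0, w1 = 0.0, 0.0
    avg0, avg1, n_avg = 0.0, 0.0, 0
    x = 0.0
    for k in range(n_steps):
        # Euler-Maruyama step of the uncontrolled dynamics.
        x_next = x - theta * x * h + sigma * sqrt_h * rng.gauss(0.0, 1.0)
        # TD error: reward accumulated over h plus discounted bootstrap.
        delta = (x * x) * h + gamma * (w0 + w1 * x_next * x_next) \
            - (w0 + w1 * x * x)
        # Semi-gradient update along the features (1, x^2).
        w0 += alpha * delta
        w1 += alpha * delta * x * x
        # Running average of the iterates to reduce noise.
        if k >= n_steps // 2:
            n_avg += 1
            avg0 += (w0 - avg0) / n_avg
            avg1 += (w1 - avg1) / n_avg
        x = x_next
    return avg0, avg1


if __name__ == "__main__":
    w0, w1 = td0_ou()
    for x in (0.0, 1.0, 2.0):
        print(x, w0 + w1 * x * x, closed_form_value(x))
```

With the default parameters the exact value function is V(x) = (x^2 + 1) / 3, so both learned weights should land near 1/3, up to discretization bias from the fixed step `h` and residual sampling noise.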
Related papers
- A Geometric Perspective on Diffusion Models [60.69328526215776]
We inspect the ODE-based sampling of a popular variance-exploding SDE and reveal several intriguing structures of its sampling dynamics.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Semi-supervised Learning of Partial Differential Operators and Dynamical Flows [68.77595310155365]
We present a novel method that combines a hyper-network solver with a Fourier Neural Operator architecture.
We test our method on various time evolution PDEs, including nonlinear fluid flows in one, two, and three spatial dimensions.
The results show that the new method improves the learning accuracy at the supervised time points and is able to interpolate the solutions to any intermediate time.
arXiv Detail & Related papers (2022-07-28T19:59:14Z)
- Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z)
- The Connection between Discrete- and Continuous-Time Descriptions of Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of invariance under coarse-graining.
This result explains why combining differencing schemes for derivatives reconstruction and local-in-time inference approaches does not work for time series analysis of second or higher order differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
- ImitationFlow: Learning Deep Stable Stochastic Dynamic Systems by Normalizing Flows [29.310742141970394]
We introduce ImitationFlow, a novel deep generative model that allows learning complex, globally stable nonlinear dynamics.
We show the effectiveness of our method with both standard datasets and a real robot experiment.
arXiv Detail & Related papers (2020-10-25T14:49:46Z)
- Identifying Latent Stochastic Differential Equations [29.103393300261587]
We present a method for learning latent stochastic differential equations (SDEs) from high-dimensional time series data.
The proposed method learns the mapping from ambient to latent space, and the underlying SDE coefficients, through a self-supervised learning approach.
We validate the method through several simulated video processing tasks, where the underlying SDE is known, and through real world datasets.
arXiv Detail & Related papers (2020-07-12T19:46:31Z)
- Stochastic Differential Equations with Variational Wishart Diffusions [18.590352916158093]
We present a non-parametric way of inferring stochastic differential equations for both regression tasks and continuous-time dynamical modelling.
The work places strong emphasis on the stochastic part of the differential equation, also known as the diffusion, and models it by means of Wishart processes.
arXiv Detail & Related papers (2020-06-26T10:21:35Z)
- Learning continuous-time PDEs from sparse data with graph neural networks [10.259254824702555]
We propose a continuous-time differential model for dynamical systems whose governing equations are parameterized by message passing graph neural networks.
We demonstrate the model's ability to work with unstructured grids, arbitrary time steps, and noisy observations.
We compare our method with existing approaches on several well-known physical systems that involve first- and higher-order PDEs, and it achieves state-of-the-art predictive performance.
arXiv Detail & Related papers (2020-06-16T07:15:40Z)