Near-continuous time Reinforcement Learning for continuous state-action
spaces
- URL: http://arxiv.org/abs/2309.02815v1
- Date: Wed, 6 Sep 2023 08:01:17 GMT
- Title: Near-continuous time Reinforcement Learning for continuous state-action
spaces
- Authors: Lorenzo Croissant (CEREMADE), Marc Abeille, Bruno Bouchard (CEREMADE)
- Abstract summary: We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory.
Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces.
We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively.
- Score: 3.5527561584422456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the Reinforcement Learning problem of controlling an unknown
dynamical system to maximise the long-term average reward along a single
trajectory. Most of the literature considers system interactions that occur in
discrete time and discrete state-action spaces. Although this standpoint is
suitable for games, it is often inadequate for mechanical or digital systems in
which interactions occur at a high frequency, if not in continuous time, and
whose state spaces are large if not inherently continuous. Perhaps the only
exception is the Linear Quadratic framework for which results exist both in
discrete and continuous time. However, its ability to handle continuous states
comes with the drawback of a rigid dynamic and reward structure. This work aims
to overcome these shortcomings by modelling interaction times with a Poisson
clock of frequency $\varepsilon^{-1}$, which captures arbitrary time scales:
from discrete ($\varepsilon=1$) to continuous time ($\varepsilon\downarrow0$).
In addition, we consider a generic reward function and model the state dynamics
according to a jump process with an arbitrary transition kernel on
$\mathbb{R}^d$. We show that the celebrated optimism protocol applies when the
sub-tasks (learning and planning) can be performed effectively. We tackle
learning within the eluder dimension framework and propose an approximate
planning method based on a diffusive limit approximation of the jump process.
Overall, our algorithm enjoys a regret of order
$\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of
interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes,
showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous
time.
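The Poisson-clock model above is easy to simulate: inter-arrival gaps are exponential with mean $\varepsilon$, so $\varepsilon=1$ yields roughly unit-spaced (discrete-like) interactions while $\varepsilon\downarrow0$ approaches continuous time. The following minimal sketch (illustrative only, not code from the paper; the function name and parameters are ours) samples interaction times over a horizon $T$:

```python
import random

def interaction_times(eps, horizon, seed=0):
    """Sample Poisson-clock interaction times with frequency 1/eps.

    Gaps between interactions are Exponential(rate=1/eps), i.e. mean eps,
    so the expected number of interactions over the horizon is horizon/eps.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / eps)  # mean gap = eps
        if t >= horizon:
            return times
        times.append(t)

# eps = 1 mimics discrete time (~100 interactions over T = 100);
# eps = 0.01 approaches continuous time (~10,000 interactions).
discrete_like = interaction_times(eps=1.0, horizon=100.0)
near_continuous = interaction_times(eps=0.01, horizon=100.0)
```

Note that the regret bound $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$ is stated over elapsed time $T$, not the (random) number of interactions, which grows like $T/\varepsilon$.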
Related papers
- Dynamically emergent correlations in bosons via quantum resetting [0.0]
We study the nonequilibrium stationary state (NESS) induced by quantum resetting of a system of $N$ noninteracting bosons in a harmonic trap.
We fully characterize the steady state by analytically computing several physical observables such as the average density, extreme value statistics, order and gap statistics.
This is a rare example of a strongly correlated quantum many-body NESS where various observables can be exactly computed.
arXiv Detail & Related papers (2024-07-29T18:00:35Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Integrable Digital Quantum Simulation: Generalized Gibbs Ensembles and
Trotter Transitions [0.0]
We study a quench from a spin-wave state in the XXZ Heisenberg spin chain.
By means of exact calculations we find that the Generalized Gibbs Ensemble depends analytically on the Trotter step.
We show that the latter can be detected locally, as it is associated with the appearance of a non-zero staggered magnetization.
arXiv Detail & Related papers (2022-12-13T09:54:56Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We show a training algorithm for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\varepsilon^{-2} + \tau_{\mathrm{avg}}\varepsilon^{-1}\right)$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider a smooth and strongly convex objective function using a Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Random quantum circuits transform local noise into global white noise [118.18170052022323]
We study the distribution over measurement outcomes of noisy random quantum circuits in the low-fidelity regime.
For local noise that is sufficiently weak and unital, correlations (measured by the linear cross-entropy benchmark) between the output distribution $p_{\text{noisy}}$ of a generic noisy circuit instance and that of its noiseless counterpart shrink exponentially.
If the noise is incoherent, the output distribution approaches the uniform distribution $p_{\text{unif}}$ at precisely the same rate.
arXiv Detail & Related papers (2021-11-29T19:26:28Z) - The connection between time-local and time-nonlocal perturbation
expansions [0.0]
We show that a series for the kernel $\mathcal{K}$ can be translated directly into a corresponding series for the more complicated generator $\mathcal{G}$.
We illustrate this for leading and next-to-leading order calculations of $\mathcal{K}$ and $\mathcal{G}$ for the single impurity Anderson model.
arXiv Detail & Related papers (2021-07-19T15:05:29Z) - Route to Extend the Lifetime of a Discrete Time Crystal in a Finite Spin
Chain Without Disorder [0.0]
Periodically driven systems are described by time dependent Hamiltonians that possess discrete time translation symmetry.
The spontaneous breaking of this symmetry leads to the emergence of a novel non-equilibrium phase of matter: the Discrete Time Crystal (DTC).
arXiv Detail & Related papers (2021-04-12T04:45:09Z) - The Connection between Discrete- and Continuous-Time Descriptions of
Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of invariance under coarse-graining.
This result explains why combining differencing schemes for derivatives reconstruction and local-in-time inference approaches does not work for time series analysis of second or higher order differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z) - Interpolated Collision Model Formalism [0.0]
I will discuss a novel method for constructing a continuous-time master equation from the discrete-time dynamics given by any Collision Model.
I will show that any continuum-limit-based approach will always yield unitary dynamics unless it is fine-tuned in some way.
arXiv Detail & Related papers (2020-09-22T11:50:14Z) - Frequentist Regret Bounds for Randomized Least-Squares Value Iteration [94.47472987987805]
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL).
In this paper, we introduce an optimistically-perturbed variant of the popular randomized least-squares value iteration (RLSVI) algorithm.
Under the assumption that the Markov decision process has low-rank transition dynamics, we prove that the frequentist regret of RLSVI is upper-bounded by $\widetilde{O}(d^2 H^2 \sqrt{T})$, where $d$ is the feature dimension, $H$ is the horizon, and $T$ is the total number of time steps.
arXiv Detail & Related papers (2019-11-01T19:48:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.