Impact of Computation in Integral Reinforcement Learning for
Continuous-Time Control
- URL: http://arxiv.org/abs/2402.17375v1
- Date: Tue, 27 Feb 2024 10:12:47 GMT
- Title: Impact of Computation in Integral Reinforcement Learning for
Continuous-Time Control
- Authors: Wenhan Cao, Wei Pan
- Abstract summary: We show that the choice of the computational method -- in this case, the quadrature rule -- can significantly impact control performance.
We draw a parallel between IntRL's policy iteration and Newton's method applied to the Hamilton-Jacobi-Bellman equation.
We prove the local convergence rates for IntRL using the trapezoidal rule and Bayesian quadrature with a Matérn kernel to be $O(N^{-2})$ and $O(N^{-b})$, respectively.
- Score: 5.126167270246931
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integral reinforcement learning (IntRL) demands the precise computation of
the utility function's integral at its policy evaluation (PEV) stage. This is
achieved through quadrature rules, which are weighted sums of utility functions
evaluated from state samples obtained in discrete time. Our research reveals a
critical yet underexplored phenomenon: the choice of the computational method
-- in this case, the quadrature rule -- can significantly impact control
performance. This impact is traced back to the fact that computational errors
introduced in the PEV stage can affect the policy iteration's convergence
behavior, which in turn affects the learned controller. To elucidate how
computation impacts control, we draw a parallel between IntRL's policy
iteration and Newton's method applied to the Hamilton-Jacobi-Bellman equation.
In this light, computational error in PEV manifests as an extra error term in
each iteration of Newton's method, with its upper bound proportional to the
computational error. Further, we demonstrate that when the utility function
resides in a reproducing kernel Hilbert space (RKHS), the optimal quadrature is
achievable by employing Bayesian quadrature with the RKHS-inducing kernel
function. We prove the local convergence rates for IntRL using the
trapezoidal rule and Bayesian quadrature with a Matérn kernel to be
$O(N^{-2})$ and $O(N^{-b})$, respectively, where $N$ is the number of evenly
spaced samples and $b$ is the Matérn kernel's smoothness parameter. These theoretical
findings are finally validated by two canonical control tasks.
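
To make the role of the quadrature rule concrete, the following is a minimal, self-contained sketch (not the authors' code) that compares the trapezoidal rule against Bayesian quadrature with a Matérn-3/2 kernel on the kind of integral computed at the PEV stage. The utility signal, the kernel length-scale, and the grid-based computation of the kernel means are assumptions made for illustration only.

```python
# Illustrative sketch only: compares two quadrature rules for the kind of
# integral computed at IntRL's policy-evaluation (PEV) stage. The utility
# signal, the Matern length-scale, and the grid-based kernel means are
# assumptions made for this toy example, not the paper's setup.
import numpy as np

def matern32(t, s, ell=0.3):
    """Matern-3/2 kernel k(t, s) evaluated on all pairs of t and s."""
    r = np.abs(np.asarray(t)[:, None] - np.asarray(s)[None, :]) / ell
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def utility(t):
    """Stand-in utility signal along one trajectory segment (illustrative)."""
    return np.exp(-t) * np.sin(3.0 * t) ** 2

def trapezoid(y, x):
    """Composite trapezoidal rule for 1-D samples y at points x."""
    return float(np.sum(0.5 * (x[1:] - x[:-1]) * (y[1:] + y[:-1])))

T = 1.0                                  # PEV integration interval [0, T]
fine = np.linspace(0.0, T, 20_001)       # fine grid for a reference "exact" value
exact = trapezoid(utility(fine), fine)

for N in (5, 10, 20, 40, 80):
    t = np.linspace(0.0, T, N)           # evenly spaced state samples
    f = utility(t)

    # Trapezoidal rule: weighted sum with weights (h/2, h, ..., h, h/2).
    trap = trapezoid(f, t)

    # Bayesian quadrature: weights w solve K w = z, with kernel means
    # z_i = \int_0^T k(s, t_i) ds, computed on the fine grid here for
    # simplicity (closed forms exist for Matern kernels but are omitted).
    Kfine = matern32(fine, t)                                   # (M, N)
    ds = (fine[1:] - fine[:-1])[:, None]
    z = np.sum(0.5 * ds * (Kfine[1:] + Kfine[:-1]), axis=0)     # (N,)
    K = matern32(t, t) + 1e-10 * np.eye(N)                      # jitter for stability
    w = np.linalg.solve(K, z)
    bq = float(w @ f)

    print(f"N={N:3d}  trapezoid error={abs(trap - exact):.2e}  "
          f"BQ error={abs(bq - exact):.2e}")
```

On toy runs like this, the trapezoidal error should shrink roughly quadratically in $N$, in line with the $O(N^{-2})$ rate quoted above, while the Bayesian-quadrature error is governed by the kernel's smoothness; the sketch is only meant to make the "weighted sum of utility samples" computation tangible, not to reproduce the paper's experiments.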
Related papers
- Local Prediction-Powered Inference [7.174572371800217]
This paper introduces a specific algorithm for local multivariable regression using PPI.
The confidence intervals, bias correction, and coverage probabilities are analyzed, proving the correctness and superiority of our algorithm.
arXiv Detail & Related papers (2024-09-26T22:15:53Z)
- Linear quadratic control of nonlinear systems with Koopman operator learning and the Nyström method [16.0198373552099]
We show how random subspaces can be used to achieve huge computational savings.
Our main technical contribution is deriving theoretical guarantees on the effect of the Nyström approximation.
arXiv Detail & Related papers (2024-03-05T09:28:40Z)
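
The Nyström approximation referenced in the entry above can be sketched generically as follows (an illustrative toy with an RBF kernel, assumed data sizes, and random landmarks; it is not that paper's Koopman-operator pipeline): a kernel matrix over n points is approximated from m << n landmark columns.

```python
# Generic Nystrom sketch (illustrative; data sizes, kernel, and landmark
# selection are assumptions, not taken from the paper): approximate an
# n x n RBF kernel matrix from m << n randomly chosen landmark columns.
import numpy as np

def rbf(X, Y, ell=1.0):
    """RBF (Gaussian) kernel matrix between row-wise point sets X and Y."""
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(0)
n, m = 2000, 100                               # data points and landmarks (assumed)
X = rng.normal(size=(n, 2))

idx = rng.choice(n, size=m, replace=False)     # random subspace / landmark set
K_nm = rbf(X, X[idx])                          # (n, m) cross-kernel block
K_mm = rbf(X[idx], X[idx]) + 1e-8 * np.eye(m)  # (m, m) landmark block, with jitter

# Nystrom approximation: K ~= K_nm K_mm^{-1} K_nm^T. Downstream methods use the
# low-rank factors; the full matrices are formed here only to measure the error.
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)
K_exact = rbf(X, X)
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
print(f"relative Frobenius error with m={m} landmarks: {rel_err:.3e}")
```

The computational saving comes from working with the (n, m) and (m, m) factors instead of the full (n, n) matrix.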
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
arXiv Detail & Related papers (2023-02-10T08:05:19Z)
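
The probabilistic representation mentioned in the entry above can be illustrated with a toy Feynman-Kac estimate (a generic sketch under assumed coefficients, not that paper's neural solver): the solution of a 1-D convection-diffusion equation at a point is an ensemble average of the initial condition over drifted, diffused particles.

```python
# Toy Feynman-Kac / particle estimate (illustrative; all coefficients assumed)
# for the 1-D convection-diffusion equation
#   u_t + c * u_x = kappa * u_xx,  u(x, 0) = sin(x),
# whose exact solution is exp(-kappa * t) * sin(x - c * t).
import numpy as np

rng = np.random.default_rng(1)
c, kappa = 1.0, 0.1            # convection speed and diffusivity (assumed values)
x, t = 0.5, 2.0                # evaluation point and time
n_particles = 200_000

# Particle endpoints: deterministic drift -c*t plus Gaussian diffusion with
# variance 2*kappa*t (sampled exactly; no time-stepping is needed here).
endpoints = x - c * t + np.sqrt(2.0 * kappa * t) * rng.standard_normal(n_particles)
u_mc = np.mean(np.sin(endpoints))      # macroscopic value as an ensemble average

u_exact = np.exp(-kappa * t) * np.sin(x - c * t)
print(f"Monte Carlo estimate: {u_mc:.4f}   exact: {u_exact:.4f}")
```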
- Smoothing Policy Iteration for Zero-sum Markov Games [9.158672246275348]
We propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately.
Specifically, the adversarial policy serves as the weight function, enabling efficient sampling over action spaces.
We also propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with the function approximations.
arXiv Detail & Related papers (2022-12-03T14:39:06Z)
- Optimal scheduling of entropy regulariser for continuous-time linear-quadratic reinforcement learning [9.779769486156631]
Here, the agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy.
This exploration-exploitation trade-off is determined by the strength of entropy regularisation.
We prove that the regret, for both learning algorithms, is of the order $\mathcal{O}(\sqrt{N})$ (up to a logarithmic factor) over $N$ episodes, matching the best known result from the literature.
arXiv Detail & Related papers (2022-08-08T23:36:40Z)
- Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
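
The quadrature Fourier features used in the SLEIPNIR entry above can be illustrated, very roughly, for a plain 1-D Gaussian kernel without derivative observations (an assumed toy setting, not that paper's full construction): Gauss-Hermite nodes supply deterministic frequencies whose cosine/sine features reproduce the kernel.

```python
# Toy quadrature Fourier feature sketch for a 1-D Gaussian kernel
# k(x, y) = exp(-(x - y)^2 / (2 * ell^2)). Gauss-Hermite nodes supply
# deterministic frequencies; inner products of the features approximate the
# kernel. Sizes and the length-scale are assumptions; derivative features
# and the paper's full construction are omitted.
import numpy as np

ell, m = 1.0, 32                                       # length-scale, quadrature nodes
nodes, weights = np.polynomial.hermite.hermgauss(m)    # rule for \int f(t) e^{-t^2} dt
omega = np.sqrt(2.0) * nodes / ell                     # frequencies (change of variables)
coef = np.sqrt(weights / np.sqrt(np.pi))               # per-frequency feature scaling

def features(x):
    """Map points x of shape (n,) to 2*m deterministic Fourier features."""
    x = np.asarray(x, dtype=float)[:, None]
    return np.hstack([coef * np.cos(x * omega), coef * np.sin(x * omega)])

x = np.linspace(-2.0, 2.0, 7)
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)
Phi = features(x)
print(f"max abs kernel error with {m} nodes: {np.max(np.abs(K_exact - Phi @ Phi.T)):.2e}")
```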