Rates of Convergence in Certain Native Spaces of Approximations used in
Reinforcement Learning
- URL: http://arxiv.org/abs/2309.07383v4
- Date: Fri, 17 Nov 2023 15:04:12 GMT
- Title: Rates of Convergence in Certain Native Spaces of Approximations used in
Reinforcement Learning
- Authors: Ali Bouland, Shengyuan Niu, Sai Tej Paruchuri, Andrew Kurdila, John
Burns, Eugenio Schuster
- Abstract summary: This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$.
Explicit upper bounds on error in value function and controller approximations are derived in terms of power function $\mathcal{P}_{H,N}$ for the space of finite dimensional approximants $H_N$ in the native space $H(\Omega)$.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper studies convergence rates for some value function approximations
that arise in a collection of reproducing kernel Hilbert spaces (RKHS)
$H(\Omega)$. By casting an optimal control problem in a specific class of
native spaces, strong rates of convergence are derived for the operator
equation that enables offline approximations that appear in policy iteration.
Explicit upper bounds on error in value function and controller approximations
are derived in terms of power function $\mathcal{P}_{H,N}$ for the space of
finite dimensional approximants $H_N$ in the native space $H(\Omega)$. These
bounds are geometric in nature and refine some well-known, now classical
results concerning convergence of approximations of value functions.
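As a rough illustration of the central quantity in these bounds, the sketch below evaluates the power function of a finite set of kernel centers numerically; the Gaussian kernel, the centers, and the evaluation points are illustrative choices and are not taken from the paper:

import numpy as np

def gaussian_kernel(X, Y, eps=1.0):
    # k(x, y) = exp(-eps * ||x - y||^2); any positive definite kernel would do.
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-eps * d2)

def power_function(x_eval, centers, eps=1.0, jitter=1e-10):
    # P_{H,N}(x)^2 = k(x, x) - k_x^T K^{-1} k_x, where K = (k(x_i, x_j)) and
    # k_x = (k(x, x_i)) for the centers x_1, ..., x_N spanning H_N.
    K = gaussian_kernel(centers, centers, eps) + jitter * np.eye(len(centers))
    k_x = gaussian_kernel(x_eval, centers, eps)            # shape (M, N)
    quad = np.einsum("ij,ij->i", k_x, np.linalg.solve(K, k_x.T).T)
    return np.sqrt(np.maximum(1.0 - quad, 0.0))            # k(x, x) = 1 for the Gaussian kernel

# Pointwise bound: |f(x) - (Pi_N f)(x)| <= P_{H,N}(x) * ||f||_H for every f in H(Omega).
rng = np.random.default_rng(0)
centers = rng.uniform(size=(20, 2))
x_eval = rng.uniform(size=(200, 2))
print("max power function value:", float(power_function(x_eval, centers).max()))

The commented inequality is the standard pointwise bound a power function provides; driving $\mathcal{P}_{H,N}$ to zero by enriching $H_N$ is what drives the value function and controller errors to zero in bounds of this type.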
Related papers
- Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications [2.0584253077707477]
We study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of an objective function $J(\cdot)$.
Our results apply to a class of "invex" functions, which have the property that every stationary point is also a global minimizer.
arXiv Detail & Related papers (2023-12-05T15:22:39Z)
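For orientation, the recursion studied in results of the kind summarized above can be written generically as (the paper's exact noise and step-size assumptions may differ):

$$\theta_{t+1} = \theta_t - \alpha_t \big( \nabla J(\theta_t) + b_t + \xi_t \big),$$

where $\alpha_t$ is the step-size sequence, $b_t$ is a bias term, and $\xi_t$ is zero-mean noise whose conditional variance need not be uniformly bounded; when $J$ is invex, convergence to a stationary point already yields a global minimizer.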
- On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms such as shuffled SGD are popular because they reduce the number of cache misses and have good cache locality.
This paper examines the convergence properties of SGD algorithms under arbitrary data orderings.
arXiv Detail & Related papers (2023-05-30T17:47:27Z)
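A minimal sketch of the data orderings compared in analyses like the one above; the least-squares loss, the model, and the specific orderings are illustrative and not taken from the paper:

import numpy as np

def sgd_epoch(w, X, y, order, lr=0.05):
    # One incremental-gradient pass over the data in the given order,
    # using a least-squares loss 0.5 * (x^T w - y)^2 per sample.
    for i in order:
        grad = (X[i] @ w - y[i]) * X[i]
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=100)
w = np.zeros(5)
for epoch in range(50):
    # "Shuffle SGD": a fresh random permutation each epoch.
    # Replacing this with np.arange(len(y)) gives a fixed (arbitrary) ordering,
    # and rng.integers(0, len(y), len(y)) gives with-replacement sampling.
    order = rng.permutation(len(y))
    w = sgd_epoch(w, X, y, order)
print("final mean squared error:", float(np.mean((X @ w - y) ** 2)))

Swapping the permutation for a fixed or with-replacement ordering (see the comments) is exactly the kind of change whose effect on convergence such analyses quantify.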
- Non-asymptotic convergence bounds for Sinkhorn iterates and their gradients: a coupling approach [10.568851068989972]
We focus on a relaxation of the original OT problem, the entropic OT problem, which makes it possible to implement efficient and practical algorithmic solutions.
This formulation, also known as the Schrödinger Bridge problem, notably connects with Stochastic Optimal Control (SOC) and can be solved with the popular Sinkhorn algorithm.
arXiv Detail & Related papers (2023-04-13T13:58:25Z)
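A bare-bones Sinkhorn iteration for the entropic OT problem mentioned above; the cost matrix, marginals, and regularization strength are illustrative choices:

import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=500):
    # Alternating scaling updates: the plan P = diag(u) K diag(v) with
    # K = exp(-C / eps), rescaled until P has marginals a and b.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # entropic transport plan

rng = np.random.default_rng(0)
x, y = rng.normal(size=(30, 2)), rng.normal(size=(40, 2)) + 1.0
C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)   # squared-distance cost
a, b = np.full(30, 1 / 30), np.full(40, 1 / 40)
P = sinkhorn(a, b, C)
print("entropic OT cost:", float(np.sum(P * C)))

Each iteration only rescales rows and columns of $K$, which is what makes the entropic relaxation practical compared with the original OT problem.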
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
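For reference, bounds of the kind stated above are phrased against a minimax criterion of the generic form (the notation is generic, not the paper's):

$$\mathfrak{M}_n(\mathcal{F}) \;=\; \inf_{\widehat{\tau}_n} \, \sup_{f \in \mathcal{F}} \, \mathbb{E}_f\big[(\widehat{\tau}_n - \tau(f))^2\big],$$

where $\tau(f)$ is the linear functional of interest and the infimum runs over all estimators built from $n$ observations; a local version restricts the supremum to a neighborhood of a fixed instance, which is what makes the resulting guarantee instance-dependent rather than worst-case.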
- Quantum and classical low-degree learning via a dimension-free Remez inequality [52.12931955662553]
We show a new way to relate functions on the hypergrid to their harmonic extensions over the polytorus.
We relate the supremum of a function $f$ over products of the cyclic group $\{\exp(2\pi i k/K)\}_{k=1}^{K}$ to its supremum over the polytorus.
We extend to new spaces a recent line of work [EI22, CHP, VZ22] that gave similarly efficient methods for learning low-degree functions on hypercubes and observables on qubits.
arXiv Detail & Related papers (2023-01-04T04:15:40Z)
- Optimal Rates for Regularized Conditional Mean Embedding Learning [32.870965795423366]
We derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting.
Our analysis reveals that our rates match the optimal $O(\log n / n)$ rates without assuming $\mathcal{H}_Y$ to be finite dimensional.
arXiv Detail & Related papers (2022-08-02T19:47:43Z)
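A bare-bones version of a regularized empirical CME estimator in the spirit of the entry above; the Gaussian kernels, bandwidths, regularization constant, and toy data are illustrative assumptions, not the paper's choices:

import numpy as np

def rbf(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2)

def cme_weights(X_train, x_test, lam=1e-2):
    # Weights alpha(x) = (K_X + n*lam*I)^{-1} k_X(x); the estimated embedding of
    # P(Y | X = x) is mu_hat(x) = sum_i alpha_i(x) k_Y(., y_i).
    n = len(X_train)
    K_X = rbf(X_train, X_train)
    k_x = rbf(X_train, x_test)                        # shape (n, m)
    return np.linalg.solve(K_X + n * lam * np.eye(n), k_x)

# Estimate E[f(Y) | X = x] for f in the Y-side RKHS, here f(.) = k_Y(., y0).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
Y = np.sin(3 * X) + 0.1 * rng.normal(size=(200, 1))
x0, y0 = np.array([[0.5]]), np.array([[np.sin(1.5)]])
alpha = cme_weights(X, x0)                            # shape (200, 1)
est = (rbf(Y, y0).T @ alpha).item()                   # approx E[k_Y(y0, Y) | X = 0.5]
print("estimated conditional kernel mean at x = 0.5:", est)

The $O(\log n / n)$ rate above concerns, roughly, how quickly the estimated embedding approaches the true conditional mean embedding as $n$ grows.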
- Error Estimates for the Variational Training of Neural Networks with Boundary Penalty [0.0]
We establish estimates on the error made by the Ritz method for quadratic energies on the space $H^1(\Omega)$.
Special attention is paid to the case of Dirichlet boundary values which are treated with the boundary penalty method.
arXiv Detail & Related papers (2021-03-01T13:55:59Z)
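For concreteness, a typical quadratic energy with boundary penalty of the kind analyzed above, written here for the Poisson problem $-\Delta u = f$ with Dirichlet data $g$ (an illustrative choice of PDE), is

$$E_\lambda(u) \;=\; \int_\Omega \Big( \tfrac{1}{2} |\nabla u|^2 - f\,u \Big)\, dx \;+\; \lambda \int_{\partial\Omega} (u - g)^2 \, ds,$$

minimized over a class of neural network functions in $H^1(\Omega)$; the penalty term with weight $\lambda$ replaces the hard constraint $u = g$ on $\partial\Omega$, and error estimates of this kind quantify the cost of that relaxation together with the restriction to the network class.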
- Nonparametric approximation of conditional expectation operators [0.3655021726150368]
We investigate the approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[f(Y) \mid X = x]$ under minimal assumptions.
We prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2020-12-23T19:06:12Z)
- Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
- The Convergence Indicator: Improved and completely characterized parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
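The standard PSO update whose parameters such a convergence indicator constrains reads (the indicator itself is not reproduced here):

$$v_{t+1} = \omega\, v_t + c_1 r_1 (p_t - x_t) + c_2 r_2 (g_t - x_t), \qquad x_{t+1} = x_t + v_{t+1},$$

with inertia weight $\omega$, acceleration coefficients $c_1, c_2$, uniform random factors $r_1, r_2 \in [0, 1]$, and personal/global best positions $p_t$, $g_t$; whether the particles collapse to a single point or diverge depends on which region of $(\omega, c_1, c_2)$ the parameters lie in, and those regions are what the indicator characterizes.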
- Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
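Spelled out, the guarantee summarized above is a cumulative-regret bound of the generic episodic form (with $K$ episodes of horizon $H$ and $T = KH$ steps; the notation is generic):

$$\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Big( V^{*}(s_1^{k}) - V^{\pi_k}(s_1^{k}) \Big) \;\le\; \widetilde{O}\big(\mathrm{poly}(dH)\sqrt{T}\big),$$

where $V^{*}$ is the optimal value function, $\pi_k$ is the policy executed in episode $k$, $s_1^{k}$ is the initial state of that episode, and $d$ is the complexity measure (based on a bounded eluder dimension) of the value function class.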
This list is automatically generated from the titles and abstracts of the papers in this site.