Optimal policy evaluation using kernel-based temporal difference methods
- URL: http://arxiv.org/abs/2109.12002v1
- Date: Fri, 24 Sep 2021 14:48:20 GMT
- Title: Optimal policy evaluation using kernel-based temporal difference methods
- Authors: Yaqi Duan, Mengdi Wang, Martin J. Wainwright
- Abstract summary: We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
- Score: 78.83926562536791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study methods based on reproducing kernel Hilbert spaces for estimating
the value function of an infinite-horizon discounted Markov reward process
(MRP). We study a regularized form of the kernel least-squares temporal
difference (LSTD) estimate; in the population limit of infinite data, it
corresponds to the fixed point of a projected Bellman operator defined by the
associated reproducing kernel Hilbert space. The estimator itself is obtained
by computing the projected fixed point induced by a regularized version of the
empirical operator; due to the underlying kernel structure, this reduces to
solving a linear system involving kernel matrices. We analyze the error of this
estimate in the $L^2(\mu)$-norm, where $\mu$ denotes the stationary
distribution of the underlying Markov chain. Our analysis imposes no
assumptions on the transition operator of the Markov chain, but rather only
conditions on the reward function and population-level kernel LSTD solutions.
We use empirical process theory techniques to derive a non-asymptotic upper
bound on the error with explicit dependence on the eigenvalues of the
associated kernel operator, as well as the instance-dependent variance of the
Bellman residual error. In addition, we prove minimax lower bounds over
sub-classes of MRPs, which shows that our rate is optimal in terms of the
sample size $n$ and the effective horizon $H = (1 - \gamma)^{-1}$. Whereas
existing worst-case theory predicts cubic scaling ($H^3$) in the effective
horizon, our theory reveals that there is in fact a much wider range of
scalings, depending on the kernel, the stationary distribution, and the
variance of the Bellman residual error. Notably, it is only parametric and
near-parametric problems that can ever achieve the worst-case cubic scaling.
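Because the estimator admits a finite-dimensional representation through kernel matrices, it can be computed in closed form by solving a linear system, as the abstract notes. The sketch below is a minimal illustration of that reduction, not the paper's exact procedure: it assumes a Gaussian kernel and a particular regularization convention, writes the value estimate as $\hat f(x) = \sum_j \alpha_j k(x_j, x)$, and takes the fixed point of a kernel-ridge-projected empirical Bellman operator, which yields the system $(K + n\lambda I - \gamma M)\alpha = r$ with $K_{ij} = k(x_i, x_j)$ and $M_{ij} = k(x_i', x_j)$.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def kernel_lstd(states, rewards, next_states, gamma=0.99, reg=1e-3, bandwidth=1.0):
    """Regularized kernel LSTD sketch: returns a callable value-function estimate.

    Represents f(x) = sum_j alpha_j k(x_j, x) and solves the linear system
        (K + n * reg * I - gamma * M) alpha = r,
    with K[i, j] = k(x_i, x_j) and M[i, j] = k(x'_i, x_j).
    """
    n = states.shape[0]
    K = rbf_kernel(states, states, bandwidth)        # kernel matrix on observed states
    M = rbf_kernel(next_states, states, bandwidth)   # cross-kernel with successor states
    alpha = np.linalg.solve(K + n * reg * np.eye(n) - gamma * M, rewards)

    def value_estimate(x):
        return rbf_kernel(np.atleast_2d(x), states, bandwidth) @ alpha

    return value_estimate

# Toy MRP: 1-d state drifting toward the origin, reward -|x|, discount 0.9.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
x_next = 0.9 * x + 0.1 * rng.normal(size=(500, 1))
r = -np.abs(x).ravel()
v_hat = kernel_lstd(x, r, x_next, gamma=0.9)
print(v_hat(np.array([[0.0], [1.0]])))               # estimated values at x = 0 and x = 1
```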
Related papers
- Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
arXiv Detail & Related papers (2024-05-23T12:26:25Z) - A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator [7.163829696591103]
We consider learning an unknown target function $f_*$ using kernel ridge regression.
We establish a non-asymptotic deterministic approximation for the test error of KRR.
arXiv Detail & Related papers (2024-03-13T20:12:03Z) - Optimal minimax rate of learning interaction kernels [7.329333373512536]
We introduce a tamed least squares estimator (tLSE) that attains the optimal convergence rate for a broad class of exchangeable distributions.
Our findings reveal that, provided the inverse problem in the large-sample limit satisfies a coercivity condition, the left tail probability does not alter the bias-variance tradeoff.
arXiv Detail & Related papers (2023-11-28T15:01:58Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Nonparametric approximation of conditional expectation operators [0.3655021726150368]
We investigate the approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[f(Y) \mid X = x]$ under minimal assumptions.
We prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space (see the sketch after this list).
arXiv Detail & Related papers (2020-12-23T19:06:12Z) - Early stopping and polynomial smoothing in regression with reproducing kernels [2.0411082897313984]
We study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS).
We present a data-driven rule, based on the so-called minimum discrepancy principle, for performing early stopping without a validation set.
The proposed rule is proved to be minimax-optimal over different types of kernel spaces.
arXiv Detail & Related papers (2020-07-14T05:27:18Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
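The related paper on nonparametric approximation of conditional expectation operators, referenced above, concerns the operator $[Pf](x) := \mathbb{E}[f(Y) \mid X = x]$ and its approximation by operators acting on an RKHS. As a point of reference, the following minimal sketch shows the textbook empirical estimate of such an operator via kernel ridge regression of $f(Y_i)$ on $X_i$; the Gaussian kernel and ridge parameter here are illustrative assumptions, not that paper's construction.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def empirical_conditional_expectation(X, Y, reg=1e-3, bandwidth=1.0):
    """Return P_hat such that P_hat(f) approximates x -> E[f(Y) | X = x].

    For a given f, this is kernel ridge regression of the targets f(Y_i)
    onto the inputs X_i:  g_hat(x) = k(x, X) (K + n * reg * I)^{-1} f(Y).
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    W = np.linalg.inv(K + n * reg * np.eye(n))       # (K + n*reg*I)^{-1}, precomputed once

    def apply_operator(f):
        targets = np.array([f(y) for y in Y])        # f evaluated at the sampled Y_i
        coeffs = W @ targets
        return lambda x: rbf_kernel(np.atleast_2d(x), X, bandwidth) @ coeffs

    return apply_operator

# Toy check: Y = X + noise, so E[Y | X = x] = x; apply the operator to f(y) = y.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(400, 1))
Y = X + 0.1 * rng.normal(size=(400, 1))
P_hat = empirical_conditional_expectation(X, Y)
g_hat = P_hat(lambda y: y[0])                        # estimate of x -> E[Y | X = x]
print(g_hat(np.array([[0.5]])))                      # should be close to 0.5
```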
This list is automatically generated from the titles and abstracts of the papers on this site.