Nonparametric approximation of conditional expectation operators
- URL: http://arxiv.org/abs/2012.12917v3
- Date: Sat, 5 Aug 2023 19:53:20 GMT
- Title: Nonparametric approximation of conditional expectation operators
- Authors: Mattes Mollenhauer and Péter Koltai
- Abstract summary: We investigate the approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[ f(Y) \mid X = x ]$ under minimal assumptions.
We prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given the joint distribution of two random variables $X,Y$ on some second
countable locally compact Hausdorff space, we investigate the statistical
approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[ f(Y)
\mid X = x ]$ under minimal assumptions. By modifying its domain, we prove that
$P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt
operators acting on a reproducing kernel Hilbert space. This fact allows us to
estimate $P$ uniformly by finite-rank operators over a dense subspace even when
$P$ is not compact. In terms of modes of convergence, we thereby establish the
superiority of kernel-based techniques over classically used parametric
projection approaches such as Galerkin methods. This also provides a novel
perspective on which limiting object the nonparametric estimate of $P$
converges to. As an application, we show that these results are particularly
important for a large family of spectral analysis techniques for Markov
transition operators. Our investigation also gives a new asymptotic perspective
on the so-called kernel conditional mean embedding, which is the theoretical
foundation of a wide variety of techniques in kernel-based nonparametric
inference.
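To make the object concrete, here is a minimal Python sketch of a finite-rank, kernel-based estimate of $[Pf](x) = \mathbb{E}[f(Y) \mid X = x]$: kernel ridge regression of $f(y_i)$ on $x_i$, which is the empirical conditional mean embedding applied to $f$. The Gaussian kernel, the regularization $\lambda$, and all names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-|a_i - b_j|^2 / (2 sigma^2))."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

def fit_conditional_expectation(x, y, lam=1e-3, sigma=1.0):
    """Return a map (f, x0) -> estimate of E[f(Y) | X = x0], via kernel
    ridge regression of f(y_i) on x_i (the empirical conditional mean
    embedding applied to f)."""
    n = len(x)
    K = gaussian_kernel(x, x, sigma)
    # Regularized inverse (K + n*lam*I)^{-1}; lam > 0 keeps the estimate
    # well-posed even though P itself need not be compact.
    W = np.linalg.solve(K + n * lam * np.eye(n), np.eye(n))

    def apply_P(f, x0):
        kx = gaussian_kernel(np.atleast_1d(x0), x, sigma)  # shape (m, n)
        return kx @ (W @ f(y))                             # shape (m,)

    return apply_P

# Toy usage: Y = X + noise, so E[Y | X = x] should be close to x.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=500)
y = x + 0.1 * rng.standard_normal(500)
apply_P = fit_conditional_expectation(x, y, lam=1e-3, sigma=0.5)
print(apply_P(lambda t: t, np.array([-1.0, 0.0, 1.0])))  # approx [-1, 0, 1]
```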
Related papers
- Tensor network approximation of Koopman operators
We propose a framework for approximating the evolution of observables of measure-preserving ergodic systems.
Our approach is based on a spectrally-convergent approximation of the skew-adjoint Koopman generator.
A key feature of this quantum-inspired approximation is that it captures information from a tensor product space of dimension $(2d+1)^n$.
arXiv Detail & Related papers (2024-07-09T21:40:14Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Rates of Convergence in Certain Native Spaces of Approximations used in Reinforcement Learning
This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$.
Explicit upper bounds on the error in value function and controller approximations are derived in terms of the power function $\mathcal{P}_{H,N}$ for the space of finite-dimensional approximants $H_N$ in the native space $H(\Omega)$.
arXiv Detail & Related papers (2023-09-14T02:02:08Z)
- Efficient displacement convex optimization with particle gradient descent
Particle gradient descent is widely used to optimize functions of probability measures.
This paper considers particle gradient descent with a finite number of particles and establishes theoretical guarantees for optimizing functions that are displacement convex in measures.
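For illustration, a minimal sketch of particle gradient descent on one simple displacement-convex energy (a convex potential plus a convex pairwise interaction); the functional, step size, and names are assumptions for the example, not the paper's setting.

```python
import numpy as np

def particle_gradient_descent(x0, step=0.1, iters=200):
    """Gradient descent on the particles of mu = (1/n) sum_i delta_{x_i}
    for the displacement-convex energy
        F(mu) = int |x|^2/2 dmu + (1/2) int int |x - y|^2/2 dmu dmu."""
    x = x0.copy()
    for _ in range(iters):
        grad_potential = x                      # grad of V(x) = |x|^2 / 2
        grad_interaction = x - x.mean(axis=0)   # (1/n) sum_j (x_i - x_j)
        x -= step * (grad_potential + grad_interaction)
    return x

rng = np.random.default_rng(1)
x = particle_gradient_descent(rng.standard_normal((256, 2)))
print(np.abs(x).max())  # particles contract toward the unique minimizer 0
```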
arXiv Detail & Related papers (2023-02-09T16:35:59Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Optimal Rates for Regularized Conditional Mean Embedding Learning
We derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting.
Our analysis reveals that our rates match the optimal $O(\log n / n)$ rates without assuming $\mathcal{H}_Y$ to be finite-dimensional.
arXiv Detail & Related papers (2022-08-02T19:47:43Z)
- A Law of Robustness beyond Isoperimetry
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
- Optimal policy evaluation using kernel-based temporal difference methods
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
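A generic kernel LSTD-style sketch of the setup these summaries describe, i.e. representing the value function in an RKHS and solving the empirical Bellman fixed point; the kernel, regularization, and toy MRP below are assumptions for illustration, not the paper's exact estimator or bounds.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * sigma**2))

def kernel_lstd(x, x_next, r, gamma=0.9, lam=1e-3, sigma=1.0):
    """Represent V(.) = sum_j alpha_j k(x_j, .) and solve the regularized
    empirical Bellman fixed point V(x_i) = r_i + gamma V(x'_i), i.e.
        (K - gamma K' + n lam I) alpha = r,
    with K[i, j] = k(x_i, x_j) and K'[i, j] = k(x'_i, x_j)."""
    n = len(x)
    K = gaussian_kernel(x, x, sigma)
    K_next = gaussian_kernel(x_next, x, sigma)
    alpha = np.linalg.solve(K - gamma * K_next + n * lam * np.eye(n), r)
    return lambda x0: gaussian_kernel(np.atleast_1d(x0), x, sigma) @ alpha

# Toy MRP: deterministic contraction x' = 0.8 x with reward r(x) = x,
# so the true value is V(x) = x / (1 - 0.9 * 0.8).
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 300)
V = kernel_lstd(x, 0.8 * x, x, gamma=0.9, sigma=0.3)
print(V(np.array([0.5, 1.0])))  # roughly [1.79, 3.57]
```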
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
We analyze the convergence of single-pass, fixed-step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
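A minimal sketch of that special case: single-pass, fixed-step-size SGD in an RKHS on noiseless samples $y_t = f^*(x_t)$, storing the iterate through its kernel expansion. The Gaussian kernel, step size, and names are illustrative assumptions.

```python
import numpy as np

def gaussian_k(a, b, sigma=0.2):
    return np.exp(-((a - b) ** 2) / (2.0 * sigma**2))

def online_kernel_sgd(f_star, n=2000, step=0.5, sigma=0.2, seed=3):
    """Single pass of fixed-step-size SGD in an RKHS on the noiseless
    model y_t = f*(x_t): after drawing x_t, update
        f <- f - step * (f(x_t) - y_t) * k(x_t, .)
    The iterate is kept as an expansion f = sum_t coef[t] k(xs[t], .);
    this naive version costs O(n^2) overall."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(0.0, 1.0, n)   # random points on the unit interval
    coef = np.zeros(n)
    for t in range(n):
        f_xt = coef[:t] @ gaussian_k(xs[:t], xs[t], sigma)  # iterate at x_t
        coef[t] = -step * (f_xt - f_star(xs[t]))
    return xs, coef

xs, coef = online_kernel_sgd(np.sin)
print(coef @ gaussian_k(xs, 0.5), np.sin(0.5))  # estimate vs truth at x = 0.5
```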
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
- Linear Time Sinkhorn Divergences using Positive Features
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
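A minimal NumPy sketch of Sinkhorn scaling with such a factored kernel, taking $\varepsilon = 1$ so that the Gibbs kernel is exactly $K_{ij} = \langle\varphi(x_i), \varphi(y_j)\rangle = (UV^\top)_{ij}$; the random positive features below are placeholders, not the paper's feature construction.

```python
import numpy as np

def sinkhorn_low_rank(U, V, a, b, iters=500):
    """Sinkhorn iterations with the kernel factored as K = U @ V.T
    (U, V nonnegative, shape (n, r)): every product with K or K.T costs
    O(n r) instead of the O(n^2) of a dense kernel matrix."""
    u = np.ones(len(a))
    v = np.ones(len(b))
    for _ in range(iters):
        v = b / (V @ (U.T @ u))   # column scaling, K is never formed
        u = a / (U @ (V.T @ v))   # row scaling
    return u, v  # transport plan is diag(u) @ K @ diag(v)

rng = np.random.default_rng(4)
n, r = 1000, 16
U, V = rng.random((n, r)), rng.random((n, r))  # stand-ins for phi(x), phi(y)
a = b = np.full(n, 1.0 / n)                    # uniform marginals
u, v = sinkhorn_low_rank(U, V, a, b)
print(np.allclose(u * (U @ (V.T @ v)), a))     # row marginals match: True
```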
This list is automatically generated from the titles and abstracts of the papers in this site.