Optimal Learning Rates for Regularized Least-Squares with a Fourier
Capacity Condition
- URL: http://arxiv.org/abs/2204.07856v4
- Date: Fri, 15 Sep 2023 18:08:13 GMT
- Title: Optimal Learning Rates for Regularized Least-Squares with a Fourier
Capacity Condition
- Authors: Prem Talwai, David Simchi-Levi
- Abstract summary: We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales.
We demonstrate that the spectrum of the Mercer operator can be inferred in the presence of ``tight'' embeddings of suitable Hilbert scales.
- Score: 14.910167993978487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We derive minimax adaptive rates for a new, broad class of
Tikhonov-regularized learning problems in Hilbert scales under general source
conditions. Our analysis does not require the regression function to be
contained in the hypothesis class, and most notably does not employ the
conventional \textit{a priori} assumptions on kernel eigendecay. Using the
theory of interpolation, we demonstrate that the spectrum of the Mercer
operator can be inferred in the presence of ``tight'' $L^{\infty}(\mathcal{X})$
embeddings of suitable Hilbert scales. Our analysis utilizes a new Fourier
isocapacitary condition, which captures the interplay of the kernel Dirichlet
capacities and small ball probabilities via the optimal Hilbert scale function.
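To make the learning problem concrete, below is a minimal sketch of Tikhonov-regularized least squares in a reproducing kernel Hilbert space (kernel ridge regression) in its standard closed form. The Gaussian kernel, the bandwidth, the regularization parameter `lam`, the helper names (`krr_fit`, `krr_predict`), and the synthetic data are illustrative assumptions, not the construction analyzed in the paper.

```python
# Illustrative sketch: Tikhonov-regularized least squares in an RKHS
# (kernel ridge regression). Kernel, bandwidth, and lam are assumed
# choices for the example, not the paper's setup.
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gram matrix k(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2))."""
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T
    return np.exp(-d2 / (2 * bandwidth**2))

def krr_fit(X, y, lam=1e-2, bandwidth=1.0):
    """Representer-theorem solution: alpha = (K + n*lam*I)^{-1} y."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_test, bandwidth=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i) at the test points."""
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

# Toy usage on synthetic 1-D data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
alpha = krr_fit(X, y, lam=1e-3, bandwidth=0.3)
X_test = np.linspace(-1, 1, 5).reshape(-1, 1)
print(krr_predict(X, alpha, X_test, bandwidth=0.3))
```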
Related papers
- Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Analyses of convex and non-convex optimization methods often require Lipschitz continuity of the gradient, which limits the scope of the analysis.
Recent work generalizes this gradient setting via a non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z)
- High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Statistical Inverse Problems in Hilbert Scales [0.0]
We study the Tikhonov regularization scheme in Hilbert scales for the nonlinear statistical inverse problem with a general noise.
The regularizing norm in this scheme is stronger than the norm of the underlying Hilbert space.
arXiv Detail & Related papers (2022-08-28T21:06:05Z)
- Experimental Design for Linear Functionals in Reproducing Kernel Hilbert
Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the \emph{Stochastic Scaled-Gradient Descent} (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces to estimate the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- On the Universality of the Double Descent Peak in Ridgeless Regression [0.0]
We prove a non-asymptotic distribution-independent lower bound for the expected mean squared error caused by label noise in ridgeless linear regression.
Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime.
arXiv Detail & Related papers (2020-10-05T08:30:25Z)
- Stochastic Gradient Descent in Hilbert Scales: Smoothness,
Preconditioning and Earlier Stopping [0.0]
We consider least squares learning in reproducing kernel Hilbert spaces (RKHSs) and extend the classical SGD analysis to a learning setting in Hilbert scales.
We show that even for well-specified models, violation of a traditional benchmark smoothness assumption has a tremendous effect on the learning rate.
In addition, we show that for misspecified models, preconditioning in an appropriate Hilbert scale helps to reduce the number of iterations.
arXiv Detail & Related papers (2020-06-18T20:22:04Z)
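As a rough illustration of the last entry above, the sketch below runs stochastic gradient descent for kernel least squares, with early stopping acting as the regularizer, and optionally applies a Hilbert-scale-style preconditioner taken here to be a power s of the normalized empirical kernel matrix K/n. The Gaussian kernel, the step size, the helper name `kernel_sgd`, the specific preconditioner, and the synthetic data are assumptions made for illustration; this is not the exact scheme analyzed in the cited paper.

```python
# Loose sketch of SGD for kernel least squares with an optional
# Hilbert-scale-style preconditioner (a power s of K/n); illustrative
# assumptions only, not the cited paper's exact scheme.
import numpy as np

def gaussian_kernel(X, Z, bandwidth=0.3):
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T
    return np.exp(-d2 / (2 * bandwidth**2))

def kernel_sgd(X, y, steps=2000, eta=0.5, s=0.0, bandwidth=0.3, seed=0):
    """SGD on f = sum_j alpha_j k(., x_j); early stopping regularizes.

    s = 0 recovers plain kernel SGD; s > 0 applies the assumed
    preconditioner (K/n)^s to each stochastic update direction.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    # Matrix power (K/n)^s via the eigendecomposition of the PSD Gram matrix.
    evals, evecs = np.linalg.eigh(K / n)
    P = evecs @ np.diag(np.clip(evals, 0, None) ** s) @ evecs.T
    alpha = np.zeros(n)
    for _ in range(steps):
        i = rng.integers(n)
        residual = K[i] @ alpha - y[i]        # f_t(x_i) - y_i
        alpha -= eta * residual * P[:, i]     # (preconditioned) stochastic update
    return alpha

# Toy usage: compare plain (s=0) and preconditioned (s=0.5) runs.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(100)
for s in (0.0, 0.5):
    alpha = kernel_sgd(X, y, s=s)
    rmse = np.sqrt(np.mean((gaussian_kernel(X, X) @ alpha - y) ** 2))
    print(f"s={s}: training RMSE {rmse:.3f}")
```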