Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression
- URL: http://arxiv.org/abs/2211.10968v3
- Date: Sun, 18 Feb 2024 14:16:35 GMT
- Title: Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression
- Authors: Jiading Liu and Lei Shi
- Abstract summary: This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
- Score: 1.7227952883644062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous analysis of regularized functional linear regression in a
reproducing kernel Hilbert space (RKHS) typically requires the target function
to be contained in this kernel space. This paper studies the convergence
performance of divide-and-conquer estimators in the scenario that the target
function does not necessarily reside in the underlying RKHS. As a
decomposition-based scalable approach, the divide-and-conquer estimators of
functional linear regression can substantially reduce the algorithmic
complexities in time and memory. We develop an integral operator approach to
establish sharp finite sample upper bounds for prediction with
divide-and-conquer estimators under various regularity conditions of
explanatory variables and target function. We also prove the asymptotic
optimality of the derived rates by building the mini-max lower bounds. Finally,
we consider the convergence of noiseless estimators and show that the rates can
be arbitrarily fast under mild conditions.
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Capacity dependent analysis for functional online learning algorithms [8.748563565641279]
This article provides convergence analysis of online gradient descent algorithms for functional linear models.
We show that capacity assumption can alleviate the saturation of the convergence rate as the regularity of the target function increases.
arXiv Detail & Related papers (2022-09-25T11:21:18Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Equivalence of Convergence Rates of Posterior Distributions and Bayes
Estimators for Functions and Nonparametric Functionals [4.375582647111708]
We study the posterior contraction rates of a Bayesian method with Gaussian process priors in nonparametric regression.
For a general class of kernels, we establish convergence rates of the posterior measure of the regression function and its derivatives.
Our proof shows that, under certain conditions, to any convergence rate of Bayes estimators there corresponds the same convergence rate of the posterior distributions.
arXiv Detail & Related papers (2020-11-27T19:11:56Z) - Sparse Representations of Positive Functions via First and Second-Order
Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk problems when the range of the estimator is required to be nonnegative.
We develop first and second-order variants of approximation mirror descent employing emphpseudo-gradients.
Experiments demonstrate favorable performance on ingeneous Process intensity estimation in practice.
arXiv Detail & Related papers (2020-11-13T21:54:28Z) - On the Estimation of Derivatives Using Plug-in Kernel Ridge Regression
Estimators [4.392844455327199]
We propose a simple plug-in kernel ridge regression (KRR) estimator in nonparametric regression.
We provide a non-asymotic analysis to study the behavior of the proposed estimator in a unified manner.
The proposed estimator achieves the optimal rate of convergence with the same choice of tuning parameter for any order of derivatives.
arXiv Detail & Related papers (2020-06-02T02:32:39Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.