Random feature approximation for general spectral methods
- URL: http://arxiv.org/abs/2506.16283v1
- Date: Thu, 19 Jun 2025 13:00:17 GMT
- Title: Random feature approximation for general spectral methods
- Authors: Mike Nguyen, Nicole Mücke
- Abstract summary: This work extends previous results for Tikhonov regularization to a broad class of spectral regularization techniques. We enable a theoretical analysis of neural networks and neural operators through the lens of the Neural Tangent Kernel (NTK) approach.
- Score: 2.9388890036358104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random feature approximation is arguably one of the most widely used techniques for kernel methods in large-scale learning algorithms. In this work, we analyze the generalization properties of random feature methods, extending previous results for Tikhonov regularization to a broad class of spectral regularization techniques. This includes not only explicit methods but also implicit schemes such as gradient descent and accelerated algorithms like the Heavy-Ball and Nesterov method. Through this framework, we enable a theoretical analysis of neural networks and neural operators through the lens of the Neural Tangent Kernel (NTK) approach trained via gradient descent. For our estimators we obtain optimal learning rates over regularity classes (even for classes that are not included in the reproducing kernel Hilbert space), which are defined through appropriate source conditions. This improves or completes previous results obtained in related settings for specific kernel algorithms.
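As a minimal illustration of the setting (not the paper's estimators or learning rates), the sketch below approximates a Gaussian kernel with random Fourier features and then applies two spectral regularization schemes to the induced linear model: Tikhonov regularization as an explicit filter and early-stopped gradient descent as an implicit one. All function names and hyperparameters are illustrative choices.

```python
import numpy as np

def make_rff_map(dim, num_features=300, lengthscale=1.0, seed=0):
    """Random Fourier feature map approximating the Gaussian (RBF) kernel."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return lambda X: np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def tikhonov(Phi, y, lam):
    # Explicit spectral regularization: filter g_lam(s) = 1 / (s + lam).
    n, m = Phi.shape
    return np.linalg.solve(Phi.T @ Phi / n + lam * np.eye(m), Phi.T @ y / n)

def gradient_descent(Phi, y, step=1.0, n_iter=500):
    # Implicit spectral regularization: early-stopped GD on the least-squares risk.
    n, m = Phi.shape
    theta = np.zeros(m)
    for _ in range(n_iter):
        theta -= step * Phi.T @ (Phi @ theta - y) / n
    return theta

# Toy one-dimensional regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

feature_map = make_rff_map(dim=1)
Phi = feature_map(X)
X_test = np.linspace(-3, 3, 200)[:, None]
Phi_test = feature_map(X_test)

for name, theta in [("tikhonov", tikhonov(Phi, y, lam=1e-3)),
                    ("gradient descent", gradient_descent(Phi, y))]:
    mse = np.mean((Phi_test @ theta - np.sin(X_test[:, 0])) ** 2)
    print(name, mse)
```

Both estimators use the same random features; they differ only in the spectral filter applied to the empirical covariance, which is exactly the axis along which the paper's analysis generalizes Tikhonov regularization.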
Related papers
- Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel [55.82768375605861]
We establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity for kernel methods. Unlike static kernels such as NTK, the LPK captures the entire training trajectory, adapting to both data and optimization dynamics.
arXiv Detail & Related papers (2025-06-12T23:17:09Z)
- Kernel Descent -- a Novel Optimizer for Variational Quantum Algorithms [0.0]
We introduce kernel descent, a novel algorithm for minimizing the functions underlying variational quantum algorithms.
In particular, we showcase scenarios in which kernel descent outperforms gradient descent and quantum analytic descent.
Kernel descent sets itself apart through its use of reproducing kernel Hilbert space techniques in constructing the local approximations.
arXiv Detail & Related papers (2024-09-16T13:10:26Z)
- Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay [13.803850290216257]
We rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method.
Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training the wide neural networks.
arXiv Detail & Related papers (2024-01-03T08:00:50Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
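Since the entry above centers on stochastic gradient descent in the dual of GP regression, here is a deliberately stripped-down sketch of that idea: randomized block-coordinate SGD on the dual quadratic whose minimizer gives the usual GP regression representer weights. It omits the refinements the paper's actual design builds in, and all names and values are illustrative.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def stochastic_dual_descent(K, y, noise_var=1.0, step=0.02, batch=32, n_steps=5000, seed=0):
    """Randomized block-coordinate SGD on the dual objective
    0.5 * a^T (K + s^2 I) a - a^T y, whose minimizer is (K + s^2 I)^{-1} y."""
    rng = np.random.default_rng(seed)
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)
        # Gradient restricted to the sampled coordinates.
        grad = K[idx] @ alpha + noise_var * alpha[idx] - y[idx]
        alpha[idx] -= step * grad
    return alpha

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=300)
K = rbf_kernel(X, X)

alpha_sdd = stochastic_dual_descent(K, y)
alpha_exact = np.linalg.solve(K + 1.0 * np.eye(300), y)
print(np.abs(alpha_sdd - alpha_exact).max())  # gap to the closed-form dual solution; shrinks as n_steps grows
```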
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Random feature approximation for general spectral methods [0.0]
We analyze generalization properties for a large class of spectral regularization methods combined with random features.
For our estimators we obtain optimal learning rates over regularity classes.
arXiv Detail & Related papers (2023-08-29T16:56:03Z)
- On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates.
Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
arXiv Detail & Related papers (2023-07-14T13:56:11Z)
- The Galerkin method beats Graph-Based Approaches for Spectral Algorithms [3.5897534810405403]
We break with the graph-based approaches common in the machine learning community and prove the statistical and computational superiority of the Galerkin method.
We introduce implementation tricks to deal with differential operators in large dimensions with structured kernels.
We extend the core principles of our approach to non-linear spaces of functions, such as those parameterized by deep neural networks.
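To make the contrast with graph-based constructions concrete, here is a generic Galerkin discretization of a simple 1D eigenproblem (a textbook P1 finite element example, not the estimator studied in the paper): the operator is projected onto a finite basis and a small generalized eigenproblem is solved.

```python
import numpy as np
from scipy.linalg import eigh

def galerkin_laplace_eigs(n_elements=200, length=np.pi, n_eigs=4):
    """Galerkin (P1 finite element) discretization of -u'' = lambda * u
    on (0, length) with Dirichlet boundary conditions.
    Exact eigenvalues are (k * pi / length)**2, i.e. 1, 4, 9, ... for length = pi."""
    h = length / n_elements
    n = n_elements - 1  # interior nodes / hat basis functions
    # Stiffness matrix <phi_i', phi_j'> and mass matrix <phi_i, phi_j>.
    A = (np.diag(np.full(n, 2.0)) + np.diag(np.full(n - 1, -1.0), 1)
         + np.diag(np.full(n - 1, -1.0), -1)) / h
    M = (np.diag(np.full(n, 4.0)) + np.diag(np.full(n - 1, 1.0), 1)
         + np.diag(np.full(n - 1, 1.0), -1)) * h / 6.0
    # Generalized symmetric eigenproblem A v = lambda M v.
    return eigh(A, M, eigvals_only=True)[:n_eigs]

print(np.round(galerkin_laplace_eigs(), 4))  # close to [1, 4, 9, 16]
```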
arXiv Detail & Related papers (2023-06-01T14:38:54Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this phenomenon, the benefit of large learning rates, can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
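For intuition on this spectral effect: gradient descent on a quadratic acts independently on each eigendirection of the empirical covariance, and after t steps with step size eta the component with eigenvalue lambda is recovered up to the factor 1 - (1 - eta * lambda)^t. The small numeric check below is a standard computation, not taken from the paper; the eigenvalues and iteration budget are arbitrary.

```python
import numpy as np

def recovered_fraction(lam, step, n_iter):
    """Fraction of the eigen-component with eigenvalue lam that early-stopped
    gradient descent on a quadratic has recovered after n_iter steps."""
    return 1.0 - (1.0 - step * lam) ** n_iter

eigenvalues = np.array([1.0, 1e-1, 1e-2, 1e-3])
for step in (0.1, 1.0):
    print(step, np.round(recovered_fraction(eigenvalues, step, n_iter=100), 3))
# step 0.1: roughly [1.0, 0.634, 0.095, 0.01]  -> the two smallest directions are barely touched
# step 1.0: roughly [1.0, 1.0,   0.634, 0.095] -> the 1e-2 direction is now largely fit;
# with early stopping, the learning rate decides how far down the spectrum the solution reaches.
```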
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Spectrum Gaussian Processes Based On Tunable Basis Functions [15.088239458693003]
We introduce a novel basis function, which is tunable, local and bounded, to approximate the kernel function in the Gaussian process.
We conduct extensive experiments on open-source datasets to verify its performance.
arXiv Detail & Related papers (2021-07-14T03:51:24Z)
- Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods that are among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
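For a concrete picture of the randomized embeddings mentioned above, here is a minimal sketch-and-solve baseline with a Gaussian projection; it is not the paper's iterative first-order scheme, and the problem dimensions and sketch size are arbitrary.

```python
import numpy as np

def sketched_least_squares(A, b, sketch_size, seed=0):
    """Sketch-and-solve: compress the n x d problem with a Gaussian embedding
    S of shape (sketch_size, n), then solve the small least-squares problem."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    S = rng.normal(size=(sketch_size, n)) / np.sqrt(sketch_size)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

rng = np.random.default_rng(0)
n, d = 20000, 50
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, sketch_size=500)
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))  # close to 1 (slightly above)
```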
arXiv Detail & Related papers (2020-02-21T17:45:32Z)