Random feature approximation for general spectral methods
- URL: http://arxiv.org/abs/2308.15434v1
- Date: Tue, 29 Aug 2023 16:56:03 GMT
- Title: Random feature approximation for general spectral methods
- Authors: Mike Nguyen and Nicole M\"ucke
- Abstract summary: We analyze generalization properties for a large class of spectral regularization methods combined with random features.
For our estimators we obtain optimal learning rates over gradient regularity classes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random feature approximation is arguably one of the most popular techniques
to speed up kernel methods in large scale algorithms and provides a theoretical
approach to the analysis of deep neural networks. We analyze generalization
properties for a large class of spectral regularization methods combined with
random features, containing kernel methods with implicit regularization such as
gradient descent or explicit methods like Tikhonov regularization. For our
estimators we obtain optimal learning rates over regularity classes (even for
classes that are not included in the reproducing kernel Hilbert space), which
are defined through appropriate source conditions. This improves or completes
previous results obtained in related settings for specific kernel algorithms.
Related papers
- Variance-Reducing Couplings for Random Features: Perspectives from Optimal Transport [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning, replacing exact kernel evaluations with Monte Carlo estimates.
We tackle this through the unifying framework of optimal transport, using theoretical insights and numerical algorithms to develop novel, high-performing RF couplings for kernels defined on Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2024-05-26T12:25:09Z) - Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay [13.803850290216257]
We rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method.
Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training the wide neural networks.
arXiv Detail & Related papers (2024-01-03T08:00:50Z) - The Galerkin method beats Graph-Based Approaches for Spectral Algorithms [3.5897534810405403]
We break with the machine learning community and prove the statistical and computational superiority of the Galerkin method.
We introduce implementation tricks to deal with differential operators in large dimensions with structured kernels.
We extend on the core principles beyond our approach to apply them to non-linear spaces of functions, such as the ones parameterized by deep neural networks.
arXiv Detail & Related papers (2023-06-01T14:38:54Z) - Coefficient-based Regularized Distribution Regression [4.21768682940933]
We consider the coefficient-based regularized distribution regression which aims to regress from probability measures to real-valued responses over a kernel reproducing Hilbert space (RKHS)
Asymptotic behaviors of the algorithm in different regularity ranges of the regression function are comprehensively studied.
We get the optimal rates under some mild conditions, which matches the one-stage sampled minimax optimal rate.
arXiv Detail & Related papers (2022-08-26T03:46:14Z) - Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs)
HRFs automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z) - Spectrum Gaussian Processes Based On Tunable Basis Functions [15.088239458693003]
We introduce a novel basis function, which is tunable, local and bounded, to approximate the kernel function in the Gaussian process.
We conduct extensive experiments on open-source datasets to testify its performance.
arXiv Detail & Related papers (2021-07-14T03:51:24Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth
Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity with bounds dependence on the confidence level that is either negative-power or logarithmic.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Disentangling the Gauss-Newton Method and Approximate Inference for
Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z) - Certified and fast computations with shallow covariance kernels [0.0]
We introduce and analyze a new and certified algorithm for the low-rank approximation of a parameterized family of covariance operators.
The proposed algorithm provides the basis of a fast sampling procedure for parameter dependent random fields.
arXiv Detail & Related papers (2020-01-24T20:28:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.