Local optimisation of Nystr\"om samples through stochastic gradient
descent
- URL: http://arxiv.org/abs/2203.13284v1
- Date: Thu, 24 Mar 2022 18:17:27 GMT
- Title: Local optimisation of Nystr\"om samples through stochastic gradient
descent
- Authors: Matthew Hutchings and Bertrand Gauthier
- Abstract summary: We consider an unweighted variation of the radial squared-kernel discrepancy (SKD) criterion as a surrogate for the classical criteria used to assess the Nystr\"om approximation accuracy.
We perform numerical experiments which demonstrate that the local minimisation of the radial SKD yields Nystr\"om samples with improved Nystr\"om approximation accuracy.
- Score: 32.53634754382956
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a relaxed version of the column-sampling problem for the Nystr\"om
approximation of kernel matrices, where approximations are defined from
multisets of landmark points in the ambient space; such multisets are referred
to as Nystr\"om samples. We consider an unweighted variation of the radial
squared-kernel discrepancy (SKD) criterion as a surrogate for the classical
criteria used to assess the Nystr\"om approximation accuracy; in this setting,
we discuss how Nystr\"om samples can be efficiently optimised through
stochastic gradient descent. We perform numerical experiments which demonstrate
that the local minimisation of the radial SKD yields Nystr\"om samples with
improved Nystr\"om approximation accuracy.
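To make the pipeline in the abstract concrete, here is a minimal, hypothetical JAX sketch (not the authors' code): build the Nystr\"om approximation of a kernel matrix from a multiset of landmark points, and locally optimise the landmark coordinates by stochastic gradient descent on a squared-kernel-discrepancy-style surrogate. The function `skd_surrogate` below is an unweighted squared-kernel discrepancy between data minibatches and the Nystr\"om sample, used only as a stand-in for the paper's radial SKD, whose exact definition is not reproduced in this summary; the Gaussian kernel, step size, batch size and sample size are illustrative assumptions, not the authors' settings.

```python
# Hypothetical sketch: SGD optimisation of Nystrom landmark points on an
# assumed squared-kernel-discrepancy surrogate (not the paper's exact radial SKD).
import jax
import jax.numpy as jnp

def gaussian_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix k(x, y) = exp(-||x - y||^2 / (2 l^2))."""
    sq = jnp.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-sq / (2.0 * lengthscale ** 2))

def nystrom_approximation(X, S):
    """Nystrom approximation of K(X, X) built from the landmark points S."""
    K_nm = gaussian_kernel(X, S)
    K_mm = gaussian_kernel(S, S)
    return K_nm @ jnp.linalg.pinv(K_mm) @ K_nm.T

def skd_surrogate(S, X_batch):
    """Unweighted squared-kernel discrepancy between a data batch and S (assumed surrogate)."""
    k2 = lambda A, B: gaussian_kernel(A, B) ** 2
    return (jnp.mean(k2(X_batch, X_batch))
            - 2.0 * jnp.mean(k2(X_batch, S))
            + jnp.mean(k2(S, S)))

def optimise_landmarks(X, S0, steps=500, batch_size=128, lr=1e-2, seed=0):
    """Local optimisation of the landmark coordinates by plain SGD."""
    grad_fn = jax.jit(jax.grad(skd_surrogate))   # gradient w.r.t. the landmarks S
    key = jax.random.PRNGKey(seed)
    S = S0
    for _ in range(steps):
        key, sub = jax.random.split(key)
        idx = jax.random.choice(sub, X.shape[0], (batch_size,), replace=False)
        S = S - lr * grad_fn(S, X[idx])          # one stochastic gradient step
    return S

if __name__ == "__main__":
    X = jax.random.normal(jax.random.PRNGKey(1), (1000, 2))   # toy data set
    S0 = X[:20]                                               # initial Nystrom sample (m = 20)
    S = optimise_landmarks(X, S0)
    K = gaussian_kernel(X, X)
    for name, sample in [("initial", S0), ("optimised", S)]:
        err = jnp.linalg.norm(K - nystrom_approximation(X, sample))
        print(f"{name:9s} Frobenius approximation error: {err:.3f}")
```

The final lines report the classical Frobenius-norm Nystr\"om approximation error for the initial and the locally optimised samples, mirroring how the abstract assesses whether minimising the surrogate criterion improves the approximation accuracy.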
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
Quantiles of the gradient norm are used as clipping thresholds, making the updates robust to gradient outliers.
We propose an implementation of the algorithm using Huberiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Samplet basis pursuit: Multiresolution scattered data approximation with sparsity constraints [0.0]
We consider scattered data approximation in samplet coordinates with $\ell_1$-regularization.
By using the Riesz isometry, we embed samplets into reproducing kernel Hilbert spaces.
We argue that the class of signals that are sparse with respect to the embedded samplet basis is considerably larger than the class of signals that are sparse with respect to the basis of kernel translates.
arXiv Detail & Related papers (2023-06-16T21:20:49Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Sampling-based Nystr\"om Approximation and Kernel Quadrature [8.658596218544774]
We analyze the Nystr\"om approximation of a positive definite kernel associated with a probability measure.
We first prove an improved error bound for the conventional Nystr\"om approximation with i.i.d. sampling and singular-value decomposition.
We then introduce a refined selection of subspaces in Nystr\"om approximation with theoretical guarantees that is applicable to non-i.i.d. landmark points.
arXiv Detail & Related papers (2023-01-23T16:05:56Z) - Preferential Subsampling for Stochastic Gradient Langevin Dynamics [3.158346511479111]
Stochastic gradient MCMC offers an unbiased estimate of the gradient of the log-posterior based on a small, uniformly-weighted subsample of the data.
The resulting gradient estimator may exhibit high variance and negatively impact sampler performance.
We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size that is used.
arXiv Detail & Related papers (2022-10-28T14:56:18Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - Nystr\"om Kernel Mean Embeddings [92.10208929236826]
We propose an efficient approximation procedure based on the Nystr\"om method.
It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate.
We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules.
arXiv Detail & Related papers (2022-01-31T08:26:06Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples.
arXiv Detail & Related papers (2020-02-26T16:37:37Z) - Diversity sampling is an implicit regularization for kernel methods [13.136143245702915]
We show that Nystr\"om kernel regression with diverse landmarks increases the accuracy of the regression in sparser regions of the dataset.
A greedy algorithm is also proposed to select diverse samples of significant size within large datasets when exact DPP sampling is not practically feasible.
arXiv Detail & Related papers (2020-02-20T08:24:42Z)
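The last entry above concerns selecting diverse landmarks for Nystr\"om kernel regression, with a greedy fallback when exact DPP sampling is infeasible. As a hypothetical illustration (not necessarily the procedure proposed in that paper), the sketch below implements greedy max-determinant landmark selection, equivalent to a pivoted Cholesky factorisation of the kernel matrix, which is a standard way to pick diverse, well-spread landmarks:

```python
# Hypothetical sketch: greedy max-determinant (pivoted-Cholesky) landmark selection.
import jax
import jax.numpy as jnp

def greedy_diverse_landmarks(K, m):
    """Select m landmark indices greedily.

    At each step, pick the point with the largest residual kernel variance,
    i.e. the point least well explained by the landmarks chosen so far.
    """
    residual = jnp.diag(K)              # residual diagonal of the kernel matrix
    factors, chosen = [], []
    for _ in range(m):
        j = int(jnp.argmax(residual))
        chosen.append(j)
        col = K[:, j]
        for f in factors:               # subtract contributions of earlier pivots
            col = col - f * f[j]
        f = col / jnp.sqrt(residual[j])
        factors.append(f)
        residual = residual - f ** 2    # Schur-complement update of the diagonal
    return jnp.array(chosen)

if __name__ == "__main__":
    # Example: select 10 diverse landmarks from a Gaussian kernel matrix on toy data.
    X = jax.random.normal(jax.random.PRNGKey(0), (500, 2))
    sq = jnp.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = jnp.exp(-0.5 * sq)
    print(greedy_diverse_landmarks(K, 10))
```

Points with a large residual diagonal are exactly those poorly covered by the current landmarks, so the selection naturally spreads across the data set rather than clustering in dense regions.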