Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features
- URL: http://arxiv.org/abs/2002.04195v1
- Date: Tue, 11 Feb 2020 04:12:31 GMT
- Title: Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features
- Authors: Liang Ding, Rui Tuo, Shahin Shahrampour
- Abstract summary: Kernel methods suffer from a massive computational cost in practice.
We develop a novel optimal design maximizing the entropy among kernel features.
Our experiments on benchmark datasets verify the superiority of EOF over the state-of-the-art in kernel approximation.
- Score: 33.29293167413832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their success, kernel methods suffer from a massive computational
cost in practice. In this paper, in lieu of commonly used kernel expansion with
respect to $N$ inputs, we develop a novel optimal design maximizing the entropy
among kernel features. This procedure results in a kernel expansion with
respect to entropic optimal features (EOF), improving the data representation
dramatically due to feature dissimilarity. Under mild technical assumptions,
our generalization bound shows that with only $O(N^{\frac{1}{4}})$ features
(disregarding logarithmic factors), we can achieve the optimal statistical
accuracy (i.e., $O(1/\sqrt{N})$). The salient feature of our design is its
sparsity, which significantly reduces the time and space cost. Our numerical
experiments on benchmark datasets verify the superiority of EOF over the
state-of-the-art in kernel approximation.
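The EOF design itself is specified in the paper; as a hedged illustration of the general idea of selecting mutually dissimilar kernel features, the sketch below greedily picks landmark inputs maximizing the log-determinant of their kernel submatrix (a Gaussian-entropy proxy for dissimilarity) and then solves ridge regression in the span of those features. The greedy rule, the Gaussian kernel, and all names and parameters here are illustrative assumptions, not the paper's EOF construction.

```python
import numpy as np

def greedy_entropy_landmarks(X, m, gamma=1.0):
    """Greedily pick m landmark inputs maximizing the log-determinant of
    their kernel submatrix -- a Gaussian-entropy proxy for mutual feature
    dissimilarity. Illustrative stand-in, not the paper's EOF design."""
    n = len(X)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)            # Gaussian kernel matrix
    selected = [0]                     # diagonal is constant, so start anywhere
    for _ in range(m - 1):
        best_j, best_val = -1, -np.inf
        for j in range(n):
            if j in selected:
                continue
            S = selected + [j]
            sign, logdet = np.linalg.slogdet(K[np.ix_(S, S)])
            if sign > 0 and logdet > best_val:
                best_j, best_val = j, logdet
        selected.append(best_j)
    return np.asarray(selected)

def fit_sparse_kernel_ridge(X, y, landmarks, gamma=1.0, lam=1e-3):
    """Ridge regression restricted to the span of the selected kernel features."""
    sq = ((X[:, None, :] - X[landmarks][None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-gamma * sq)          # N x m feature matrix
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(landmarks)), Phi.T @ y)

# Usage: with N = 1000 inputs, about N**0.25 ~ 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=1000)
S = greedy_entropy_landmarks(X, m=6)
alpha = fit_sparse_kernel_ridge(X, y, S)
```

Under the paper's bound, roughly $O(N^{\frac{1}{4}})$ features already give the optimal $O(1/\sqrt{N})$ rate up to logarithmic factors: for $N = 10^6$ that is about 32 features, and the solve above then costs $O(Nm^2)$ time and $O(Nm)$ space instead of the $O(N^3)$ time and $O(N^2)$ space of a full kernel method.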
Related papers
- Truncated Kernel Stochastic Gradient Descent on Spheres [1.4583059436979549]
Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-Kernel SGD) algorithm.
T-Kernel SGD employs a "truncation" operation, enabling the application of series-based kernel functions in gradient descent; a toy sketch of this truncation idea follows the related-papers list below.
In contrast to traditional kernel SGD, T-Kernel SGD is more effective in balancing bias and variance.
arXiv Detail & Related papers (2024-10-02T14:09:51Z)
- A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z)
- Conditional mean embeddings and optimal feature selection via positive definite kernels [0.0]
We consider operator-theoretic approaches to conditional mean embeddings (CME).
Our results combine a spectral analysis-based optimization scheme with the use of kernels, processes, and constructive learning algorithms.
arXiv Detail & Related papers (2023-05-14T08:29:15Z)
- Kernel Subspace and Feature Extraction [7.424262881242935]
We study kernel methods in machine learning from the perspective of feature subspace.
We construct a kernel from Hirschfeld-Gebelein-Rényi maximal correlation functions, coined the maximal correlation kernel, and demonstrate its information-theoretic optimality.
arXiv Detail & Related papers (2023-01-04T02:46:11Z)
- FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG).
Results show that the proposed approach leads to an improved estimation of pattern latency compared to the state-of-the-art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z)
- Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels [0.0]
We propose to take advantage of naturally-structured sparsity by allowing the kernel to discover -- instead of induce -- sparse structure.
The principle of ultra-flexible, compactly-supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points (a small compact-support sparsity demo follows this list).
arXiv Detail & Related papers (2022-05-18T16:56:53Z)
- Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z)
- A Robust Asymmetric Kernel Function for Bayesian Optimization, with Application to Image Defect Detection in Manufacturing Systems [2.4278445972594525]
We propose a robust kernel function, the Asymmetric Elastic Net Radial Basis Function (AEN-RBF).
We show theoretically that AEN-RBF can realize smaller mean squared prediction error under mild conditions.
We also show that the AEN-RBF kernel function is less sensitive to outliers.
arXiv Detail & Related papers (2021-09-22T17:59:05Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional, previously unexplored benefit: it can simultaneously reduce variance at essentially negligible cost (a minimal preconditioned-solver sketch follows this list).
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
The gradient-based algorithms available for non-convex optimization are only able to guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z)
- Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z)
- The Statistical Cost of Robust Kernel Hyperparameter Tuning [20.42751031392928]
We study the statistical complexity of kernel hyperparameter tuning in the setting of active regression under adversarial noise.
We provide finite-sample guarantees for the problem, characterizing how increasing the complexity of the kernel class increases the complexity of learning kernel hyperparameters.
arXiv Detail & Related papers (2020-06-14T21:56:33Z)
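As a toy illustration of the truncation idea in the T-Kernel SGD entry above (not the paper's spherical-harmonic construction): assume a Mercer-type series kernel on $[0, 1]$ with an illustrative cosine basis and polynomial eigen-decay, truncate the series at n terms, and run plain SGD in the resulting finite feature space. All bases, decay rates, and parameters below are assumptions for illustration.

```python
import numpy as np

# Toy Mercer-type series kernel k(x, y) = sum_j lam_j phi_j(x) phi_j(y)
# with cosine basis phi_j(x) = cos(j*pi*x) and assumed decay lam_j = j^{-2}.
# Truncating at n_terms gives an explicit feature map, so kernel SGD
# reduces to linear SGD on these features.

def truncated_features(x, n_terms):
    """Map inputs in [0, 1] to the first n_terms weighted basis functions."""
    j = np.arange(1, n_terms + 1)
    lam = j ** -2.0                                       # assumed eigen-decay
    return np.sqrt(lam) * np.cos(np.pi * j * x[:, None])  # (len(x), n_terms)

def truncated_kernel_sgd(X, y, n_terms=20, lr=0.1, epochs=5, seed=0):
    """Plain SGD for least squares in the truncated feature space."""
    rng = np.random.default_rng(seed)
    Phi = truncated_features(X, n_terms)
    w = np.zeros(n_terms)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w -= lr * (Phi[i] @ w - y[i]) * Phi[i]        # squared-loss gradient
    return w

# Usage: regress a smooth function from noisy samples.
X = np.random.default_rng(1).uniform(0, 1, 500)
y = np.sin(2 * np.pi * X) + 0.1 * np.random.default_rng(2).normal(size=500)
w = truncated_kernel_sgd(X, y)
print("train MSE:", np.mean((truncated_features(X, 20) @ w - y) ** 2))
```

Choosing the truncation level trades bias (too few terms) against variance (too many), which is the balance the entry refers to.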
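For the sparsity-discovering-kernels entry above, a small demonstration of the underlying principle that a compactly supported kernel yields a sparse covariance matrix. The Wendland C2 kernel used here is a standard compactly supported positive-definite kernel (valid for input dimension at most 3); the paper's non-stationary, sparsity-discovering construction is more elaborate.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy import sparse

def wendland_c2(X, Z, radius):
    """Wendland C^2 kernel: k(r) = (1 - r/rho)_+^4 (4 r/rho + 1).
    Entries vanish beyond `radius`, so the matrix is sparse for small radii."""
    r = cdist(X, Z) / radius
    k = np.maximum(1.0 - r, 0.0) ** 4 * (4.0 * r + 1.0)
    return sparse.csr_matrix(k)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 2))
K = wendland_c2(X, X, radius=0.05)
print(f"nonzeros: {K.nnz / 2000**2:.2%} of entries")  # roughly 1% for this radius
```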
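For the preconditioning entry above, a minimal sketch of the mechanism: conjugate gradients solves $(K + D)v = y$ using only matrix-vector products, and a simple Jacobi (diagonal) preconditioner, passed via SciPy's M argument, can accelerate convergence. The paper's variance-reduction analysis for hyperparameter optimization sits on top of such solvers and is not reproduced here; the kernel and noise model below are assumptions.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
noise = rng.uniform(0.05, 5.0, size=1000)     # heteroscedastic noise variances
A = np.exp(-0.5 * sq) + np.diag(noise)        # K + diag(noise), symmetric PD

y = rng.normal(size=1000)
diag_inv = 1.0 / np.diag(A)                   # Jacobi (diagonal) preconditioner
M = LinearOperator(A.shape, matvec=lambda v: diag_inv * v)

v, info = cg(A, y, M=M)                       # each CG step: one matvec with A
print("converged:", info == 0, "residual:", np.linalg.norm(A @ v - y))
```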
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.