Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features
- URL: http://arxiv.org/abs/2002.04195v1
- Date: Tue, 11 Feb 2020 04:12:31 GMT
- Title: Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features
- Authors: Liang Ding, Rui Tuo, Shahin Shahrampour
- Abstract summary: Kernel methods suffer from a massive computational cost in practice.
We develop a novel optimal design maximizing the entropy among kernel features.
Our experiments on benchmark datasets verify the superiority of EOF over the state-of-the-art in kernel approximation.
- Score: 33.29293167413832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their success, kernel methods suffer from a massive computational
cost in practice. In this paper, in lieu of commonly used kernel expansion with
respect to $N$ inputs, we develop a novel optimal design maximizing the entropy
among kernel features. This procedure results in a kernel expansion with
respect to entropic optimal features (EOF), improving the data representation
dramatically due to feature dissimilarity. Under mild technical assumptions,
our generalization bound shows that with only $O(N^{\frac{1}{4}})$ features
(disregarding logarithmic factors), we can achieve the optimal statistical
accuracy (i.e., $O(1/\sqrt{N})$). The salient feature of our design is its
sparsity, which significantly reduces the time and space cost. Our numerical
experiments on benchmark datasets verify the superiority of EOF over the
state-of-the-art in kernel approximation.
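The EOF design itself is specified in the paper; as a hedged illustration of the general idea of selecting mutually dissimilar kernel features, the sketch below greedily picks landmark inputs maximizing the log-determinant of their kernel submatrix (a Gaussian-entropy proxy for dissimilarity) and then solves ridge regression in the span of those features. The greedy rule, the Gaussian kernel, and all names and parameters here are illustrative assumptions, not the paper's EOF construction.

```python
import numpy as np

def greedy_entropy_landmarks(X, m, gamma=1.0):
    """Greedily pick m landmark inputs maximizing the log-determinant of
    their kernel submatrix -- a Gaussian-entropy proxy for mutual feature
    dissimilarity. Illustrative stand-in, not the paper's EOF design."""
    n = len(X)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)            # Gaussian kernel matrix
    selected = [0]                     # diagonal is constant, so start anywhere
    for _ in range(m - 1):
        best_j, best_val = -1, -np.inf
        for j in range(n):
            if j in selected:
                continue
            S = selected + [j]
            sign, logdet = np.linalg.slogdet(K[np.ix_(S, S)])
            if sign > 0 and logdet > best_val:
                best_j, best_val = j, logdet
        selected.append(best_j)
    return np.asarray(selected)

def fit_sparse_kernel_ridge(X, y, landmarks, gamma=1.0, lam=1e-3):
    """Ridge regression restricted to the span of the selected kernel features."""
    sq = ((X[:, None, :] - X[landmarks][None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-gamma * sq)          # N x m feature matrix
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(landmarks)), Phi.T @ y)

# Usage: with N = 1000 inputs, about N**0.25 ~ 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=1000)
S = greedy_entropy_landmarks(X, m=6)
alpha = fit_sparse_kernel_ridge(X, y, S)
```

Under the paper's bound, roughly $O(N^{\frac{1}{4}})$ features already give the optimal $O(1/\sqrt{N})$ rate up to logarithmic factors: for $N = 10^6$ that is about 32 features, and the solve above then costs $O(Nm^2)$ time and $O(Nm)$ space instead of the $O(N^3)$ time and $O(N^2)$ space of a full kernel method.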
Related papers
- Truncated Kernel Stochastic Gradient Descent on Spheres [1.4583059436979549]
Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-Kernel SGD) algorithm.
T-Kernel SGD employs a "truncation" operation, enabling the application of series-based kernel functions in gradient descent; a toy sketch of this truncation idea follows the related-papers list below.
In contrast to traditional kernel SGD, T-Kernel SGD is more effective in balancing bias and variance.
arXiv Detail & Related papers (2024-10-02T14:09:51Z)
- A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z)
- Conditional mean embeddings and optimal feature selection via positive definite kernels [0.0]
We consider operator-theoretic approaches to conditional mean embeddings (CME).
Our results combine a spectral analysis-based optimization scheme with the use of kernels, processes, and constructive learning algorithms.
arXiv Detail & Related papers (2023-05-14T08:29:15Z)
- Kernel Subspace and Feature Extraction [7.424262881242935]
We study kernel methods in machine learning from the perspective of feature subspace.
We construct a kernel from Hirschfeld-Gebelein-Rényi maximal correlation functions, coined the maximal correlation kernel, and demonstrate its information-theoretic optimality.
arXiv Detail & Related papers (2023-01-04T02:46:11Z)
- FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG).
Results show that the proposed approach leads to an improved estimation of pattern latency compared to the state-of-the-art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z)
- Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels [0.0]
We propose to take advantage of naturally-structured sparsity by allowing the kernel to discover -- instead of induce -- sparse structure.
The principle of ultra-flexible, compactly-supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points (a small compact-support sparsity demo follows this list).
arXiv Detail & Related papers (2022-05-18T16:56:53Z)
- Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z)
- A Robust Asymmetric Kernel Function for Bayesian Optimization, with Application to Image Defect Detection in Manufacturing Systems [2.4278445972594525]
We propose a robust kernel function, the Asymmetric Elastic Net Radial Basis Function (AEN-RBF).
We show theoretically that AEN-RBF can realize smaller mean squared prediction error under mild conditions.
We also show that the AEN-RBF kernel function is less sensitive to outliers.
arXiv Detail & Related papers (2021-09-22T17:59:05Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional, previously unexplored benefit: it can simultaneously reduce variance at essentially negligible cost (a minimal preconditioned-solver sketch follows this list).
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
The gradient-based algorithms available for non-convex optimization are only able to guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z)
- Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z)
- The Statistical Cost of Robust Kernel Hyperparameter Tuning [20.42751031392928]
We study the statistical complexity of kernel hyperparameter tuning in the setting of active regression under adversarial noise.
We provide finite-sample guarantees for the problem, characterizing how increasing the complexity of the kernel class increases the complexity of learning kernel hyperparameters.
arXiv Detail & Related papers (2020-06-14T21:56:33Z)
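As a toy illustration of the truncation idea in the T-Kernel SGD entry above (not the paper's spherical-harmonic construction): assume a Mercer-type series kernel on $[0, 1]$ with an illustrative cosine basis and polynomial eigen-decay, truncate the series at n terms, and run plain SGD in the resulting finite feature space. All bases, decay rates, and parameters below are assumptions for illustration.

```python
import numpy as np

# Toy Mercer-type series kernel k(x, y) = sum_j lam_j phi_j(x) phi_j(y)
# with cosine basis phi_j(x) = cos(j*pi*x) and assumed decay lam_j = j^{-2}.
# Truncating at n_terms gives an explicit feature map, so kernel SGD
# reduces to linear SGD on these features.

def truncated_features(x, n_terms):
    """Map inputs in [0, 1] to the first n_terms weighted basis functions."""
    j = np.arange(1, n_terms + 1)
    lam = j ** -2.0                                       # assumed eigen-decay
    return np.sqrt(lam) * np.cos(np.pi * j * x[:, None])  # (len(x), n_terms)

def truncated_kernel_sgd(X, y, n_terms=20, lr=0.1, epochs=5, seed=0):
    """Plain SGD for least squares in the truncated feature space."""
    rng = np.random.default_rng(seed)
    Phi = truncated_features(X, n_terms)
    w = np.zeros(n_terms)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w -= lr * (Phi[i] @ w - y[i]) * Phi[i]        # squared-loss gradient
    return w

# Usage: regress a smooth function from noisy samples.
X = np.random.default_rng(1).uniform(0, 1, 500)
y = np.sin(2 * np.pi * X) + 0.1 * np.random.default_rng(2).normal(size=500)
w = truncated_kernel_sgd(X, y)
print("train MSE:", np.mean((truncated_features(X, 20) @ w - y) ** 2))
```

Choosing the truncation level trades bias (too few terms) against variance (too many), which is the balance the entry refers to.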
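For the sparsity-discovering-kernels entry above, a small demonstration of the underlying principle that a compactly supported kernel yields a sparse covariance matrix. The Wendland C2 kernel used here is a standard compactly supported positive-definite kernel (valid for input dimension at most 3); the paper's non-stationary, sparsity-discovering construction is more elaborate.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy import sparse

def wendland_c2(X, Z, radius):
    """Wendland C^2 kernel: k(r) = (1 - r/rho)_+^4 (4 r/rho + 1).
    Entries vanish beyond `radius`, so the matrix is sparse for small radii."""
    r = cdist(X, Z) / radius
    k = np.maximum(1.0 - r, 0.0) ** 4 * (4.0 * r + 1.0)
    return sparse.csr_matrix(k)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 2))
K = wendland_c2(X, X, radius=0.05)
print(f"nonzeros: {K.nnz / 2000**2:.2%} of entries")  # roughly 1% for this radius
```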
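For the preconditioning entry above, a minimal sketch of the mechanism: conjugate gradients solves $(K + D)v = y$ using only matrix-vector products, and a simple Jacobi (diagonal) preconditioner, passed via SciPy's M argument, can accelerate convergence. The paper's variance-reduction analysis for hyperparameter optimization sits on top of such solvers and is not reproduced here; the kernel and noise model below are assumptions.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
noise = rng.uniform(0.05, 5.0, size=1000)     # heteroscedastic noise variances
A = np.exp(-0.5 * sq) + np.diag(noise)        # K + diag(noise), symmetric PD

y = rng.normal(size=1000)
diag_inv = 1.0 / np.diag(A)                   # Jacobi (diagonal) preconditioner
M = LinearOperator(A.shape, matvec=lambda v: diag_inv * v)

v, info = cg(A, y, M=M)                       # each CG step: one matvec with A
print("converged:", info == 0, "residual:", np.linalg.norm(A @ v - y))
```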
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.