Revisiting Memory Efficient Kernel Approximation: An Indefinite Learning Perspective
- URL: http://arxiv.org/abs/2112.09893v1
- Date: Sat, 18 Dec 2021 10:01:34 GMT
- Title: Revisiting Memory Efficient Kernel Approximation: An Indefinite Learning Perspective
- Authors: Simon Heilig, Maximilian Münch, Frank-Michael Schleif
- Abstract summary: Matrix approximations are a key element in large-scale machine learning approaches.
We extend MEKA to be applicable not only for shift-invariant kernels but also for non-stationary kernels.
We present a Lanczos-based estimation of a spectrum shift to develop a stable positive semi-definite MEKA approximation.
- Score: 0.8594140167290097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Matrix approximations are a key element in large-scale algebraic machine
learning approaches. The recently proposed method MEKA (Si et al., 2014)
effectively employs two common assumptions in Hilbert spaces: the low-rank
property of an inner product matrix obtained from a shift-invariant kernel
function and a data compactness hypothesis by means of an inherent
block-cluster structure. In this work, we extend MEKA to be applicable not only
for shift-invariant kernels but also for non-stationary kernels like polynomial
kernels and an extreme learning kernel. We also address in detail how to handle
non-positive semi-definite kernel functions within MEKA, either caused by the
approximation itself or by the intentional use of general kernel functions. We
present a Lanczos-based estimation of a spectrum shift to develop a stable
positive semi-definite MEKA approximation, also usable in classical convex
optimization frameworks. Furthermore, we support our findings with theoretical
considerations and a variety of experiments on synthetic and real-world data.
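To make the spectrum-shift idea concrete, here is a minimal sketch (not the authors' MEKA implementation): the smallest eigenvalue of an indefinite kernel matrix is estimated with SciPy's Lanczos-based eigensolver, and the spectrum is shifted so the matrix becomes positive semi-definite. The tanh kernel and all parameter values below are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import eigsh  # Lanczos-type iterative eigensolver

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))

# An intentionally indefinite kernel (the sigmoid/tanh kernel is not PSD
# in general); parameters are illustrative, not from the paper.
K = np.tanh(0.1 * (X @ X.T) - 0.5)
K = 0.5 * (K + K.T)  # symmetrize against round-off

# Lanczos-based estimate of the smallest (algebraic) eigenvalue.
lam_min = eigsh(K, k=1, which="SA", return_eigenvectors=False)[0]

# Shift the spectrum: K + |lam_min| * I is PSD by construction and can be
# handed to classical convex optimization frameworks.
K_psd = K + max(0.0, -lam_min) * np.eye(K.shape[0])
```

The dense Gram matrix here is only for illustration; the point of the paper is to estimate and apply such a shift within the memory efficient MEKA approximation itself.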
Related papers
- MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation [5.298814565953444]
Existing relative position encoding methods address the length extrapolation challenge exclusively through a single kernel function.
This study proposes a novel relative positional encoding method, called MEP, which employs a weighted average to combine distinct kernel functions.
We present two distinct versions of our method: a parameter-free variant that requires no new learnable parameters, and a parameterized variant capable of integrating state-of-the-art techniques.
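As a loose illustration of the weighted-average idea (kernel choices and fixed weights here are hypothetical assumptions, not the MEP parametrization):

```python
import numpy as np

def gaussian_kernel(dist, gamma=1.0):
    return np.exp(-gamma * dist**2)

def laplacian_kernel(dist, beta=1.0):
    return np.exp(-beta * dist)

def combined_kernel(dist, weights=(0.5, 0.5)):
    """Weighted average of distinct kernels; a parameter-free variant
    would fix these weights instead of learning them."""
    w1, w2 = weights
    return w1 * gaussian_kernel(dist) + w2 * laplacian_kernel(dist)

# Relative distances between positions, as in length extrapolation.
rel_dist = np.arange(8, dtype=float)
print(combined_kernel(rel_dist))
```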
arXiv Detail & Related papers (2024-03-26T13:38:06Z)
- On the Approximation of Kernel functions [0.0]
The paper addresses approximations of the kernel function itself.
For the Hilbert Gauss kernel on the unit cube, the paper establishes an upper bound on the associated eigenfunctions.
This improved bound supports low-rank approximation methods such as the Nyström method.
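For context, a generic sketch of the Nyström approximation mentioned above (a textbook formulation, not this paper's contribution): sample m landmark points and reconstruct the Gram matrix as K ~ C W^+ C^T.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))

def gauss_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

m = 50                                   # number of landmark points
idx = rng.choice(len(X), size=m, replace=False)
C = gauss_kernel(X, X[idx])              # n x m cross-kernel block
W = C[idx]                               # m x m landmark block
K_nystrom = C @ np.linalg.pinv(W) @ C.T  # K ~ C W^+ C^T, rank <= m
```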
arXiv Detail & Related papers (2024-03-11T13:50:07Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions [18.440569330385323]
We show that multiple contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel.
We prove that a simple representation obtained by combining this kernel with PCA provably minimizes the worst-case approximation error of linear predictors.
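A hedged sketch of the kernel-plus-PCA construction (the RBF Gram matrix below is a stand-in assumption; the paper's positive-pair kernel would replace it):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 8))

# Stand-in Gram matrix; a learned positive-pair kernel would go here.
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-0.5 * d2)

# Kernel PCA: center the Gram matrix, keep the top-k eigenpairs.
n = len(K)
H = np.eye(n) - np.ones((n, n)) / n
vals, vecs = np.linalg.eigh(H @ K @ H)   # eigenvalues in ascending order
k = 10
Z = vecs[:, -k:] * np.sqrt(np.maximum(vals[-k:], 0.0))  # n x k embedding
```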
arXiv Detail & Related papers (2022-10-04T20:02:52Z)
- Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
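A minimal sketch of the norm-minimization criterion, assuming kernel ridge regression as the fitting procedure (illustrative, not the paper's algorithm): among candidate kernels, prefer the one whose fitted function has the smallest RKHS norm alpha^T K alpha.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, (100, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(100)

def rbf(A, B, gamma):
    d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

lam = 1e-3
for gamma in (0.1, 1.0, 10.0):            # candidate kernel bandwidths
    K = rbf(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y)
    norm2 = alpha @ K @ alpha             # RKHS norm of the fitted function
    print(f"gamma={gamma:5.1f}  ||f||^2={norm2:.3f}")
```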
arXiv Detail & Related papers (2022-06-03T07:50:54Z)
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observable Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z)
- Measuring dissimilarity with diffeomorphism invariance [94.02751799024684]
We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces.
We prove that DID enjoys properties which make it relevant for theoretical study and practical use.
arXiv Detail & Related papers (2022-02-11T13:51:30Z)
- A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
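For intuition, a small sketch of empirical kernel mean embeddings (the standard construction, not the paper's optimization algorithms): two samples are compared via the distance between their mean embeddings, the maximum mean discrepancy (MMD).

```python
import numpy as np

rng = np.random.default_rng(4)
P = rng.standard_normal((300, 2))         # sample from distribution P
Q = rng.standard_normal((300, 2)) + 0.5   # sample from a shifted Q

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Squared MMD between the empirical mean embeddings of P and Q.
mmd2 = rbf(P, P).mean() - 2.0 * rbf(P, Q).mean() + rbf(Q, Q).mean()
print(f"MMD^2 estimate: {mmd2:.4f}")
```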
arXiv Detail & Related papers (2021-06-18T08:33:45Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Generalized vec trick for fast learning of pairwise kernel models [3.867363075280544]
We present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects.
We show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of generalized vec trick for speeding up their computation.
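For reference, a short sketch of the classical vec trick this builds on: for a Kronecker-structured matrix, (A kron B) vec(X) = vec(B X A^T), so the Kronecker product never has to be materialized. Matrix sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 40))   # kernel block over one object set
B = rng.standard_normal((30, 30))   # kernel block over the other set
X = rng.standard_normal((30, 40))

# Naive: materialize the 1200 x 1200 Kronecker product.
naive = np.kron(A, B) @ X.reshape(-1, order="F")

# Vec trick: (A kron B) vec(X) = vec(B X A^T); column-major vec.
fast = (B @ X @ A.T).reshape(-1, order="F")

assert np.allclose(naive, fast)
```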
arXiv Detail & Related papers (2020-09-02T13:27:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.