Learning primal-dual sparse kernel machines
- URL: http://arxiv.org/abs/2108.12199v1
- Date: Fri, 27 Aug 2021 09:38:53 GMT
- Title: Learning primal-dual sparse kernel machines
- Authors: Riikka Huusari, Sahely Bhadra, Cécile Capponi, Hachem Kadri, Juho Rousu
- Abstract summary: Traditionally, kernel methods rely on the representer theorem, which states that the solution to a learning problem is obtained as a linear combination of the data mapped into the reproducing kernel Hilbert space (RKHS).
We propose to search for a solution in RKHS that has a pre-image decomposition in the original data space, where the elements don't necessarily correspond to the elements in the training set.
Our gradient-based method then hinges on optimisation over possibly sparse elements in the input space, and enables us to obtain a kernel-based model with both primal and dual sparsity.
- Score: 10.230121160034674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditionally, kernel methods rely on the representer theorem, which states
that the solution to a learning problem is obtained as a linear combination of
the data mapped into the reproducing kernel Hilbert space (RKHS). While elegant
from a theoretical point of view, the theorem hampers the scalability of
algorithms to large datasets and the interpretability of the learned function.
In this paper, instead of using the traditional representer theorem,
we propose to search for a solution in RKHS that has a pre-image decomposition
in the original data space, where the elements don't necessarily correspond to
the elements in the training set. Our method then hinges on gradient-based
optimisation over possibly sparse elements in the input space, and enables us
to obtain a kernel-based model with both primal and dual sparsity.
We give a theoretical justification of the proposed method's generalization
ability via a Rademacher bound. Our experiments demonstrate better scalability
and interpretability, with accuracy on par with traditional kernel-based
models.
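To make the approach concrete, here is a minimal sketch, not the authors' algorithm, of a model f(x) = sum_j a_j k(x, z_j) whose pre-image points z_j live in the input space and are learned by gradient descent jointly with the coefficients a_j. The RBF kernel, the squared loss, the plain L1 subgradient step, and all names (rbf, fit_sparse_pre_image_model, lam, lr) are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_sparse_pre_image_model(X, y, m=10, gamma=1.0, lam=1e-2,
                               lr=1e-2, steps=500, seed=0):
    """Toy sketch: f(x) = sum_j a_j k(x, z_j) with m learned pre-image
    points Z (not restricted to training points) and an L1 penalty on a."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    Z = X[rng.choice(n, m, replace=False)].astype(float)  # initialise pre-images
    a = np.zeros(m)
    for _ in range(steps):
        K = rbf(X, Z, gamma)                       # (n, m) kernel evaluations
        r = K @ a - y                              # residuals
        # subgradient of (1/n)||K a - y||^2 + lam*||a||_1 with respect to a
        g_a = (2.0 / n) * K.T @ r + lam * np.sign(a)
        # gradient with respect to each z_j, using d k(x, z)/dz = 2*gamma*(x - z)*k(x, z)
        W = (2.0 / n) * (r[:, None] * K) * a[None, :]            # (n, m)
        g_Z = 2.0 * gamma * (W.T @ X - W.sum(0)[:, None] * Z)    # (m, d)
        a -= lr * g_a
        Z -= lr * g_Z
    return Z, a
```

Predictions on new inputs are then rbf(X_new, Z, gamma) @ a. In this simplification the small budget of m learned pre-images stands in for dual sparsity, and one could additionally penalise the entries of Z to make the input-space elements themselves sparse; both are stand-ins for the paper's actual formulation, which also comes with a Rademacher-based generalization bound.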
Related papers
- Scalable Kernel Inverse Optimization [2.799896314754615]
Inverse optimization is a framework for learning the unknown objective function of an expert decision-maker from a past dataset.
We extend the hypothesis class of IO objective functions to a reproducing kernel Hilbert space.
We show that a variant of the representer theorem holds for a specific training loss, allowing the reformulation of the problem as a finite-dimensional convex optimization program.
arXiv Detail & Related papers (2024-10-31T14:06:43Z)
- Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning [33.34053480377887]
This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels.
For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs).
arXiv Detail & Related papers (2024-06-03T15:28:12Z)
- An Exact Kernel Equivalence for Finite Classification Models [1.4777718769290527]
We compare our exact representation to the well-known Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK.
We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks.
arXiv Detail & Related papers (2023-08-01T20:22:53Z)
- Learning nonparametric ordinary differential equations from noisy data [0.10555513406636088]
Learning nonparametric systems of Ordinary Differential Equations (ODEs) $\dot{x} = f(t,x)$ from noisy data is an emerging machine learning topic.
We use the theory of Reproducing Kernel Hilbert Spaces (RKHS) to define candidates for f for which the solution of the ODE exists and is unique.
We propose a penalty method that iteratively uses the Representer theorem and Euler approximations to provide a numerical solution.
arXiv Detail & Related papers (2022-06-30T11:59:40Z)
- Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z)
- Neural Fields as Learnable Kernels for 3D Reconstruction [101.54431372685018]
We present a novel method for reconstructing implicit 3D shapes based on a learned kernel ridge regression.
Our technique achieves state-of-the-art results when reconstructing 3D objects and large scenes from sparse oriented points.
arXiv Detail & Related papers (2021-11-26T18:59:04Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best known complexity for solving least-squares problems with no dependence on the condition number; a rough sketch-and-solve illustration of the Gaussian-projection idea appears after this list.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
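As a loose illustration of the Gaussian-projection idea mentioned in the last entry above, and not the preconditioned first-order methods that paper actually analyses, the sketch below projects a tall least-squares problem with a Gaussian embedding and solves the smaller system (classic sketch-and-solve); the function name and the choice of sketch size are assumptions.

```python
import numpy as np

def gaussian_sketch_lstsq(A, b, sketch_size, seed=0):
    """Sketch-and-solve: embed with a Gaussian matrix S and solve
    min_x ||S A x - S b||_2 instead of the full problem."""
    rng = np.random.default_rng(seed)
    m, _ = A.shape
    S = rng.standard_normal((sketch_size, m)) / np.sqrt(sketch_size)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Usage (tall A, e.g. 10000 x 50): a sketch of a few hundred rows often
# gives a solution close to the exact least-squares minimiser.
# x_hat = gaussian_sketch_lstsq(A, b, sketch_size=400)
```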
This list is automatically generated from the titles and abstracts of the papers on this site.