An Online Multiple Kernel Parallelizable Learning Scheme
- URL: http://arxiv.org/abs/2308.10101v2
- Date: Mon, 6 Nov 2023 15:50:42 GMT
- Title: An Online Multiple Kernel Parallelizable Learning Scheme
- Authors: Emilio Ruiz-Moreno and Baltasar Beferull-Lozano
- Abstract summary: We propose a learning scheme that scalably combines several single kernel-based online methods to reduce the kernel-selection bias.
The proposed learning scheme applies to any task formulated as a regularized empirical risk minimization convex problem.
- Score: 6.436174170552483
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of reproducing kernel Hilbert space-based methods is known to
be sensitive to the choice of the reproducing kernel. Choosing an adequate
reproducing kernel can be challenging and computationally demanding, especially
in data-rich tasks without prior information about the solution domain. In this
paper, we propose a learning scheme that scalably combines several single
kernel-based online methods to reduce the kernel-selection bias. The proposed
learning scheme applies to any task formulated as a regularized empirical risk
minimization convex problem. More specifically, our learning scheme is based on
a multi-kernel learning formulation that can be applied to widen any
single-kernel solution space, thus increasing the possibility of finding
higher-performance solutions. In addition, it is parallelizable, allowing for
the distribution of the computational load across different computing units. We
show experimentally that the proposed learning scheme outperforms each of its
constituent single-kernel online methods in terms of the cumulative regularized
least-squares cost.
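To make the combination idea concrete, here is a minimal hedged sketch: several single-kernel online learners (naive kernel LMS here) run independently, so the per-kernel work can be distributed, and their predictions are mixed with exponential weights driven by instantaneous squared losses. The kernels, step sizes, hedge rate, and data stream are illustrative assumptions, not the authors' exact scheme.

```python
# Minimal sketch (not the authors' algorithm): several single-kernel online
# learners run independently and are mixed with exponential weights.
import numpy as np

def gaussian_kernel(bw):
    # Gaussian kernel with bandwidth bw (illustrative choice).
    return lambda x, z: np.exp(-np.sum((x - z) ** 2) / (2 * bw ** 2))

class KLMS:
    """Naive online kernel least-mean-squares learner for one kernel."""
    def __init__(self, kernel, step=0.5):
        self.kernel, self.step, self.centers, self.alphas = kernel, step, [], []

    def predict(self, x):
        return sum(a * self.kernel(c, x) for c, a in zip(self.centers, self.alphas))

    def update(self, x, y):
        err = y - self.predict(x)
        self.centers.append(x)
        self.alphas.append(self.step * err)

rng = np.random.default_rng(0)
learners = [KLMS(gaussian_kernel(bw)) for bw in (0.1, 0.5, 2.0)]  # candidate kernels
weights, eta = np.ones(len(learners)), 0.5  # hedge weights and (assumed) hedge rate

cumulative = 0.0
for _ in range(200):  # synthetic data stream
    x = rng.uniform(-1, 1, size=2)
    y = np.sin(3 * x[0]) + 0.1 * rng.standard_normal()
    p = weights / weights.sum()
    y_hat = sum(pi * l.predict(x) for pi, l in zip(p, learners))  # mixed prediction
    cumulative += (y - y_hat) ** 2
    losses = np.array([(y - l.predict(x)) ** 2 for l in learners])
    weights = weights * np.exp(-eta * losses)  # exponential-weights update
    for l in learners:
        l.update(x, y)  # each learner updates independently (parallelizable)

print("cumulative squared loss:", cumulative, "mixture:", weights / weights.sum())
```

The exponential-weights mixture is one standard way to hedge against kernel-selection bias: a poorly matched kernel loses weight quickly while the remaining learners keep running unchanged.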
Related papers
- Learning to Embed Distributions via Maximum Kernel Entropy [0.0]
Empirical data can often be considered as samples from a set of probability distributions.
Kernel methods have emerged as a natural approach for learning to classify these distributions.
We propose a novel objective for the unsupervised learning of a data-dependent distribution kernel.
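For context, the sketch below shows the fixed-kernel baseline this line of work starts from: embedding sample sets with kernel mean embeddings and comparing them via the (biased) MMD estimator under an assumed Gaussian kernel, whereas the paper learns a data-dependent kernel.

```python
# Sketch: compare two sample sets via kernel mean embeddings and MMD,
# using a fixed Gaussian kernel (the paper learns the kernel instead).
import numpy as np

def gram(X, Z, bw=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def mmd2(X, Z, bw=1.0):
    # Squared MMD between the empirical distributions of X and Z.
    return gram(X, X, bw).mean() - 2 * gram(X, Z, bw).mean() + gram(Z, Z, bw).mean()

rng = np.random.default_rng(1)
P = rng.normal(0.0, 1.0, size=(100, 2))  # samples from one distribution
Q = rng.normal(0.5, 1.0, size=(100, 2))  # samples from a shifted one
print("MMD^2(P, Q) =", mmd2(P, Q))
```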
arXiv Detail & Related papers (2024-08-01T13:34:19Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis, we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
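As a hedged stand-in for such a regularization process (not the paper's construction), the sketch below trains a product-of-Bernoullis Max-Cut sampler with REINFORCE plus an entropy bonus that keeps the sampling distribution from collapsing.

```python
# Sketch: a product-of-Bernoullis solution sampler for Max-Cut trained with
# REINFORCE plus an entropy bonus (illustrative regularizer, not the paper's).
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1); A = A + A.T  # random undirected graph

def cut_value(s):  # s in {0,1}^n
    return sum(A[i, j] for i in range(n) for j in range(i + 1, n) if s[i] != s[j])

theta = np.zeros(n)            # logits of independent Bernoullis
lr, ent_coef = 0.1, 0.05       # assumed hyperparameters
for _ in range(500):
    p = 1 / (1 + np.exp(-theta))
    s = (rng.random(n) < p).astype(float)  # sample a cut
    reward = cut_value(s)
    grad_logp = s - p                      # d log pi(s) / d theta
    entropy_grad = -theta * p * (1 - p)    # gradient of the Bernoulli entropy
    theta += lr * (reward * grad_logp / n + ent_coef * entropy_grad)

print("cut value of rounded solution:", cut_value((p > 0.5).astype(float)))
```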
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Self-supervised learning with rotation-invariant kernels [4.059849656394191]
We propose a general kernel framework to design a generic regularization loss that encourages the embedding distribution to be close to the uniform distribution on the hypersphere.
Our framework uses rotation-invariant kernels defined on the hypersphere, also known as dot-product kernels.
Our experiments demonstrate that using a truncated rotation-invariant kernel provides competitive results compared to state-of-the-art methods.
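A minimal sketch of the idea, with an assumed exponential dot-product kernel and temperature: minimizing the log of the mean pairwise kernel value pushes normalized embeddings toward the uniform distribution on the hypersphere.

```python
# Sketch: a kernel uniformity regularizer on the hypersphere using a
# dot-product (rotation-invariant) kernel; lower values mean more spread.
import numpy as np

def uniformity_loss(Z, t=2.0):
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # project onto the sphere
    G = np.exp(t * (Z @ Z.T))                         # dot-product kernel matrix
    off = G[~np.eye(len(Z), dtype=bool)]              # off-diagonal entries only
    return np.log(off.mean())

rng = np.random.default_rng(3)
clustered = rng.normal(0, 0.05, (64, 8)) + 1.0  # embeddings bunched together
spread = rng.normal(0, 1.0, (64, 8))            # roughly uniform after normalization
print(uniformity_loss(clustered), ">", uniformity_loss(spread))
```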
arXiv Detail & Related papers (2022-07-28T08:06:24Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the generalization benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
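The mechanism can be seen on a diagonalized quadratic: after $T$ early-stopped gradient steps, the error along an eigendirection with eigenvalue $\lambda$ shrinks by $(1 - \eta\lambda)^T$, so the learning rate $\eta$ decides which spectral components are fit first. A small numeric illustration (the eigenvalues and step sizes are assumed):

```python
# Sketch: per-eigendirection residual of early-stopped gradient descent on a
# quadratic; larger step sizes fit small-eigenvalue directions much faster.
import numpy as np

lams = np.array([1.0, 0.1, 0.01])  # eigenvalues of the quadratic's Hessian
T = 50                             # early-stopping iteration count
for eta in (0.1, 1.0, 1.9):        # step sizes (eta < 2 / max(lams) is stable)
    residual = (1 - eta * lams) ** T
    print(f"eta={eta}: residuals per component = {np.round(residual, 4)}")
```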
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x)\,u_i$, which is linear in the controls.
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
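A hedged sketch with placeholder vector fields and controls: integrating the control system with forward Euler transports an entire ensemble of points at once, approximating the action of a diffeomorphism.

```python
# Sketch: integrate the flow of dx/dt = sum_i F_i(x) u_i with forward Euler
# and piecewise-constant controls; fields and controls are illustrative.
import numpy as np

F = [lambda X: np.stack([-X[:, 1], X[:, 0]], axis=1),  # rotation field
     lambda X: X]                                      # scaling field
u = [(1.0, 0.2), (0.5, -0.1)]                          # controls on two time intervals
dt, steps_per_interval = 0.01, 100

X = np.random.default_rng(4).normal(size=(50, 2))      # ensemble of points
for u_interval in u:
    for _ in range(steps_per_interval):
        X = X + dt * sum(ui * Fi(X) for ui, Fi in zip(u_interval, F))
print("ensemble mean after flow:", X.mean(axis=0))
```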
arXiv Detail & Related papers (2021-10-24T08:57:46Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
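As one concrete (and heavily simplified) instance, a dissipative Euler discretization of Hamiltonian dynamics yields a momentum-style optimizer; the constraints and manifold structure treated in the paper are omitted here.

```python
# Sketch: dissipative discretization of q' = p, p' = -grad f(q) - gamma * p,
# which recovers a damped-momentum optimizer on an unconstrained quadratic.
import numpy as np

Q = np.diag([1.0, 10.0])  # ill-conditioned quadratic objective f(q) = 0.5 q^T Q q

def grad_f(q):
    return Q @ q

q, p = np.array([2.0, 2.0]), np.zeros(2)
dt, gamma = 0.05, 1.0     # step size and friction (dissipation) coefficient
for _ in range(300):
    p = (1 - gamma * dt) * p - dt * grad_f(q)  # damped momentum update
    q = q + dt * p
print("q after dissipative flow:", q)  # approaches the minimizer at the origin
```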
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Multiple Kernel Representation Learning on Networks [12.106994960669924]
We propose a weighted matrix factorization model that encodes random walk-based information about nodes of the network.
We extend the approach with a multiple kernel learning formulation that provides the flexibility of learning the kernel as the linear combination of a dictionary of kernels.
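A toy sketch under stated assumptions: random-walk statistics of a small graph, an illustrative kernel dictionary, and fixed simplex weights (the paper learns them) combined and factorized into node embeddings.

```python
# Sketch: random-walk co-occurrence matrix -> weighted kernel combination ->
# factorization into node embeddings. Dictionary and weights are illustrative.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # toy 4-node graph
P = A / A.sum(axis=1, keepdims=True)       # random-walk transition matrix
M = (P + P @ P + P @ P @ P) / 3            # multi-step co-occurrence statistics
M = (M + M.T) / 2                          # symmetrize

kernels = [M, np.exp(M) - 1, M @ M]        # illustrative kernel dictionary on M
w = np.array([0.5, 0.3, 0.2])              # fixed simplex weights (paper learns them)
K = sum(wi * Ki for wi, Ki in zip(w, kernels))

U, S, _ = np.linalg.svd(K)                 # factorize the combined matrix
emb = U[:, :2] * np.sqrt(S[:2])            # 2-d node embeddings
print(emb)
```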
arXiv Detail & Related papers (2021-06-09T13:22:26Z)
- End-to-end Kernel Learning via Generative Random Fourier Features [31.57596752889935]
Random Fourier features (RFFs) provide a promising way to perform kernel learning from a spectral viewpoint.
In this paper, we consider a one-stage process that incorporates the kernel learning and linear learner into a unifying framework.
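For reference, the classic fixed-frequency random Fourier feature baseline is sketched below; the paper instead generates the frequencies end to end together with the linear learner.

```python
# Sketch: classic random Fourier features approximating a Gaussian kernel
# with frequencies drawn from the kernel's spectral density (fixed, not learned).
import numpy as np

rng = np.random.default_rng(5)
d, D, bw = 3, 500, 1.0
W = rng.normal(0, 1 / bw, size=(d, D))  # frequencies from the Gaussian spectrum
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(X):
    return np.sqrt(2 / D) * np.cos(X @ W + b)

X = rng.normal(size=(4, d))
approx = rff(X) @ rff(X).T               # approximate Gram matrix
exact = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / (2 * bw ** 2))
print(np.abs(approx - exact).max())      # small approximation error
```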
arXiv Detail & Related papers (2020-09-10T00:27:39Z)
- Fast Learning in Reproducing Kernel Krein Spaces via Signed Measures [31.986482149142503]
We cast this question in a distributional view by introducing the signed measure.
A series of non-PD kernels can be associated with the linear combination of specific finite Borel measures.
This solution is also computationally implementable in practice, allowing non-PD kernels to scale to large-sample cases.
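A hedged sketch of the underlying trick: a non-PD kernel written as a difference of two PD kernels (a Jordan-style split of a signed spectral measure) can be approximated by two sets of random Fourier features, keeping evaluation linear in the number of samples. The bandwidths are assumptions.

```python
# Sketch: approximate a non-PD difference-of-Gaussians kernel k = k_plus - k_minus
# by random Fourier features for each PD part separately.
import numpy as np

rng = np.random.default_rng(6)
d, D = 2, 2000

def rff(X, bw):
    W = rng.normal(0, 1 / bw, size=(d, D))
    b = rng.uniform(0, 2 * np.pi, D)
    return np.sqrt(2 / D) * np.cos(X @ W + b)

X = rng.normal(size=(5, d))
Phi_p, Phi_m = rff(X, 0.5), rff(X, 2.0)       # features for the two PD components
K_approx = Phi_p @ Phi_p.T - Phi_m @ Phi_m.T  # non-PD kernel as a difference
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K_exact = np.exp(-d2 / (2 * 0.25)) - np.exp(-d2 / (2 * 4.0))
print(np.abs(K_approx - K_exact).max())       # small approximation error
```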
arXiv Detail & Related papers (2020-05-30T12:10:35Z)
- SimpleMKKM: Simple Multiple Kernel K-means [49.500663154085586]
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM).
Our criterion is given by an intractable minimization-maximization problem in the kernel coefficients and the clustering partition matrix.
We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error.
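The sketch below is a generic alternating relaxation of multiple kernel k-means, not SimpleMKKM's reduced-gradient min-max solver: it alternates a spectral partition step with a heuristic reweighting of the kernel dictionary.

```python
# Sketch: alternating spectral relaxation of multiple kernel k-means with a
# heuristic softmax reweighting of the kernel dictionary (not SimpleMKKM).
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
Ks = np.stack([np.exp(-d2 / (2 * bw ** 2)) for bw in (0.5, 1.0, 4.0)])  # dictionary

k, w = 2, np.ones(3) / 3                 # clusters and simplex weights over kernels
for _ in range(10):
    Kw = np.tensordot(w, Ks, axes=1)     # combined kernel
    _, vecs = np.linalg.eigh(Kw)
    H = vecs[:, -k:]                     # relaxed partition (top-k eigenvectors)
    scores = np.array([np.trace(H.T @ Ki @ H) for Ki in Ks])
    w = np.exp(scores - scores.max())    # heuristic softmax reweighting
    w /= w.sum()

labels = (H[:, 0] > 0).astype(int)       # round the relaxation by sign
print("cluster sizes:", np.bincount(labels), "kernel weights:", np.round(w, 3))
```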
arXiv Detail & Related papers (2020-05-11T10:06:40Z)
- Distributed Averaging Methods for Randomized Second Order Optimization [54.51566432934556]
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a bottleneck.
We develop unbiased parameter averaging methods for randomized second order optimization that employ sampling and sketching of the Hessian.
We also extend the framework of second order averaging methods to introduce an unbiased distributed optimization framework for heterogeneous computing systems.
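A toy sketch of the idea (the paper's debiasing corrections are omitted): each worker solves a Newton-type system with its own row-sampling sketch of the Hessian, and the resulting steps are averaged.

```python
# Sketch: average Newton-type steps computed from independent row-sampling
# sketches of the Hessian across workers, for a least-squares objective.
import numpy as np

rng = np.random.default_rng(8)
n, d, workers, m = 2000, 10, 8, 200
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
x = np.zeros(d)
g = A.T @ (A @ x - b)                  # full gradient (cheap to communicate)

steps = []
for _ in range(workers):
    idx = rng.choice(n, size=m, replace=False)  # row-sampling sketch of A
    H_hat = (n / m) * A[idx].T @ A[idx]         # sketched Hessian estimate
    steps.append(np.linalg.solve(H_hat, g))     # local Newton-type step
x = x - np.mean(steps, axis=0)         # average the workers' steps
print("gradient norm after one averaged step:", np.linalg.norm(A.T @ (A @ x - b)))
```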
arXiv Detail & Related papers (2020-02-16T09:01:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.