Related papers: Mirror Descent on Reproducing Kernel Banach Spaces

Mirror Descent on Reproducing Kernel Banach Spaces

URL: http://arxiv.org/abs/2411.11242v1
Date: Mon, 18 Nov 2024 02:18:32 GMT
Title: Mirror Descent on Reproducing Kernel Banach Spaces
Authors: Akash Kumar, Mikhail Belkin, Parthe Pandit,
Abstract summary: This paper addresses a learning problem on Banach spaces endowed with a reproducing kernel. We propose an algorithm that employs gradient steps in the dual space of the Banach space using the reproducing kernel. To instantiate this algorithm in practice, we introduce a novel family of RKBSs with $p$-norm.
Score: 12.716091600034543
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in machine learning have led to increased interest in reproducing kernel Banach spaces (RKBS) as a more general framework that extends beyond reproducing kernel Hilbert spaces (RKHS). These works have resulted in the formulation of representer theorems under several regularized learning schemes. However, little is known about an optimization method that encompasses these results in this setting. This paper addresses a learning problem on Banach spaces endowed with a reproducing kernel, focusing on efficient optimization within RKBS. To tackle this challenge, we propose an algorithm based on mirror descent (MDA). Our approach involves an iterative method that employs gradient steps in the dual space of the Banach space using the reproducing kernel. We analyze the convergence properties of our algorithm under various assumptions and establish two types of results: first, we identify conditions under which a linear convergence rate is achievable, akin to optimization in the Euclidean setting, and provide a proof of the linear rate; second, we demonstrate a standard convergence rate in a constrained setting. Moreover, to instantiate this algorithm in practice, we introduce a novel family of RKBSs with $p$-norm ($p \neq 2$), characterized by both an explicit dual map and a kernel.

Related papers

Consequences of Kernel Regularity for Bandit Optimization [4.655159257282135]
We show that kernel-based and locally adaptive algorithms can be analyzed within a unified framework.<n>This allows us to derive explicit regret bounds for each kernel family, obtaining novel results in several cases and providing improved analysis for others.
arXiv Detail & Related papers (2025-12-05T18:54:09Z)
Graph-based Clustering Revisited: A Relaxation of Kernel $k$-Means Perspective [73.18641268511318]
We propose a graph-based clustering algorithm that only relaxes the orthonormal constraint to derive clustering results.<n>To ensure a doubly constraint into a gradient, we transform the non-negative constraint into a class probability parameter.
arXiv Detail & Related papers (2025-09-23T09:14:39Z)
On the Convergence of Irregular Sampling in Reproducing Kernel Hilbert Spaces [0.0]
We discuss approximation properties of kernel regression under minimalistic assumptions on both the kernel and the input data. We first prove error estimates in the kernel's RKHS norm. This leads to new results concerning uniform convergence of kernel regression on compact domains.
arXiv Detail & Related papers (2025-04-18T10:57:16Z)
Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning [33.34053480377887]
This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducible Kernel Hilbert Spaces (RKHSs)
arXiv Detail & Related papers (2024-06-03T15:28:12Z)
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates. Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
arXiv Detail & Related papers (2023-07-14T13:56:11Z)
Linear Convergence of Reshuffling Kaczmarz Methods With Sparse Constraints [7.936519714074615]
The Kaczmarz matrix (KZ) and its variants have been extensively studied due to their simplicity and efficiency in solving sub linear equation systems. We provide the first theoretical convergence guarantees for KHT by showing that it converges linearly to the solution of a system with sparsity constraints.
arXiv Detail & Related papers (2023-04-20T07:14:24Z)
Coefficient-based Regularized Distribution Regression [4.21768682940933]
We consider the coefficient-based regularized distribution regression which aims to regress from probability measures to real-valued responses over a kernel reproducing Hilbert space (RKHS) Asymptotic behaviors of the algorithm in different regularity ranges of the regression function are comprehensively studied. We get the optimal rates under some mild conditions, which matches the one-stage sampled minimax optimal rate.
arXiv Detail & Related papers (2022-08-26T03:46:14Z)
Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques. A first class of algorithms is kernel flow, which was introduced in a context of classification in machine learning. A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z)
On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that a phenomenon can be precisely characterized in the context of kernel methods. We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces [96.53463532832939]
We develop techniques for broadening the applicability of Gaussian processes. We introduce a wide class of efficient approximations built from this viewpoint. We develop a collection of Gaussian process models over non-Euclidean spaces.
arXiv Detail & Related papers (2022-02-22T01:42:57Z)
Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability. We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections. Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
Fast Learning in Reproducing Kernel Krein Spaces via Signed Measures [31.986482149142503]
We cast this question as a distribution view by introducing the emphsigned measure A series of non-PD kernels can be associated with the linear combination of specific finite Borel measures. Specifically, this solution is also computationally implementable in practice to scale non-PD kernels in large sample cases.
arXiv Detail & Related papers (2020-05-30T12:10:35Z)
Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems. We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms. Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.