Related papers: Consequences of Kernel Regularity for Bandit Optimization

Consequences of Kernel Regularity for Bandit Optimization

URL: http://arxiv.org/abs/2512.05957v1
Date: Fri, 05 Dec 2025 18:54:09 GMT
Title: Consequences of Kernel Regularity for Bandit Optimization
Authors: Madison Lee, Tara Javidi,
Abstract summary: We show that kernel-based and locally adaptive algorithms can be analyzed within a unified framework.<n>This allows us to derive explicit regret bounds for each kernel family, obtaining novel results in several cases and providing improved analysis for others.
Score: 4.655159257282135
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work we investigate the relationship between kernel regularity and algorithmic performance in the bandit optimization of RKHS functions. While reproducing kernel Hilbert space (RKHS) methods traditionally rely on global kernel regressors, it is also common to use a smoothness-based approach that exploits local approximations. We show that these perspectives are deeply connected through the spectral properties of isotropic kernels. In particular, we characterize the Fourier spectra of the Matérn, square-exponential, rational-quadratic, $γ$-exponential, piecewise-polynomial, and Dirichlet kernels, and show that the decay rate determines asymptotic regret from both viewpoints. For kernelized bandit algorithms, spectral decay yields upper bounds on the maximum information gain, governing worst-case regret, while for smoothness-based methods, the same decay rates establish Hölder space embeddings and Besov space norm-equivalences, enabling local continuity analysis. These connections show that kernel-based and locally adaptive algorithms can be analyzed within a unified framework. This allows us to derive explicit regret bounds for each kernel family, obtaining novel results in several cases and providing improved analysis for others. Furthermore, we analyze LP-GP-UCB, an algorithm that combines both approaches, augmenting global Gaussian process surrogates with local polynomial estimators. While the hybrid approach does not uniformly dominate specialized methods, it achieves order-optimality across multiple kernel families.

Related papers

Graph-based Clustering Revisited: A Relaxation of Kernel $k$-Means Perspective [73.18641268511318]
We propose a graph-based clustering algorithm that only relaxes the orthonormal constraint to derive clustering results.<n>To ensure a doubly constraint into a gradient, we transform the non-negative constraint into a class probability parameter.
arXiv Detail & Related papers (2025-09-23T09:14:39Z)
Random feature approximation for general spectral methods [2.9388890036358104]
This work extends previous results for Tikhonov regularization to a broad class of spectral regularization techniques.<n>We enable a theoretical analysis of neural networks and neural operators through the lens of the Neural Tangent Kernel (NTK) approach.
arXiv Detail & Related papers (2025-06-19T13:00:17Z)
Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel [55.82768375605861]
We establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity for kernel methods.<n>Unlike static kernels such as NTK, the LPK captures the entire training trajectory, adapting to both data and optimization dynamics.
arXiv Detail & Related papers (2025-06-12T23:17:09Z)
Mirror Descent on Reproducing Kernel Banach Spaces [12.716091600034543]
This paper addresses a learning problem on Banach spaces endowed with a reproducing kernel. We propose an algorithm that employs gradient steps in the dual space of the Banach space using the reproducing kernel. To instantiate this algorithm in practice, we introduce a novel family of RKBSs with $p$-norm.
arXiv Detail & Related papers (2024-11-18T02:18:32Z)
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
Random feature approximation for general spectral methods [0.0]
We analyze generalization properties for a large class of spectral regularization methods combined with random features. For our estimators we obtain optimal learning rates over gradient regularity classes.
arXiv Detail & Related papers (2023-08-29T16:56:03Z)
On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates. Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
arXiv Detail & Related papers (2023-07-14T13:56:11Z)
Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm. We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) HRFs automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z)
Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability. We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections. Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
Towards a Unified Quadrature Framework for Large-Scale Kernel Machines [35.32894170512829]
We develop a quadrature framework for large-scale kernel machines via a numerical integration representation. We leverage fully symmetric interpolatory rules to efficiently compute quadrature nodes and associated weights for kernel approximation.
arXiv Detail & Related papers (2020-11-03T12:48:07Z)
Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS [19.252319300590653]
A black-box function $f:mathcalX mapsto mathbbR$ is optimized under the assumption that $f$ is H"older smooth and has bounded norm in the RKHS associated with a given kernel $K$. In this paper, we propose a new algorithm (textttLP-GP-UCB) where the usual GP surrogate model is augmented with Local Polynomial (LP) estimators of the H" smooth parameter $f$.
arXiv Detail & Related papers (2020-05-11T01:55:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.