Truncated Kernel Stochastic Gradient Descent with General Losses and Spherical Radial Basis Functions
- URL: http://arxiv.org/abs/2510.04237v3
- Date: Thu, 30 Oct 2025 09:14:25 GMT
- Title: Truncated Kernel Stochastic Gradient Descent with General Losses and Spherical Radial Basis Functions
- Authors: Jinhui Bai, Andreas Christmann, Lei Shi
- Abstract summary: We propose a novel kernel stochastic gradient descent (SGD) algorithm for large-scale supervised learning with general losses. Compared to traditional kernel SGD, our algorithm improves efficiency and scalability through an innovative regularization strategy. Our framework accommodates a broad class of classical loss functions, including least-squares, Huber, and logistic losses.
- Score: 3.308257914042897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel kernel stochastic gradient descent (SGD) algorithm for large-scale supervised learning with general losses. Compared to traditional kernel SGD, our algorithm improves efficiency and scalability through an innovative regularization strategy. By leveraging the infinite series expansion of spherical radial basis functions, this strategy projects the stochastic gradient onto a finite-dimensional hypothesis space, which is adaptively scaled according to the bias-variance trade-off, thereby enhancing generalization performance. Based on a new estimation of the spectral structure of the kernel-induced covariance operator, we develop an analytical framework that unifies optimization and generalization analyses. We prove that both the last iterate and the suffix average converge at minimax-optimal rates, and we further establish optimal strong convergence in the reproducing kernel Hilbert space. Our framework accommodates a broad class of classical loss functions, including least-squares, Huber, and logistic losses. Moreover, the proposed algorithm significantly reduces computational complexity and achieves optimal storage complexity by incorporating coordinate-wise updates from linear SGD, thereby avoiding the costly pairwise operations typical of kernel SGD and enabling efficient processing of streaming data. Finally, extensive numerical experiments demonstrate the efficiency of our approach.
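The sketch below is a minimal illustration of the core idea in the abstract, not the authors' exact algorithm: the stochastic gradient is formed directly in a fixed finite-dimensional basis, so each step is a coordinate-wise, linear-SGD-style update with no kernel matrix or pairwise evaluations. For concreteness it uses the spherical-harmonic (Fourier) basis on the circle S^1, a Huber loss, a fixed truncation level K, and a 1/sqrt(t) step size; all of these choices are illustrative assumptions, whereas the paper scales the hypothesis space adaptively with the bias-variance trade-off.

```python
import numpy as np

def truncated_features(theta, K):
    """Spherical-harmonic basis on the circle S^1, truncated at frequency K:
    [1, cos(theta), sin(theta), ..., cos(K*theta), sin(K*theta)]."""
    feats = [np.ones_like(theta)]
    for k in range(1, K + 1):
        feats.append(np.cos(k * theta))
        feats.append(np.sin(k * theta))
    return np.stack(feats, axis=-1)          # shape (..., 2K+1)

def huber_grad(residual, delta=1.0):
    """Derivative of the Huber loss w.r.t. the prediction (one example of a
    'general' loss; least-squares or logistic gradients plug in the same way)."""
    return np.where(np.abs(residual) <= delta, residual, delta * np.sign(residual))

def truncated_kernel_sgd(stream, K=10, eta0=0.5):
    """One pass over a data stream with the stochastic gradient projected onto
    the (2K+1)-dimensional truncated hypothesis space."""
    w = np.zeros(2 * K + 1)                   # coefficients in the truncated basis
    for t, (theta_t, y_t) in enumerate(stream, start=1):
        phi_t = truncated_features(np.asarray(theta_t), K)  # O(K) feature evaluation
        pred = w @ phi_t                       # prediction, no kernel matrix needed
        g = huber_grad(pred - y_t)             # loss derivative at the prediction
        eta_t = eta0 / np.sqrt(t)              # illustrative step-size schedule
        w -= eta_t * g * phi_t                 # coordinate-wise update, O(K) per sample
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    thetas = rng.uniform(0, 2 * np.pi, size=5000)
    ys = np.sin(3 * thetas) + 0.1 * rng.standard_normal(5000)
    w = truncated_kernel_sgd(zip(thetas, ys), K=10)
    print("learned coefficients (first 8):", np.round(w[:8], 3))
```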
Related papers
- Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel [55.82768375605861]
We establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity for kernel methods. Unlike static kernels such as NTK, the LPK captures the entire training trajectory, adapting to both data and optimization dynamics.
arXiv Detail & Related papers (2025-06-12T23:17:09Z) - Scalable Kernel Inverse Optimization [2.799896314754615]
Inverse optimization is a framework for learning the unknown objective function of an expert decision-maker from a past dataset.
We extend the hypothesis class of IO objective functions to a reproducing kernel Hilbert space.
We show that a variant of the representer theorem holds for a specific training loss, allowing the reformulation of the problem as a finite-dimensional convex optimization program.
arXiv Detail & Related papers (2024-10-31T14:06:43Z) - Truncated Kernel Stochastic Gradient Descent on Spheres [1.4583059436979549]
Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm. T-kernel SGD employs a least-squares loss function for spherical data fitting.
arXiv Detail & Related papers (2024-10-02T14:09:51Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Generalization Properties of Stochastic Optimizers via Trajectory
Analysis [48.38493838310503]
We show that both the Fernique-Talagrand functional and the local power law are predictive of generalization performance.
arXiv Detail & Related papers (2021-08-02T10:58:32Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Convergence Analysis of Homotopy-SGD for non-convex optimization [43.71213126039448]
We present a first-order algorithm based on a combination of homotopy methods and SGD, called Homotopy Stochastic Gradient Descent (H-SGD).
Under some assumptions, we conduct a theoretical analysis of the proposed algorithm.
Experimental results show that H-SGD can outperform SGD.
arXiv Detail & Related papers (2020-11-20T09:50:40Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)