Truncated Kernel Stochastic Gradient Descent on Spheres
- URL: http://arxiv.org/abs/2410.01570v2
- Date: Fri, 4 Oct 2024 13:51:16 GMT
- Title: Truncated Kernel Stochastic Gradient Descent on Spheres
- Authors: JinHui Bai, Lei Shi
- Abstract summary: Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm.
T-kernel SGD employs a "truncation" operation, enabling the application of series-based kernel functions in stochastic gradient descent.
In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance.
- Score: 1.4583059436979549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-square loss function for spherical data fitting. T-kernel SGD employs a "truncation" operation, enabling the application of series-based kernel functions in stochastic gradient descent, thereby avoiding the difficulties of finding suitable closed-form kernel functions in high-dimensional spaces. In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance by dynamically adjusting the hypothesis space during iterations. The most significant advantage of the proposed algorithm is that it can achieve theoretically optimal convergence rates using a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates for the $d$-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fitting or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decaying rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. The numerical experiments further validate the theoretical findings presented in this paper.
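To make the "truncation" idea concrete, below is a minimal, hedged sketch of a least-squares SGD update in a truncated basis with a constant step size. It uses a Fourier basis on the circle as a simple stand-in for spherical harmonics on the sphere; the function names, step size, and truncation growth rate are illustrative assumptions, not the authors' exact algorithm or parameter choices.

```python
import numpy as np

def fourier_features(theta, degree):
    """Evaluate the (2*degree + 1)-dimensional Fourier basis at angle theta."""
    feats = [np.ones_like(theta)]
    for k in range(1, degree + 1):
        feats.append(np.cos(k * theta))
        feats.append(np.sin(k * theta))
    return np.stack(feats, axis=-1)

def truncated_kernel_sgd(thetas, ys, step=0.05, max_degree=20, growth=0.3):
    """One pass of least-squares SGD in a growing truncated basis, constant step size."""
    coef = np.zeros(2 * max_degree + 1)
    for t, (theta, y) in enumerate(zip(thetas, ys), start=1):
        # Truncation level (hypothesis space) grows slowly with the iteration index t;
        # the precise growth rate in the paper depends on the regularity parameter epsilon.
        degree = min(max_degree, int(np.ceil(t ** growth)))
        dim = 2 * degree + 1
        phi = fourier_features(np.asarray(theta), degree)   # basis vector at theta
        residual = coef[:dim] @ phi - y
        coef[:dim] -= step * residual * phi                  # SGD step on the squared loss
    return coef

# Usage: fit a smooth target on the circle from noisy samples.
rng = np.random.default_rng(0)
thetas = rng.uniform(0.0, 2.0 * np.pi, size=2000)
ys = np.sin(3.0 * thetas) + 0.1 * rng.normal(size=thetas.shape)
coef = truncated_kernel_sgd(thetas, ys)
```

Growing the truncation level with the iteration index mirrors how T-kernel SGD enlarges the hypothesis space during iterations, which is what allows a constant step size to balance bias and variance in place of the decaying schedules used by traditional kernel SGD.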
Related papers
- GRAPE optimization for open quantum systems with time-dependent
decoherence rates driven by coherent and incoherent controls [77.34726150561087]
The GRadient Ascent Pulse Engineering (GRAPE) method is widely used for optimization in quantum control.
We adopt GRAPE method for optimizing objective functionals for open quantum systems driven by both coherent and incoherent controls.
The efficiency of the algorithm is demonstrated through numerical simulations for the state-to-state transition problem.
arXiv Detail & Related papers (2023-07-17T13:37:18Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular because they minimize the number of cache misses and make good use of the cache.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence [9.11726703830074]
In this paper, we consider solving the distributed optimization problem under a communication-restricted setting.
We propose a method based on compressed exact diffusion, termed "CEDAS".
arXiv Detail & Related papers (2023-01-14T09:49:15Z) - Utilising the CLT Structure in Stochastic Gradient based Sampling :
Improved Analysis and Faster Algorithms [14.174806471635403]
We consider approximations of sampling algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamics (IPD).
We observe that the noise introduced by the approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian.
We harness this structure to absorb the approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms.
arXiv Detail & Related papers (2022-06-08T10:17:40Z) - Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent:
Convergence Guarantees and Empirical Benefits [21.353189917487512]
Stochastic gradient descent (SGD) and its variants have established themselves as the go-to algorithms for machine learning problems.
We take a step forward by proving minibatch SGD converges to a critical point of the full log-likelihood loss function.
Our theoretical guarantees hold provided that the kernel functions exhibit exponential or eigendecay.
arXiv Detail & Related papers (2021-11-19T22:28:47Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study the properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Improving the Transient Times for Distributed Stochastic Gradient
Methods [5.215491794707911]
We study a distributed stochastic gradient algorithm, called exact diffusion with adaptive stepsizes (EDAS).
We show EDAS achieves the same network-independent convergence rate as centralized stochastic gradient descent (SGD).
To the best of our knowledge, EDAS achieves the shortest transient time when the average of the $n$ cost functions is strongly convex.
arXiv Detail & Related papers (2021-05-11T08:09:31Z) - Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z) - Convergence of Gaussian-smoothed optimal transport distance with
sub-gamma distributions and dependent samples [12.77426855794452]
This paper provides convergence guarantees for estimating the GOT distance under more general settings.
A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy distances.
arXiv Detail & Related papers (2021-02-28T04:30:23Z) - Spectral density estimation with the Gaussian Integral Transform [91.3755431537592]
The spectral density operator $\hat{\rho}(\omega)=\delta(\omega-\hat{H})$ plays a central role in linear response theory.
We describe a near optimal quantum algorithm providing an approximation to the spectral density.
arXiv Detail & Related papers (2020-04-10T03:14:38Z)