Convergence of Mean Shift Algorithms for Large Bandwidths and Simultaneous Accurate Clustering
- URL: http://arxiv.org/abs/2506.19837v1
- Date: Tue, 24 Jun 2025 17:53:29 GMT
- Title: Convergence of Mean Shift Algorithms for Large Bandwidths and Simultaneous Accurate Clustering
- Authors: Susovan Pal, Praneeth Vepakomma
- Abstract summary: Mean shift (MS) is a non-parametric, density-based, iterative algorithm that has prominent usage in clustering and image segmentation. We show that for a \textit{sufficiently large bandwidth}, convergence is guaranteed in any dimension with \textit{any radially symmetric and strictly positive definite kernel}.
- Score: 3.038423178022283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mean shift (MS) is a non-parametric, density-based, iterative algorithm that has prominent usage in clustering and image segmentation. A rigorous proof of its convergence in full generality remains unknown. Two significant steps in this direction were taken in \cite{Gh1}, which proved that for a \textit{sufficiently large bandwidth} the MS algorithm with the Gaussian kernel always converges in any dimension, and in \cite{Gh2}, where the same author proved that MS always converges in one dimension for kernels with differentiable, strictly decreasing, convex profiles. The more recent paper \cite{YT} proved convergence in greater generality, \textit{without any restriction on the bandwidth}, under the assumption that the KDE $f$ has a Lipschitz continuous gradient on the closure of the convex hull of the trajectory of the iterated sequence of mode estimates and satisfies the {\L}ojasiewicz property there. The main theoretical result of this paper is a generalization of the results of \cite{Gh1}: we show that for a \textit{sufficiently large bandwidth}, convergence is guaranteed in any dimension with \textit{any radially symmetric and strictly positive definite kernel}. The proof uses two alternative characterizations of radially symmetric, positive definite, smooth kernels due to Schoenberg and Bernstein \cite{Fass}, and borrows some steps from the proofs in \cite{Gh1}. Although this result is more restrictive than that of \cite{YT} because of the lower bound on the bandwidth, it relies on a different set of assumptions, and the proof technique is different.
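To make the object of study concrete, the following is a minimal sketch of the MS iteration with a Gaussian kernel, one example of a radially symmetric and strictly positive definite kernel; the data, bandwidth, and stopping tolerance are illustrative placeholders rather than values from the paper.

```python
import numpy as np

def mean_shift(data, x0, bandwidth, tol=1e-8, max_iter=1000):
    """One mode-seeking trajectory of the mean shift (MS) iteration.

    Uses the Gaussian kernel exp(-||x - x_i||^2 / (2 h^2)), a radially
    symmetric, strictly positive definite kernel: each step moves the
    mode estimate to the kernel-weighted average of the data points.
    """
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        diff = data - x                                        # (n, d)
        w = np.exp(-np.sum(diff**2, axis=1) / (2 * bandwidth**2))
        x_new = w @ data / w.sum()                             # weighted mean
        if np.linalg.norm(x_new - x) < tol:                    # fixed point reached
            return x_new
        x = x_new
    return x

# Toy usage: two well-separated Gaussian blobs, run with a large bandwidth
# in the spirit of the paper's "sufficiently large bandwidth" condition.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(4, 0.3, (50, 2))])
print(mean_shift(data, x0=data[0], bandwidth=2.0))
```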
Related papers
- How well behaved is finite dimensional Diffusion Maps? [0.0]
We derive a series of properties that remain valid after a finite-dimensional and almost isometric Diffusion Maps (DM) embedding. We quantify the error between the estimated tangent spaces and the true tangent spaces of the submanifolds after the DM embedding. These results offer a solid theoretical foundation for understanding the performance and reliability of DM in practical applications.
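For context, here is a minimal sketch of a finite-dimensional Diffusion Maps embedding; the Gaussian affinity, bandwidth, diffusion time, and embedding dimension are illustrative choices, not the construction analyzed in the paper.

```python
import numpy as np

def diffusion_maps(X, eps=0.5, dim=2, t=1):
    """Minimal Diffusion Maps embedding of the rows of X into `dim` coordinates."""
    # Gaussian affinity matrix between all pairs of points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    K = np.exp(-sq_dists / eps)
    # Row-normalize to obtain the diffusion (Markov) matrix.
    P = K / K.sum(axis=1, keepdims=True)
    # Right eigenvectors of P; the leading eigenvalue 1 has a constant eigenvector.
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Diffusion coordinates: skip the trivial first eigenvector.
    return vecs[:, 1:dim + 1] * (vals[1:dim + 1] ** t)

X = np.random.default_rng(1).normal(size=(100, 3))  # toy point cloud
print(diffusion_maps(X).shape)  # (100, 2)
```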
arXiv Detail & Related papers (2024-12-05T09:12:25Z) - Invariant kernels on Riemannian symmetric spaces: a harmonic-analytic approach [6.5497574505866885]
This work aims to prove that the classical Gaussian kernel, when defined on a non-Euclidean symmetric space, is never positive-definite for any choice of parameter.
New results lay out a blueprint for the study of invariant kernels on symmetric spaces.
arXiv Detail & Related papers (2023-10-30T05:06:52Z) - Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting [64.0722630873758]
We consider a rather general and broad class of Markov chains, Ito chains, that look like an Euler-Maruyama discretization of some Stochastic Differential Equation.
We prove a bound in the $W_2$-distance between the laws of our Ito chain and the corresponding differential equation.
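As background for the discretization being referred to, here is a minimal Euler-Maruyama sketch for an SDE $dX_t = b(X_t)\,dt + \sigma\,dW_t$; the drift, noise level, and step size are illustrative and not taken from the paper.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, step, n_steps, rng):
    """Euler-Maruyama chain: x_{k+1} = x_k + b(x_k)*step + sigma*sqrt(step)*xi_k."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        xi = rng.standard_normal(x.shape)          # Gaussian increment
        x = x + b(x) * step + sigma * np.sqrt(step) * xi
        path.append(x.copy())
    return np.array(path)

# Toy example with an Ornstein-Uhlenbeck-type drift b(x) = -x.
rng = np.random.default_rng(0)
path = euler_maruyama(lambda x: -x, sigma=0.5, x0=[2.0], step=0.01,
                      n_steps=1000, rng=rng)
print(path[-1])
```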
arXiv Detail & Related papers (2023-10-09T18:38:56Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
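For reference, here is a minimal sketch of the standard Adam update rule on a deterministic gradient oracle; the hyperparameters are the usual textbook defaults, not values from the paper, and the variance-reduced variant is not shown.

```python
import numpy as np

def adam(grad, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=1000):
    """Standard Adam iteration using a gradient oracle `grad`."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment (mean) estimate
    v = np.zeros_like(x)   # second-moment estimate
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)                 # bias correction
        v_hat = v / (1 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy usage: minimize f(x) = ||x||^2 / 2, whose gradient is x.
print(adam(lambda x: x, x0=[1.0, -2.0], lr=0.05, n_steps=500))
```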
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - Non-asymptotic convergence bounds for Sinkhorn iterates and their gradients: a coupling approach [10.568851068989972]
We focus on a relaxation of the original OT problem, the entropic OT problem, which allows for efficient and practical algorithmic solutions.
This formulation, also known as the Schrödinger Bridge problem, notably connects with Stochastic Optimal Control (SOC) and can be solved with the popular Sinkhorn algorithm.
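Here is a minimal sketch of the Sinkhorn iteration for the entropic OT problem between two discrete distributions; the cost matrix, marginals, and regularization strength are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, reg, n_iter=500):
    """Sinkhorn iterations for entropic OT between histograms a and b.

    Returns the coupling P = diag(u) K diag(v), where K = exp(-C / reg).
    """
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # scale so column marginals match b
        u = a / (K @ v)     # scale so row marginals match a
    return u[:, None] * K * v[None, :]

# Toy usage: uniform marginals on two small point clouds with squared-distance cost.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
C = np.sum((x[:, None, :] - y[None, :, :])**2, axis=-1)
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
P = sinkhorn(a, b, C, reg=0.5)
print(P.sum(axis=1))  # row marginals match a
```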
arXiv Detail & Related papers (2023-04-13T13:58:25Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}} \max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov-type acceleration.
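For orientation, here is a minimal sketch of the plain extragradient method on a toy bilinearly-coupled saddle-point problem; this is the classical extragradient step, not the AG-EG algorithm of the paper, and the problem data are illustrative.

```python
import numpy as np

def extragradient(grad_x, grad_y, x0, y0, step=0.1, n_iter=2000):
    """Plain extragradient: a prediction step, then a correction at the predicted point."""
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(n_iter):
        x_half = x - step * grad_x(x, y)           # prediction (half) step
        y_half = y + step * grad_y(x, y)
        x = x - step * grad_x(x_half, y_half)      # correction step
        y = y + step * grad_y(x_half, y_half)
    return x, y

# Toy problem: min_x max_y 0.5*||x||^2 + x^T B y - 0.5*||y||^2.
B = np.array([[1.0, 0.5], [0.0, 1.0]])
grad_x = lambda x, y: x + B @ y        # gradient of the objective in x
grad_y = lambda x, y: B.T @ x - y      # gradient of the objective in y
print(extragradient(grad_x, grad_y, [1.0, 1.0], [1.0, -1.0]))  # tends to (0, 0)
```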
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
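As a rough illustration of kernel-based value estimation (not the estimator analyzed in the paper), here is a generic kernelized LSTD-style sketch; the kernel, bandwidth, regularization, and toy Markov reward process are all assumptions made for the example.

```python
import numpy as np

def kernel_lstd(S, R, S_next, gamma=0.9, bandwidth=0.5, reg=1e-3):
    """Kernelized LSTD-style value-function estimate from transitions (s, r, s').

    Represents V(s) = sum_i alpha_i k(s_i, s) and solves the regularized
    empirical Bellman system (K - gamma * K_next + reg * I) alpha = R.
    """
    def k(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :])**2, axis=-1)
        return np.exp(-d2 / (2 * bandwidth**2))

    K = k(S, S)             # k(s_i, s_j)
    K_next = k(S_next, S)   # k(s'_i, s_j)
    alpha = np.linalg.solve(K - gamma * K_next + reg * np.eye(len(S)), R)
    return lambda s: k(np.atleast_2d(s), S) @ alpha

# Toy Markov reward process: states in [0, 1], reward equal to the state.
rng = np.random.default_rng(0)
S = rng.uniform(0, 1, (200, 1))
S_next = np.clip(S + rng.normal(0, 0.1, S.shape), 0, 1)
V = kernel_lstd(S, S.ravel(), S_next)
print(V([0.5]))
```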
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
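Here is a minimal sketch of spectral embedding with the random walk matrix $D^{-1}A$ (whose eigenvectors coincide with those of the random walk Laplacian); the two-block toy graph and the number of retained eigenvectors are illustrative.

```python
import numpy as np

def random_walk_embedding(A, K):
    """Embed nodes via the top-K eigenvectors of the random walk matrix D^{-1} A.

    Assumes every node has at least one edge (positive degree).
    """
    P = A / A.sum(axis=1, keepdims=True)            # random walk transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)[:K]
    return vecs.real[:, order]                      # rows = node representations

# Two-block toy graph: dense within blocks, sparse between them.
rng = np.random.default_rng(0)
n = 40
blocks = np.repeat([0, 1], n // 2)
probs = np.where(blocks[:, None] == blocks[None, :], 0.5, 0.05)
A = (rng.uniform(size=(n, n)) < probs).astype(float)
A = np.triu(A, 1); A = A + A.T                      # undirected, no self-loops
print(random_walk_embedding(A, K=2)[:3])
```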
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Nonparametric approximation of conditional expectation operators [0.3655021726150368]
We investigate the approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[\, f(Y) \mid X = x \,]$ under minimal assumptions.
We prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space.
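For intuition, here is a minimal kernel-ridge sketch of estimating $[Pf](x) = \mathbb{E}[f(Y) \mid X = x]$ from samples; this is a standard finite-sample estimator, not the operator-norm construction of the paper, and the kernel, bandwidth, and regularization are illustrative.

```python
import numpy as np

def conditional_expectation(X, Y, f, bandwidth=0.5, reg=1e-2):
    """Kernel-ridge estimate of x -> E[f(Y) | X = x] from i.i.d. samples (X_i, Y_i)."""
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :])**2 / (2 * bandwidth**2))

    n = len(X)
    weights = np.linalg.solve(k(X, X) + reg * n * np.eye(n), f(Y))
    return lambda x: k(np.atleast_1d(x), X) @ weights

# Toy model: Y = X^2 + noise, so E[Y | X = x] is roughly x^2.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 500)
Y = X**2 + rng.normal(0, 0.05, 500)
Pf = conditional_expectation(X, Y, f=lambda y: y)
print(Pf(0.5))  # roughly 0.25
```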
arXiv Detail & Related papers (2020-12-23T19:06:12Z) - Asymptotics of Entropy-Regularized Optimal Transport via Chaos Decomposition [1.7188280334580195]
This paper is on the properties of a discrete Schrödinger bridge as $N$ tends to infinity.
We derive the first two error terms of orders $N^{-1/2}$ and $N^{-1}$, respectively.
The kernels corresponding to the first and second order chaoses are given by Markov operators which have natural interpretations in the Sinkhorn algorithm.
arXiv Detail & Related papers (2020-11-17T21:55:46Z) - Metrizing Weak Convergence with Maximum Mean Discrepancies [88.54422104669078]
This paper characterizes the maximum mean discrepancies (MMD) that metrize the weak convergence of probability measures for a wide class of kernels.
We prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded Borel measurable kernel $k$ metrizes the weak convergence of probability measures if and only if $k$ is continuous and integrally strictly positive definite.
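For reference, here is a minimal sketch of the unbiased empirical MMD$^2$ estimate with a bounded continuous kernel (a Gaussian kernel here); the samples and bandwidth are illustrative.

```python
import numpy as np

def mmd_squared(X, Y, bandwidth=1.0):
    """Unbiased estimate of MMD^2(P, Q) with a Gaussian kernel, given X ~ P and Y ~ Q."""
    def k(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :])**2, axis=-1)
        return np.exp(-d2 / (2 * bandwidth**2))

    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))   # drop diagonal terms
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (300, 2))
Y = rng.normal(0.5, 1.0, (300, 2))
print(mmd_squared(X, Y))  # larger when the two distributions differ more
```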
arXiv Detail & Related papers (2020-06-16T15:49:33Z)