Kernel k-Means, By All Means: Algorithms and Strong Consistency
- URL: http://arxiv.org/abs/2011.06461v1
- Date: Thu, 12 Nov 2020 16:07:18 GMT
- Title: Kernel k-Means, By All Means: Algorithms and Strong Consistency
- Authors: Debolina Paul, Saptarshi Chakraborty, Swagatam Das and Jason Xu
- Abstract summary: Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data.
In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions in the kernel and multi-kernel settings.
Our algorithm makes use of majorization-minimization (MM) to better solve this non-convex problem.
- Score: 21.013169939337583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel $k$-means clustering is a powerful tool for unsupervised learning of
non-linearly separable data. Since the earliest attempts, researchers have
noted that such algorithms often become trapped by local minima arising from
non-convexity of the underlying objective function. In this paper, we
generalize recent results leveraging a general family of means to combat
sub-optimal local solutions to the kernel and multi-kernel settings. Called
Kernel Power $k$-Means, our algorithm makes use of majorization-minimization
(MM) to better solve this non-convex problem. We show the method implicitly
performs annealing in kernel feature space while retaining efficient,
closed-form updates, and we rigorously characterize its convergence properties
both from computational and statistical points of view. In particular, we
characterize the large sample behavior of the proposed method by establishing
strong consistency guarantees. Its merits are thoroughly validated on a suite
of simulated datasets and real data benchmarks that feature non-linear and
multi-view separation.
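To make the abstract's description concrete, here is a minimal numpy sketch of what one kernel power $k$-means pass with MM updates and annealing could look like. The function name, initialization, annealing schedule, and numerical safeguards are illustrative assumptions rather than the authors' reference implementation; only the use of power means, closed-form MM weights, and the kernel trick follows the abstract.

```python
import numpy as np

def kernel_power_kmeans(K, k, s0=-1.0, eta=1.1, n_iter=30, seed=0):
    """Illustrative sketch: power k-means in kernel feature space via MM.

    K   : (n, n) precomputed Gram matrix, K[i, j] = <phi(x_i), phi(x_j)>
    k   : number of clusters
    s0  : initial power-mean exponent (s < 0); as s -> -inf the power
          mean approaches the min, recovering the kernel k-means objective
    eta : annealing factor (> 1) by which |s| grows each iteration
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    W = rng.random((n, k))                     # initial soft MM weights
    diag = np.diag(K)
    s = s0
    for _ in range(n_iter):
        # Implicit centers c_j = sum_i W[i, j] phi(x_i) / sum_i W[i, j];
        # squared distances ||phi(x_i) - c_j||^2 via the kernel trick.
        col = W / W.sum(axis=0, keepdims=True)
        cross = K @ col                         # (n, k): <phi(x_i), c_j>
        cnorm = np.sum(col * cross, axis=0)     # (k,):   ||c_j||^2
        d = np.maximum(diag[:, None] - 2.0 * cross + cnorm[None, :], 1e-12)
        # Majorizing the power mean gives closed-form weights:
        # w_ij = (1/k) d_ij^(s-1) ((1/k) sum_l d_il^s)^(1/s - 1).
        # (For very negative s this should be computed in the log domain.)
        m = np.mean(d ** s, axis=1, keepdims=True) ** (1.0 / s - 1.0)
        W = (d ** (s - 1.0)) * m / k
        s *= eta                                # anneal s toward -inf
    return np.argmin(d, axis=1)                 # final hard assignments
```

Driving $s \to -\infty$ makes the power mean approach the minimum, so early iterations optimize a smoothed surrogate while later ones approach the kernel $k$-means objective; this is the implicit annealing in feature space that the abstract refers to.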
Related papers
- Kernel Alignment for Unsupervised Feature Selection via Matrix Factorization [8.020732438595905]
Unsupervised feature selection has been proven effective in alleviating the so-called curse of dimensionality.
Most existing matrix factorization-based unsupervised feature selection methods are built upon subspace learning.
In this paper, we construct a model by integrating kernel functions and kernel alignment, which can be equivalently characterized as a matrix factorization problem.
By doing so, our model can learn both linear and nonlinear similarity information and automatically generate the most appropriate kernel.
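As a rough illustration of the kernel-alignment ingredient, the snippet below computes the standard centered kernel alignment of Cortes et al.; the paper's exact criterion, and its equivalent matrix-factorization form, may differ from this.

```python
import numpy as np

def centered_alignment(K1, K2):
    # Cosine similarity between centered Gram matrices under the
    # Frobenius inner product (standard centered kernel alignment).
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

# Example: alignment of a candidate kernel K with an "ideal" kernel built
# from hypothetical one-hot cluster indicators Y (shape (n, k)):
#   score = centered_alignment(K, Y @ Y.T)
```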
arXiv Detail & Related papers (2024-03-13T20:35:44Z)
- Kernel Correlation-Dissimilarity for Multiple Kernel k-Means Clustering [21.685153346752124]
Current methods enhance information diversity and reduce redundancy by exploiting interdependencies among multiple kernels based on correlations or dissimilarities.
We introduce a novel method that systematically integrates both kernel correlation and dissimilarity.
By emphasizing the coherence between kernel correlation and dissimilarity, our method offers a more objective and transparent strategy for extracting non-linear information.
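For intuition, one common way to instantiate these two notions on Gram matrices is sketched below; the paper's precise definitions of correlation and dissimilarity may well differ.

```python
import numpy as np

def kernel_correlation(Kp, Kq):
    # Cosine-style correlation between two Gram matrices.
    return np.sum(Kp * Kq) / (np.linalg.norm(Kp) * np.linalg.norm(Kq))

def kernel_dissimilarity(Kp, Kq):
    # Frobenius distance between Frobenius-normalized Gram matrices.
    return np.linalg.norm(Kp / np.linalg.norm(Kp) - Kq / np.linalg.norm(Kq))
```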
arXiv Detail & Related papers (2024-03-06T04:24:43Z)
- Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for finite-sum nonconvex-nonconcave problems.
arXiv Detail & Related papers (2023-04-24T02:19:39Z)
- Joint Embedding Self-Supervised Learning in the Kernel Regime [21.80241600638596]
Self-supervised learning (SSL) produces useful representations of data without access to any labels for classifying the data.
We extend this framework to incorporate algorithms based on kernel methods, where embeddings are constructed by linear maps acting on the feature space of a kernel.
We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.
arXiv Detail & Related papers (2022-09-29T15:53:19Z)
- Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) aims to achieve optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC better captures local manifold structure and outperforms existing kernel- or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
- Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel [77.73399781313893]
A key challenge is to establish the objective function of kernel-based feature selection, which is non-convex in general.
Gradient-based algorithms for such non-convex objectives can only guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local data.
Our algorithm is based on a new concentration technique and comes with sample-complexity guarantees.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- SimpleMKKM: Simple Multiple Kernel K-means [49.500663154085586]
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM).
Our criterion is given by an intractable minimization-maximization problem in the kernel coefficient and clustering partition matrix.
We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error.
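The min-max structure can be sketched as follows: for fixed kernel weights the inner maximization over the partition matrix has a closed form, so the outer problem reduces to a minimization over the simplex (which SimpleMKKM solves by reduced gradient descent). The gamma_p^2 combination rule below follows the SimpleMKKM formulation; treat this as a sketch, not the authors' code.

```python
import numpy as np

def simplemkkm_inner_max(gamma, kernels, k):
    # K_gamma = sum_p gamma_p^2 K_p, with gamma on the probability simplex.
    Kg = sum(g ** 2 * Kp for g, Kp in zip(gamma, kernels))
    # Ky Fan: the max of tr(H^T K_gamma H) over H with H^T H = I equals
    # the sum of the k largest eigenvalues of K_gamma.
    return np.linalg.eigvalsh(Kg)[-k:].sum()

# Outer step (not shown): minimize simplemkkm_inner_max over gamma on the
# simplex, e.g. by projected or reduced gradient descent.
```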
arXiv Detail & Related papers (2020-05-11T10:06:40Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying non-convex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
- Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms.
Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z)
- Entropy Regularized Power k-Means Clustering [21.013169939337583]
We propose a scalable majorization-minimization algorithm that enjoys closed-form updates and convergence guarantees.
Our method retains the same computational complexity as $k$-means and power $k$-means, but yields significant improvements over both.
arXiv Detail & Related papers (2020-01-10T14:05:44Z)
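Since both this last entry and the main paper build on the power-mean family, a tiny self-contained illustration of why annealing the exponent recovers the minimum (and hence the $k$-means objective) may be helpful:

```python
import numpy as np

def power_mean(d, s):
    # M_s(d) = ((1/k) * sum_j d_j**s) ** (1/s); as s -> -inf this tends
    # to min(d), which is how power k-means anneals toward the ordinary
    # "min of squared distances" k-means objective.
    d = np.asarray(d, dtype=float)
    return np.mean(d ** s) ** (1.0 / s)

d = np.array([1.0, 4.0, 9.0])
for s in (1.0, -1.0, -8.0, -64.0):
    print(s, power_mean(d, s))   # decreases from the mean toward min(d) = 1.0
```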
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.