Entropy Regularized Power k-Means Clustering
- URL: http://arxiv.org/abs/2001.03452v1
- Date: Fri, 10 Jan 2020 14:05:44 GMT
- Title: Entropy Regularized Power k-Means Clustering
- Authors: Saptarshi Chakraborty, Debolina Paul, Swagatam Das, Jason Xu
- Abstract summary: We propose a scalable majorization-minimization algorithm that enjoys closed-form updates and convergence guarantees.
Our method retains the same computational complexity as $k$-means and power $k$-means, but yields significant improvements over both.
- Score: 21.013169939337583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite its well-known shortcomings, $k$-means remains one of the most widely
used approaches to data clustering. Current research continues to tackle its
flaws while attempting to preserve its simplicity. Recently, the \textit{power
$k$-means} algorithm was proposed to avoid trapping in local minima by
annealing through a family of smoother surfaces. However, the approach lacks
theoretical justification and fails in high dimensions when many features are
irrelevant. This paper addresses these issues by introducing \textit{entropy
regularization} to learn feature relevance while annealing. We prove
consistency of the proposed approach and derive a scalable
majorization-minimization algorithm that enjoys closed-form updates and
convergence guarantees. In particular, our method retains the same
computational complexity as $k$-means and power $k$-means, but yields
significant improvements over both. Its merits are thoroughly assessed on a
suite of real and synthetic data experiments.
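The abstract describes annealing through a family of power means while learning feature relevance via entropy regularization, with closed-form majorization-minimization (MM) updates. Below is a minimal NumPy sketch of an algorithm in that style, not the authors' reference implementation: the majorization weights follow the standard power $k$-means gradient-of-the-power-mean update, the feature-weight step uses the usual closed-form (softmax) solution of an entropy-regularized objective on the probability simplex, and the initialization, annealing schedule (s0, eta, s_min), and lam are illustrative assumptions.

```python
import numpy as np

def entropy_power_kmeans(X, k, s0=-1.0, lam=1.0, eta=1.1, s_min=-100.0,
                         n_iter=100, seed=0):
    """Sketch of an entropy-regularized power k-means style MM loop:
    anneal the power-mean exponent s toward -infinity while learning
    feature weights w on the simplex via entropy regularization."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    w = np.full(p, 1.0 / p)   # feature relevance weights, sum to 1
    s = s0                    # power-mean exponent (negative), annealed toward -inf
    for _ in range(n_iter):
        # Feature-weighted squared distances d_ij = sum_l w_l (x_il - theta_jl)^2.
        diff = X[:, None, :] - centers[None, :, :]             # (n, k, p)
        d = np.einsum('nkp,p->nk', diff ** 2, w) + 1e-12       # (n, k)
        # Majorization weights: gradient of the power mean
        # M_s(d_i1, ..., d_ik) = ((1/k) * sum_j d_ij^s)^(1/s) w.r.t. d_ij.
        # The gradient is scale-invariant per row, so rescale for stability.
        dn = d / d.min(axis=1, keepdims=True)
        phi = ((1.0 / k)
               * np.mean(dn ** s, axis=1, keepdims=True) ** (1.0 / s - 1.0)
               * dn ** (s - 1.0))                               # (n, k)
        # Closed-form center update: phi-weighted coordinate-wise means.
        centers = (phi.T @ X) / phi.sum(axis=0)[:, None]
        # Closed-form feature-weight update: entropy regularization on the
        # simplex gives a softmax of the per-feature cost.
        diff = X[:, None, :] - centers[None, :, :]
        a = np.einsum('nk,nkp->p', phi, diff ** 2)              # per-feature cost
        w = np.exp(-(a - a.min()) / lam)
        w /= w.sum()
        # Anneal: as s -> -inf the power mean approaches the minimum,
        # recovering a (feature-weighted) k-means objective.
        s = max(s * eta, s_min)
    d = np.einsum('nkp,p->nk', (X[:, None, :] - centers[None, :, :]) ** 2, w)
    return centers, w, d.argmin(axis=1)
```

On data with many irrelevant features, the learned weights w would be expected to concentrate on informative coordinates, which is the mechanism the abstract credits for the improvement over plain power $k$-means in high dimensions.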
Related papers
- S-CFE: Simple Counterfactual Explanations [21.975560789792073]
We tackle the problem of finding manifold-aligned counterfactual explanations for sparse data.
Our approach effectively produces sparse, manifold-aligned counterfactual explanations.
arXiv Detail & Related papers (2024-10-21T07:42:43Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Are Easy Data Easy (for K-Means) [0.0]
This paper investigates the capability of correctly recovering well-separated clusters by various brands of the $k$-means algorithm.
A new algorithm is proposed that is a variation of $k$-means++ via repeated subsampling when choosing a seed.
arXiv Detail & Related papers (2023-08-02T09:40:19Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes, with complexity independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local objectives.
Our algorithm is based on a new concentration technique with accompanying sample complexity bounds.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Kernel k-Means, By All Means: Algorithms and Strong Consistency [21.013169939337583]
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linear data.
In this paper, we generalize results leveraging a general family of means to combat sub-optimal local solutions.
Our algorithm makes use of majorization-minimization (MM) to better solve this non-linear separation problem.
arXiv Detail & Related papers (2020-11-12T16:07:18Z)
- CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
- Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms.
Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z)