Certifying clusters from sum-of-norms clustering
- URL: http://arxiv.org/abs/2006.11355v2
- Date: Thu, 8 Jul 2021 06:13:34 GMT
- Title: Certifying clusters from sum-of-norms clustering
- Authors: Tao Jiang, Stephen Vavasis
- Abstract summary: We present a clustering test that identifies and certifies the correct cluster assignment from an approximate solution.
We show the correct cluster assignment is guaranteed to be certified by a primal-dual path following algorithm after sufficiently many iterations.
- Score: 13.747619681451875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sum-of-norms clustering is a clustering formulation based on convex
optimization that automatically induces hierarchy. Multiple algorithms have
been proposed to solve the optimization problem: subgradient descent by Hocking
et al., ADMM and AMA by Chi and Lange, a stochastic incremental algorithm by
Panahi et al., and a semismooth Newton-CG augmented Lagrangian method by Sun et
al. All of these algorithms yield approximate solutions, whereas an exact
solution is required to determine the correct cluster assignment. The purpose of this
paper is to close the gap between the output from existing algorithms and the
exact solution to the optimization problem. We present a clustering test that
identifies and certifies the correct cluster assignment from an approximate
solution yielded by any primal-dual algorithm. Our certification validates
clustering for both unit and multiplicative weights. The test may not succeed
if the approximation is inaccurate. However, we show the correct cluster
assignment is guaranteed to be certified by a primal-dual path following
algorithm after sufficiently many iterations, provided that the model parameter
$\lambda$ avoids a finite number of bad values. Numerical experiments are
conducted on Gaussian mixture and half-moon data, which indicate that carefully
chosen multiplicative weights increase the recovery power of sum-of-norms
clustering.
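For context, the sum-of-norms clustering problem referred to above is the convex program

$$\min_{x_1,\dots,x_n}\;\frac{1}{2}\sum_{i=1}^{n}\|x_i - a_i\|^2 \;+\; \lambda \sum_{i<j} w_{ij}\,\|x_i - x_j\|,$$

where $a_1,\dots,a_n$ are the data points and $w_{ij}$ are the weights ($w_{ij}=1$ in the unit-weight case); points $i$ and $j$ belong to the same cluster exactly when $x_i = x_j$ at the optimum.

The certification idea can be illustrated with a short sketch. The Python snippet below is a simplified stand-in for the paper's test, not the actual certificate: it groups points whose approximate centroids nearly coincide and accepts the grouping only if distinct groups are separated by a margin much larger than the merging tolerance. In the paper the tolerance is tied to the accuracy of the primal-dual iterate; here `tol` and `margin` are hypothetical user parameters.

```python
import numpy as np

def certify_clusters(x, tol, margin=100.0):
    """Illustrative cluster test on an approximate solution x (n x d array).

    NOT the Jiang-Vavasis certificate: points i and j are merged when
    ||x_i - x_j|| <= tol (via union-find), and the tentative clustering is
    accepted only if every inter-cluster distance exceeds margin * tol.
    Returns (labels, certified).
    """
    n = x.shape[0]
    parent = list(range(n))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(x[i] - x[j]) <= tol:
                parent[find(j)] = find(i)

    labels = np.array([find(i) for i in range(n)])
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j] and np.linalg.norm(x[i] - x[j]) < margin * tol:
                return labels, False  # groups not clearly separated: no certificate
    return labels, True
```

When the flag is `False`, the approximation is too coarse to distinguish genuine merging from mere proximity, mirroring the abstract's caveat that the test may not succeed if the approximation is inaccurate.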
Related papers
- A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data.
Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $\bar{A}$ based on cosine similarity.
We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Regularization and Optimization in Model-Based Clustering [4.096453902709292]
k-means algorithm variants essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution.
We develop more effective optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting.
These results shed new light on the current status quo between GMM and k-means methods and suggest the more frequent use of general GMMs for data exploration.
arXiv Detail & Related papers (2023-02-05T18:22:29Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset $V$.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
The proposed global optimization algorithm is the first to efficiently solve real-world instances with up to 800 data points.
arXiv Detail & Related papers (2021-11-30T17:08:53Z)
- Solving correlation clustering with QAOA and a Rydberg qudit system: a full-stack approach [94.37521840642141]
We study the correlation clustering problem using the quantum approximate optimization algorithm (QAOA) and qudits.
Specifically, we consider a neutral atom quantum computer and propose a full stack approach for correlation clustering.
We show the qudit implementation is superior to the qubit encoding as quantified by the gate count.
arXiv Detail & Related papers (2021-06-22T11:07:38Z)
- Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects [0.0]
The method of Holý, Sokol and Černý clusters objects based on their incidence in a large number of given sets.
The idea is to minimize the occurrence of multiple objects from the same cluster in the same set.
In the current paper, we study computational aspects of the method.
arXiv Detail & Related papers (2021-02-02T10:39:27Z)
- An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering [2.5182813818441945]
The recently introduced convex clustering approach formulates clustering as a convex optimization problem.
State-of-the-art convex clustering algorithms require substantial computation and memory.
In this paper, we develop a very efficient smoothing proximal gradient algorithm (Sproga) for convex clustering (see the sketch after this list).
arXiv Detail & Related papers (2020-06-22T20:02:59Z)
- Efficient Path Algorithms for Clustered Lasso and OSCAR [0.0]
This paper proposes efficient path algorithms for clustered Lasso and OSCAR to construct solution paths with respect to their regularization parameters.
The proposed algorithms are shown to be more efficient than existing algorithms in numerical experiments.
arXiv Detail & Related papers (2020-06-16T07:43:57Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
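The smoothing idea behind algorithms such as Sproga (the convex clustering entry above) can be sketched generically: replace each nonsmooth penalty term $\|x_i - x_j\|$ with the smooth surrogate $\sqrt{\|x_i - x_j\|^2 + \mu^2} - \mu$ and run gradient descent on the result. The snippet below is a minimal sketch under that assumption, not the published algorithm; `lam`, `mu`, `step`, and `iters` are illustrative parameters.

```python
import numpy as np

def smoothed_convex_clustering(a, lam, mu=0.05, step=None, iters=3000):
    """Gradient descent on a smoothed sum-of-norms objective (sketch only).

    Each nonsmooth penalty ||x_i - x_j|| is replaced by the smooth
    surrogate sqrt(||x_i - x_j||^2 + mu^2) - mu, and plain gradient
    descent is run on
        0.5 * sum_i ||x_i - a_i||^2 + lam * sum_{i<j} surrogate(x_i - x_j).
    """
    n = a.shape[0]
    x = a.astype(float).copy()
    if step is None:
        step = 1.0 / (1.0 + lam * n / mu)  # conservative step from a crude Lipschitz bound
    for _ in range(iters):
        diff = x[:, None, :] - x[None, :, :]                 # (n, n, d) pairwise differences
        norms = np.sqrt((diff ** 2).sum(axis=-1) + mu ** 2)  # smoothed pairwise norms
        grad = (x - a) + lam * (diff / norms[:, :, None]).sum(axis=1)
        x -= step * grad
    return x  # near-coincident rows of x indicate merged clusters
```

Smaller `mu` tracks the original objective more closely but forces a smaller step size, which is the usual smoothing trade-off.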