Certifying clusters from sum-of-norms clustering
- URL: http://arxiv.org/abs/2006.11355v2
- Date: Thu, 8 Jul 2021 06:13:34 GMT
- Title: Certifying clusters from sum-of-norms clustering
- Authors: Tao Jiang, Stephen Vavasis
- Abstract summary: We present a clustering test that identifies and certifies the correct cluster assignment from an approximate solution.
We show the correct cluster assignment is guaranteed to be certified by a primal-dual path following algorithm after sufficiently many iterations.
- Score: 13.747619681451875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sum-of-norms clustering is a clustering formulation based on convex
optimization that automatically induces hierarchy. Multiple algorithms have
been proposed to solve the optimization problem: subgradient descent by Hocking
et al., ADMM and AMA by Chi and Lange, a stochastic incremental algorithm by
Panahi et al., and a semismooth Newton-CG augmented Lagrangian method by Sun et
al. All of these algorithms yield approximate solutions, whereas an exact
solution is required to determine the correct cluster assignment. The purpose of this
paper is to close the gap between the output from existing algorithms and the
exact solution to the optimization problem. We present a clustering test that
identifies and certifies the correct cluster assignment from an approximate
solution yielded by any primal-dual algorithm. Our certification validates
clustering for both unit and multiplicative weights. The test may not succeed
if the approximation is inaccurate. However, we show the correct cluster
assignment is guaranteed to be certified by a primal-dual path following
algorithm after sufficiently many iterations, provided that the model parameter
$\lambda$ avoids a finite number of bad values. Numerical experiments are
conducted on Gaussian mixture and half-moon data, which indicate that carefully
chosen multiplicative weights increase the recovery power of sum-of-norms
clustering.
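For context, the sum-of-norms clustering problem referred to above is the convex program

$$\min_{x_1,\dots,x_n}\;\frac{1}{2}\sum_{i=1}^{n}\|x_i - a_i\|^2 \;+\; \lambda \sum_{i<j} w_{ij}\,\|x_i - x_j\|,$$

where $a_1,\dots,a_n$ are the data points and $w_{ij}$ are the weights ($w_{ij}=1$ in the unit-weight case); points $i$ and $j$ belong to the same cluster exactly when $x_i = x_j$ at the optimum.

The certification idea can be illustrated with a short sketch. The Python snippet below is a simplified stand-in for the paper's test, not the actual certificate: it groups points whose approximate centroids nearly coincide and accepts the grouping only if distinct groups are separated by a margin much larger than the merging tolerance. In the paper the tolerance is tied to the accuracy of the primal-dual iterate; here `tol` and `margin` are hypothetical user parameters.

```python
import numpy as np

def certify_clusters(x, tol, margin=100.0):
    """Illustrative cluster test on an approximate solution x (n x d array).

    NOT the Jiang-Vavasis certificate: points i and j are merged when
    ||x_i - x_j|| <= tol (via union-find), and the tentative clustering is
    accepted only if every inter-cluster distance exceeds margin * tol.
    Returns (labels, certified).
    """
    n = x.shape[0]
    parent = list(range(n))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(x[i] - x[j]) <= tol:
                parent[find(j)] = find(i)

    labels = np.array([find(i) for i in range(n)])
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j] and np.linalg.norm(x[i] - x[j]) < margin * tol:
                return labels, False  # groups not clearly separated: no certificate
    return labels, True
```

When the flag is `False`, the approximation is too coarse to distinguish genuine merging from mere proximity, mirroring the abstract's caveat that the test may not succeed if the approximation is inaccurate.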
Related papers
- A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data.
Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $\bar{A}$ based on cosine similarity.
We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Regularization and Optimization in Model-Based Clustering [4.096453902709292]
k-means algorithm variants essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution.
We develop more effective optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting.
These results shed new light on the current status quo between GMM and k-means methods and suggest the more frequent use of general GMMs for data exploration.
arXiv Detail & Related papers (2023-02-05T18:22:29Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset $V$.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
The proposed global optimization algorithm is the first to efficiently solve real-world instances with up to 800 data points.
arXiv Detail & Related papers (2021-11-30T17:08:53Z)
- Solving correlation clustering with QAOA and a Rydberg qudit system: a full-stack approach [94.37521840642141]
We study the correlation clustering problem using the quantum approximate optimization algorithm (QAOA) and qudits.
Specifically, we consider a neutral atom quantum computer and propose a full stack approach for correlation clustering.
We show the qudit implementation is superior to the qubit encoding as quantified by the gate count.
arXiv Detail & Related papers (2021-06-22T11:07:38Z)
- Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects [0.0]
The method of Holý, Sokol and Černý clusters objects based on their incidence in a large number of given sets.
The idea is to minimize the occurrence of multiple objects from the same cluster in the same set.
In the current paper, we study computational aspects of the method.
arXiv Detail & Related papers (2021-02-02T10:39:27Z)
- An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering [2.5182813818441945]
The recently introduced convex clustering approach formulates clustering as a convex optimization problem.
State-of-the-art convex clustering algorithms require substantial computation and memory.
In this paper, we develop a very efficient smoothing proximal gradient algorithm (Sproga) for convex clustering (see the sketch after this list).
arXiv Detail & Related papers (2020-06-22T20:02:59Z)
- Efficient Path Algorithms for Clustered Lasso and OSCAR [0.0]
This paper proposes efficient path algorithms for clustered Lasso and OSCAR to construct solution paths with respect to their regularization parameters.
The proposed algorithms are shown to be more efficient than existing algorithms in numerical experiments.
arXiv Detail & Related papers (2020-06-16T07:43:57Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
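The smoothing idea behind algorithms such as Sproga (the convex clustering entry above) can be sketched generically: replace each nonsmooth penalty term $\|x_i - x_j\|$ with the smooth surrogate $\sqrt{\|x_i - x_j\|^2 + \mu^2} - \mu$ and run gradient descent on the result. The snippet below is a minimal sketch under that assumption, not the published algorithm; `lam`, `mu`, `step`, and `iters` are illustrative parameters.

```python
import numpy as np

def smoothed_convex_clustering(a, lam, mu=0.05, step=None, iters=3000):
    """Gradient descent on a smoothed sum-of-norms objective (sketch only).

    Each nonsmooth penalty ||x_i - x_j|| is replaced by the smooth
    surrogate sqrt(||x_i - x_j||^2 + mu^2) - mu, and plain gradient
    descent is run on
        0.5 * sum_i ||x_i - a_i||^2 + lam * sum_{i<j} surrogate(x_i - x_j).
    """
    n = a.shape[0]
    x = a.astype(float).copy()
    if step is None:
        step = 1.0 / (1.0 + lam * n / mu)  # conservative step from a crude Lipschitz bound
    for _ in range(iters):
        diff = x[:, None, :] - x[None, :, :]                 # (n, n, d) pairwise differences
        norms = np.sqrt((diff ** 2).sum(axis=-1) + mu ** 2)  # smoothed pairwise norms
        grad = (x - a) + lam * (diff / norms[:, :, None]).sum(axis=1)
        x -= step * grad
    return x  # near-coincident rows of x indicate merged clusters
```

Smaller `mu` tracks the original objective more closely but forces a smaller step size, which is the usual smoothing trade-off.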