Global Optimization for Cardinality-constrained Minimum Sum-of-Squares
Clustering via Semidefinite Programming
- URL: http://arxiv.org/abs/2209.08901v3
- Date: Thu, 7 Sep 2023 16:12:17 GMT
- Title: Global Optimization for Cardinality-constrained Minimum Sum-of-Squares
Clustering via Semidefinite Programming
- Authors: Veronica Piccialli, Antonio M. Sudoso
- Abstract summary: The minimum sum-of-squares clustering (MSSC) has been recently extended to exploit prior knowledge on the cardinality of each cluster.
We propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC.
For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node.
- Score: 1.3053649021965603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has
been recently extended to exploit prior knowledge on the cardinality of each
cluster. Such knowledge is used to increase performance as well as solution
quality. In this paper, we propose a global optimization approach based on the
branch-and-cut technique to solve the cardinality-constrained MSSC. For the
lower bound routine, we use the semidefinite programming (SDP) relaxation
recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239,
(2019)]. However, this relaxation can be used in a branch-and-cut method only
for small-size instances. Therefore, we derive a new SDP relaxation that scales
better with the instance size and the number of clusters. In both cases, we
strengthen the bound by adding polyhedral cuts. Benefiting from a tailored
branching strategy which enforces pairwise constraints, we reduce the
complexity of the problems arising in the children nodes. For the upper bound,
instead, we present a local search procedure that exploits the solution of the
SDP relaxation solved at each node. Computational results show that the
proposed algorithm globally solves, for the first time, real-world instances of
size 10 times larger than those solved by state-of-the-art exact methods.
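For readers who want to experiment with the two bounding ingredients, the following sketches are illustrative only. The first sets up an SDP lower bound for MSSC in CVXPY using the classical Peng-Wei relaxation rather than the relaxation of Rujeerapaiboon et al. or the new relaxation proposed in the paper; the diagonal cardinality cut and all names are our own assumptions.

```python
# Minimal sketch of an SDP lower bound for MSSC (classical Peng-Wei
# relaxation), not the relaxation used in the paper. Cluster-size
# information enters only through a simple diagonal cut (an assumption).
import numpy as np
import cvxpy as cp

def mssc_sdp_lower_bound(X, k, c_min=None):
    """Return a lower bound on the MSSC objective for data X (n x d)."""
    n = X.shape[0]
    W = X @ X.T                              # Gram matrix of the data
    Z = cp.Variable((n, n), PSD=True)        # relaxed normalized assignment matrix
    constraints = [
        Z >= 0,                              # entrywise nonnegativity
        cp.sum(Z, axis=1) == 1,              # each row sums to one
        cp.trace(Z) == k,                    # exactly k clusters
    ]
    if c_min is not None:
        # In an exact solution Z_ii = 1/|cluster(i)|, so a minimum
        # cardinality c_min implies the valid cut Z_ii <= 1/c_min.
        constraints.append(cp.diag(Z) <= 1.0 / c_min)
    # Relaxed MSSC objective: tr(W) - <W, Z>
    prob = cp.Problem(cp.Minimize(np.trace(W) - cp.sum(cp.multiply(W, Z))),
                      constraints)
    prob.solve()
    return prob.value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
                   rng.normal(3.0, 0.1, (10, 2))])
    print(mssc_sdp_lower_bound(X, k=2, c_min=8))
```

The second sketch shows one assignment step of the kind a cardinality-respecting local search (the upper-bound routine) has to perform: with centroids and target cluster sizes fixed, points are reassigned by solving an assignment problem. Again, this is our illustration, not the procedure described in the paper.

```python
# Hedged sketch of a cardinality-respecting assignment step: replicate each
# centroid according to its target cluster size and solve an assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_with_cardinalities(X, centers, sizes):
    """Assign points so that cluster j receives exactly sizes[j] points."""
    slots = np.repeat(np.arange(len(sizes)), sizes)   # one slot per required point
    cost = ((X[:, None, :] - centers[slots][None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)          # optimal point-to-slot matching
    labels = np.empty(len(X), dtype=int)
    labels[rows] = slots[cols]
    return labels
```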
Related papers
- A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering [0.30693357740321775]
The minimum sum-of-squares clustering problem (MSSC) asks for a partition of $n$ data points into $k$ clusters that minimizes the sum of squared distances between each point and its cluster centroid.
We propose an efficient algorithm for solving large-scale MSSC instances, which combines column generation (CG) with dynamic constraint aggregation (DCA)
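As a rough illustration of the column-generation viewpoint (our sketch with hypothetical names, not the CG+DCA algorithm of the paper), the restricted master problem below selects among a pool of candidate clusters so that every point is covered exactly once; the pricing step that would generate new clusters, and the constraint fixing the number of selected clusters to $k$, are only indicated in comments.

```python
# Hedged sketch of a restricted master problem for column generation on MSSC:
# choose among candidate clusters (columns) so that each point is covered
# exactly once, minimizing the total within-cluster sum of squares.
import numpy as np
from scipy.optimize import linprog

def restricted_master(X, columns):
    """columns: list of index arrays, each describing one candidate cluster."""
    n = len(X)
    costs = []
    A_eq = np.zeros((n, len(columns)))
    for j, idx in enumerate(columns):
        pts = X[idx]
        costs.append(((pts - pts.mean(axis=0)) ** 2).sum())  # within-cluster SSQ
        A_eq[idx, j] = 1.0
    # Note: the constraint "select exactly k columns" is omitted for brevity.
    res = linprog(c=costs, A_eq=A_eq, b_eq=np.ones(n),
                  bounds=[(0, 1)] * len(columns), method="highs")
    # With the HiGHS solver, res.eqlin.marginals holds the duals a pricing
    # problem would use to look for new columns with negative reduced cost.
    return res
```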
arXiv Detail & Related papers (2024-10-08T16:51:28Z)
- Gap-Free Clustering: Sensitivity and Robustness of SDP [6.996002801232415]
We study graph clustering in the Stochastic Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters.
Previous convex relaxation approaches achieving exact recovery do not allow any small clusters of size $o(\sqrt{n})$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster.
We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes.
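A minimal sketch of a classical SDP relaxation for community recovery is given below; it assumes equal community sizes, which is exactly the kind of size assumption the cited paper removes, and the names are ours.

```python
# Hedged sketch of a classical SDP relaxation for SBM community recovery
# (equal-size communities assumed; not the gap-free formulation of the paper).
import numpy as np
import cvxpy as cp

def sbm_sdp(A, k):
    """A: adjacency matrix (n x n), k: number of equal-size communities."""
    n = A.shape[0]
    Z = cp.Variable((n, n), PSD=True)
    constraints = [
        Z >= 0,                        # entrywise nonnegativity
        cp.diag(Z) == 1,               # unit diagonal
        cp.sum(Z, axis=1) == n / k,    # equal-size assumption
    ]
    prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(A, Z))), constraints)
    prob.solve()
    return Z.value                     # relaxed cluster matrix, to be rounded
```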
arXiv Detail & Related papers (2023-08-29T21:27:21Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
For the first time, the proposed global optimization algorithm efficiently solves real-world instances with up to 800 data points.
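To see how such pairwise constraints enter a clustering routine, here is a hedged heuristic sketch in the COP-k-means spirit; the exact branch-and-bound algorithm of the cited paper is considerably more involved.

```python
# Hedged heuristic sketch of k-means with must-link / cannot-link constraints
# (COP-k-means flavor); NOT the exact branch-and-bound method of the paper.
import numpy as np

def constrained_kmeans(X, k, must_link, cannot_link, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # merge must-linked points into groups with a tiny union-find
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for a, b in must_link:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    groups = list(groups.values())

    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iters):
        assigned = np.full(n, -1)
        for g in groups:
            d = ((X[g].mean(axis=0) - centers) ** 2).sum(axis=1)
            for c in np.argsort(d):   # try clusters from nearest to farthest
                conflict = any(
                    (a in g and assigned[b] == c) or (b in g and assigned[a] == c)
                    for a, b in cannot_link)
                if not conflict:
                    assigned[g] = c
                    break
        # groups with no feasible cluster keep their previous label
        labels = np.where(assigned >= 0, assigned, labels)
        for c in range(k):            # recompute centroids
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers
```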
arXiv Detail & Related papers (2021-11-30T17:08:53Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
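For context, the sketch below is plain decentralized gradient descent over a fixed ring on simple quadratic local objectives; it only illustrates the problem setup, not the optimal algorithms of the cited paper, and all names are ours.

```python
# Hedged sketch of decentralized gradient descent (fixed ring, not time-varying):
# each node mixes with its neighbours and takes a local gradient step on
# f_i(x) = 0.5 * ||A_i x - b_i||^2. The step size must be tuned to the
# smoothness of the local objectives.
import numpy as np

def decentralized_gd(A_list, b_list, n_iters=200, step=0.05):
    m = len(A_list)                       # number of nodes
    d = A_list[0].shape[1]
    X = np.zeros((m, d))                  # one local iterate per node
    W = np.zeros((m, m))                  # doubly stochastic ring mixing matrix
    for i in range(m):
        W[i, i] = 0.5
        W[i, (i - 1) % m] = 0.25
        W[i, (i + 1) % m] = 0.25
    for _ in range(n_iters):
        grads = np.stack([A.T @ (A @ x - b)
                          for A, x, b in zip(A_list, X, b_list)])
        X = W @ X - step * grads          # gossip averaging + local gradient step
    return X.mean(axis=0)                 # consensus estimate of the minimizer
```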
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects [0.0]
The method of Holý, Sokol and Černý clusters objects based on their incidence in a large number of given sets.
The idea is to minimize the occurrence of multiple objects from the same cluster in the same set.
In the current paper, we study computational aspects of the method.
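Our reading of that objective, as a hedged sketch (not necessarily the exact penalty of the paper): for each given set, count the pairs of its members that end up in the same cluster.

```python
# Hedged sketch of a joint-occurrence penalty: pairs of objects from the same
# cluster appearing together in a given set are penalized (our assumption).
from itertools import combinations

def joint_occurrence_penalty(sets, labels):
    """sets: iterable of index collections; labels[i]: cluster of object i."""
    penalty = 0
    for s in sets:
        for a, b in combinations(s, 2):
            if labels[a] == labels[b]:
                penalty += 1
    return penalty
```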
arXiv Detail & Related papers (2021-02-02T10:39:27Z)
- Community detection using fast low-cardinality semidefinite programming [94.4878715085334]
We propose a new low-cardinality algorithm that generalizes the local update to maximize a semidefinite relaxation derived from max-k-cut.
The proposed algorithm is scalable and outperforms state-of-the-art algorithms on real-world datasets with little additional time cost.
arXiv Detail & Related papers (2020-12-04T15:46:30Z)
- An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering [2.5182813818441945]
The recently introduced convex clustering approach formulates clustering as a convex optimization problem.
State-of-the-art convex clustering algorithms require large computation time and memory space.
In this paper, we develop a very efficient smoothing proximal gradient algorithm (Sproga) for convex clustering.
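To make the objective concrete, here is a hedged sketch of convex clustering with a smoothed fusion penalty minimized by plain gradient descent; the Sproga algorithm of the paper is a proper smoothing proximal gradient method and far more efficient.

```python
# Hedged sketch: convex clustering objective with a smoothed fusion penalty,
# minimized by plain gradient descent (illustrative only, not Sproga).
import numpy as np

def convex_clustering_smooth(X, lam=1.0, mu=0.05, n_iters=2000):
    """Minimize 0.5*||U - X||_F^2 + lam * sum_{i<j} sqrt(||u_i - u_j||^2 + mu^2)."""
    n, _ = X.shape
    U = X.astype(float).copy()
    step = 1.0 / (1.0 + 2.0 * lam * (n - 1) / mu)         # crude smoothness bound
    for _ in range(n_iters):
        diff = U[:, None, :] - U[None, :, :]               # pairwise differences
        norms = np.sqrt((diff ** 2).sum(axis=2) + mu ** 2)  # smoothed pair norms
        grad = (U - X) + lam * (diff / norms[:, :, None]).sum(axis=1)
        U -= step * grad
    return U   # rows that (nearly) coincide indicate a common cluster
```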
arXiv Detail & Related papers (2020-06-22T20:02:59Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \rightarrow \infty$.
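A hedged sketch of the general recipe (project onto leading principal components, then cluster) is given below; the cited paper's algorithm and its analysis are more specific than this illustration.

```python
# Hedged sketch of PCA-based clustering: reduce dimension with PCA, then
# run k-means on the scores (illustrative, not the paper's exact algorithm).
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def pca_cluster(X, k, n_components=1, seed=0):
    scores = PCA(n_components=n_components).fit_transform(X)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scores)
```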
arXiv Detail & Related papers (2020-05-21T17:51:30Z)