A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems
- URL: http://arxiv.org/abs/2503.04447v1
- Date: Thu, 06 Mar 2025 14:02:28 GMT
- Title: A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems
- Authors: Wei Liu, Xin Liu, Michael K. Ng, Zaikun Zhang,
- Abstract summary: We view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset.<n>We propose a block coordinate descent algorithm to efficiently solve this model.<n>We can construct the clusters that theoretically meet the must-link constraints under mild assumptions.
- Score: 24.208152437317768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised clustering is a basic problem in various applications. Most existing methods require knowledge of the ideal cluster number, which is often difficult to obtain in practice. Besides, satisfying the must-link constraints is another major challenge for these methods. In this work, we view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset, where the similarity matrix includes a scaling parameter to reflect the must-link constraints. Utilizing a relaxation technique, we formulate the graph partitioning problem into a continuous optimization model that does not require the exact cluster number, but only an overestimate of it. We then propose a block coordinate descent algorithm to efficiently solve this model, and establish its convergence result. Based on the obtained solution, we can construct the clusters that theoretically meet the must-link constraints under mild assumptions. Furthermore, we verify the effectiveness and efficiency of our proposed method through comprehensive numerical experiments.
Related papers
- Counterfactual Explanations for k-means and Gaussian Clustering [1.8561812622368767]
We present a general definition for counterfactuals for model-based clustering that includes plausibility and feasibility constraints.<n>Our approach takes as input the factual, the target cluster, a binary mask indicating actionable or immutable features and a plausibility factor specifying how far from the cluster boundary the counterfactual should be placed.
arXiv Detail & Related papers (2025-01-17T14:56:20Z) - A Semidefinite Programming-Based Branch-and-Cut Algorithm for Biclustering [0.0]
We present a tailored branch-and-cut algorithm for biclustering problems.<n>We show that the proposed algorithm can solve instances 20 times larger than those handled by general-purpose solvers.
arXiv Detail & Related papers (2024-03-17T21:43:19Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.<n>IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Late Fusion Multi-view Clustering via Global and Local Alignment
Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most of existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances up to 800 data points.
arXiv Detail & Related papers (2021-11-30T17:08:53Z) - Clustering to the Fewest Clusters Under Intra-Cluster Dissimilarity
Constraints [0.0]
equiwide clustering relies neither on density nor on a predefined number of expected classes, but on a dissimilarity threshold.
We review and evaluate suitable clustering algorithms to identify trade-offs between the various practical solutions for this clustering problem.
arXiv Detail & Related papers (2021-09-28T12:02:18Z) - Mixed Membership Graph Clustering via Systematic Edge Query [22.77704627076251]
This work considers clustering nodes of a largely incomplete graph.
Under the problem setting, only a small amount of queries about the edges can be made, but the entire graph is not observable.
This problem finds applications in large-scale data clustering using limited annotations, community detection under restricted survey resources, and graph topology inference.
arXiv Detail & Related papers (2020-11-25T19:19:05Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.