Constrained Hierarchical Clustering via Graph Coarsening and Optimal
Cuts
- URL: http://arxiv.org/abs/2312.04209v1
- Date: Thu, 7 Dec 2023 10:52:06 GMT
- Title: Constrained Hierarchical Clustering via Graph Coarsening and Optimal
Cuts
- Authors: Eliabelle Mauduit and Andrea Simonetto
- Abstract summary: We study the problem of clustering words in a hierarchical fashion.
In particular, we focus on the problem of clustering with horizontal and vertical structural constraints.
We show that the resulting approach compares very well with respect to existing algorithms and is computationally light.
- Score: 1.5591858554014466
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Motivated by extracting and summarizing relevant information in short
sentence settings, such as satisfaction questionnaires, hotel reviews, and
X/Twitter, we study the problem of clustering words in a hierarchical fashion.
In particular, we focus on the problem of clustering with horizontal and
vertical structural constraints. Horizontal constraints are typically
cannot-link and must-link among words, while vertical constraints are
precedence constraints among cluster levels. We overcome state-of-the-art
bottlenecks by formulating the problem in two steps: first, as a
soft-constrained regularized least-squares which guides the result of a
sequential graph coarsening algorithm towards the horizontal feasible set.
Then, flat clusters are extracted from the resulting hierarchical tree by
computing optimal cut heights based on the available constraints. We show that
the resulting approach compares very well with respect to existing algorithms
and is computationally light.
Related papers
- Exact and Heuristic Algorithms for Constrained Biclustering [0.0]
Biclustering, also known as co-clustering or two-way clustering, simultaneously partitions the rows and columns of a data matrix to reveal submatrices with coherent patterns.<n>We study constrained biclustering with pairwise constraints, namely must-link and cannot-link constraints, which specify whether objects should belong to the same or different biclusters.
arXiv Detail & Related papers (2025-08-07T15:29:22Z) - A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems [24.208152437317768]
We view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset.
We propose a block coordinate descent algorithm to efficiently solve this model.
We can construct the clusters that theoretically meet the must-link constraints under mild assumptions.
arXiv Detail & Related papers (2025-03-06T14:02:28Z) - Algorithms for the preordering problem and their application to the task of jointly clustering and ordering the accounts of a social network [8.436874393402158]
The NP-hard maximum value preordering problem is a joint relaxation and a hybrid of the clique partition problem (a clustering problem) and the partial ordering problem.<n>We introduce a linear-time 4-approximation algorithm that constructs a maximum dicut of a subgraph and define local searchs.<n>We apply these algorithms to the task of jointly clustering and partially ordering the accounts of published social networks, and compare the output and efficiency qualitatively and quantitatively.
arXiv Detail & Related papers (2025-02-20T13:12:03Z) - An SDP-based Branch-and-Cut Algorithm for Biclustering [0.0]
We present a tailored branch-and-cut algorithm for biclustering problems.
We show that the proposed algorithm can solve instances 20 times larger than those handled by general-purpose solvers.
arXiv Detail & Related papers (2024-03-17T21:43:19Z) - MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity
and Degree Descent Criterion [0.6906005491572401]
spectral clustering is popular and attractive due to the remarkable performance, easy implementation, and strong adaptability.
We propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition.
The validity of our algorithm is demonstrated by testifying on real-world benchmarks and application of face recognition.
arXiv Detail & Related papers (2023-12-07T06:19:39Z) - Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification.
By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to a tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - Reinforcement Learning Based Query Vertex Ordering Model for Subgraph
Matching [58.39970828272366]
Subgraph matching algorithms enumerate all is embeddings of a query graph in a data graph G.
matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms.
In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Faster Deterministic Approximation Algorithms for Correlation Clustering
and Cluster Deletion [5.584060970507507]
Correlation clustering is a framework for partitioning datasets based on pairwise similarity and dissimilarity scores.
In this paper we prove new relationships between correlation clustering problems and edge labeling problems related to the principle of strong partitioningic closure.
We develop new approximation algorithms for correlation clustering that have deterministic constant factor approximation guarantees and avoid the canonical linear programming relaxation.
arXiv Detail & Related papers (2021-11-20T22:47:19Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm.
Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.