Efficient High-Quality Clustering for Large Bipartite Graphs
- URL: http://arxiv.org/abs/2312.16926v1
- Date: Thu, 28 Dec 2023 09:50:56 GMT
- Title: Efficient High-Quality Clustering for Large Bipartite Graphs
- Authors: Renchi Yang and Jieming Shi
- Abstract summary: k-Bipartite Graph Clustering (k-BGC) is to partition the target vertices set in a bipartite graph into k disjoint clusters.
The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics.
This paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs.
- Score: 7.533043289759316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A bipartite graph contains inter-set edges between two disjoint vertex sets,
and is widely used to model real-world data, such as user-item purchase
records, author-article publications, and biological interactions between drugs
and proteins. k-Bipartite Graph Clustering (k-BGC) is to partition the target
vertex set in a bipartite graph into k disjoint clusters. The clustering
quality is important to the utility of k-BGC in various applications like
social network analysis, recommendation systems, text mining, and
bioinformatics, to name a few. Existing approaches to k-BGC either output
clustering results with compromised quality due to inadequate exploitation of
high-order information between vertices, or fail to handle sizable bipartite
graphs with billions of edges.
Motivated by this, this paper presents two efficient k-BGC solutions, HOPE
and HOPE+, which achieve state-of-the-art performance on large-scale bipartite
graphs. HOPE obtains high scalability and effectiveness through a new k-BGC
problem formulation based on the novel notion of high-order perspective (HOP)
vectors and an efficient technique for low-rank approximation of HOP vectors.
HOPE+ further elevates the k-BGC performance to another level with a judicious
problem transformation and a highly efficient two-stage optimization framework.
Two variants, HOPE+ (FNEM) and HOPE+ (SNEM) are designed when either the
Frobenius norm or spectral norm is applied in the transformation. Extensive
experiments, comparing HOPE and HOPE+ against 13 competitors on 10 real-world
datasets, exhibit that our solutions, especially HOPE+, are superior to
existing methods in terms of result quality, while being up to orders of
magnitude faster. On the largest dataset MAG with 1.1 billion edges, HOPE+ is
able to produce clusters with the highest clustering accuracy within 31
minutes, which is unmatched by any existing solution for k-BGC.
Related papers
- Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition [56.26113670151363]
Graph condensation is a data-centric solution to replace the large graph with a small yet informative condensed graph.
Existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources.
We propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC)
CGC achieves state-of-the-art performance with a more efficient condensation process.
arXiv Detail & Related papers (2024-05-22T14:57:09Z) - Effective Clustering on Large Attributed Bipartite Graphs [10.701751248623863]
Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes.
Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains.
However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately.
We propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets.
arXiv Detail & Related papers (2024-05-20T09:58:27Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - GLCC: A General Framework for Graph-level Clustering [5.069852282550117]
This paper studies the problem of graph-level clustering, which is a novel yet challenging task.
We propose a general graph-level clustering framework named Graph-Level Contrastive Clustering (GLCC)
Experiments on a range of well-known datasets demonstrate the superiority of our proposed GLCC over competitive baselines.
arXiv Detail & Related papers (2022-10-21T11:08:10Z) - Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with
Heterophily [58.76759997223951]
We propose a new metric based on von Neumann entropy to re-examine the heterophily problem of GNNs.
We also propose a Conv-Agnostic GNN framework (CAGNNs) to enhance the performance of most GNNs on heterophily datasets.
arXiv Detail & Related papers (2022-03-19T14:26:43Z) - Graph Representation Learning via Contrasting Cluster Assignments [57.87743170674533]
We propose a novel unsupervised graph representation model by contrasting cluster assignments, called as GRCCA.
It is motivated to make good use of local and global information synthetically through combining clustering algorithms and contrastive learning.
GRCCA has strong competitiveness in most tasks.
arXiv Detail & Related papers (2021-12-15T07:28:58Z) - Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z) - Effective and Scalable Clustering on Massive Attributed Graphs [25.161619807810215]
We propose ACMin, an effective approach to k-AGC that yields high-quality clusters with cost linear to the size of the input graph G.
ACMin consistently outperforms all competitors in terms of result quality measured against ground-truth labels, while being up to orders of magnitude faster.
In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, ACMin outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finish within 3 days.
arXiv Detail & Related papers (2021-02-07T15:50:28Z) - Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.