Related papers: Efficient High-Quality Clustering for Large Bipartite Graphs

Efficient High-Quality Clustering for Large Bipartite Graphs

URL: http://arxiv.org/abs/2312.16926v1
Date: Thu, 28 Dec 2023 09:50:56 GMT
Title: Efficient High-Quality Clustering for Large Bipartite Graphs
Authors: Renchi Yang and Jieming Shi
Abstract summary: k-Bipartite Graph Clustering (k-BGC) is to partition the target vertices set in a bipartite graph into k disjoint clusters. The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics. This paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs.
Score: 7.533043289759316
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A bipartite graph contains inter-set edges between two disjoint vertex sets, and is widely used to model real-world data, such as user-item purchase records, author-article publications, and biological interactions between drugs and proteins. k-Bipartite Graph Clustering (k-BGC) is to partition the target vertex set in a bipartite graph into k disjoint clusters. The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics, to name a few. Existing approaches to k-BGC either output clustering results with compromised quality due to inadequate exploitation of high-order information between vertices, or fail to handle sizable bipartite graphs with billions of edges. Motivated by this, this paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs. HOPE obtains high scalability and effectiveness through a new k-BGC problem formulation based on the novel notion of high-order perspective (HOP) vectors and an efficient technique for low-rank approximation of HOP vectors. HOPE+ further elevates the k-BGC performance to another level with a judicious problem transformation and a highly efficient two-stage optimization framework. Two variants, HOPE+ (FNEM) and HOPE+ (SNEM) are designed when either the Frobenius norm or spectral norm is applied in the transformation. Extensive experiments, comparing HOPE and HOPE+ against 13 competitors on 10 real-world datasets, exhibit that our solutions, especially HOPE+, are superior to existing methods in terms of result quality, while being up to orders of magnitude faster. On the largest dataset MAG with 1.1 billion edges, HOPE+ is able to produce clusters with the highest clustering accuracy within 31 minutes, which is unmatched by any existing solution for k-BGC.

Related papers

A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC. It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters. GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z)
Spectral Subspace Clustering for Attributed Graphs [3.974852803981998]
Subspace clustering seeks to identify subspaces that segment a set of n data points into k (kn) groups. This paper presents two effective and efficient algorithms, S2CAG and M-S2CAG, for SCAG computation.
arXiv Detail & Related papers (2024-11-17T13:22:15Z)
Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition [56.26113670151363]
Graph condensation is a data-centric solution to replace the large graph with a small yet informative condensed graph. Existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources. We propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC) CGC achieves state-of-the-art performance with a more efficient condensation process.
arXiv Detail & Related papers (2024-05-22T14:57:09Z)
Effective Clustering on Large Attributed Bipartite Graphs [10.701751248623863]
Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes. Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains. However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately. We propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets.
arXiv Detail & Related papers (2024-05-20T09:58:27Z)
Cluster-based Graph Collaborative Filtering [55.929052969825825]
Graph Convolution Networks (GCNs) have succeeded in learning user and item representations for recommendation systems. Most existing GCN-based methods overlook the multiple interests of users while performing high-order graph convolution. We propose a novel GCN-based recommendation model, termed Cluster-based Graph Collaborative Filtering (ClusterGCF)
arXiv Detail & Related papers (2024-04-16T07:05:16Z)
GLCC: A General Framework for Graph-level Clustering [5.069852282550117]
This paper studies the problem of graph-level clustering, which is a novel yet challenging task. We propose a general graph-level clustering framework named Graph-Level Contrastive Clustering (GLCC) Experiments on a range of well-known datasets demonstrate the superiority of our proposed GLCC over competitive baselines.
arXiv Detail & Related papers (2022-10-21T11:08:10Z)
Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily [58.76759997223951]
We propose a new metric based on von Neumann entropy to re-examine the heterophily problem of GNNs. We also propose a Conv-Agnostic GNN framework (CAGNNs) to enhance the performance of most GNNs on heterophily datasets.
arXiv Detail & Related papers (2022-03-19T14:26:43Z)
Graph Representation Learning via Contrasting Cluster Assignments [57.87743170674533]
We propose a novel unsupervised graph representation model by contrasting cluster assignments, called as GRCCA. It is motivated to make good use of local and global information synthetically through combining clustering algorithms and contrastive learning. GRCCA has strong competitiveness in most tasks.
arXiv Detail & Related papers (2021-12-15T07:28:58Z)
Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering. The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z)
Effective and Scalable Clustering on Massive Attributed Graphs [25.161619807810215]
We propose ACMin, an effective approach to k-AGC that yields high-quality clusters with cost linear to the size of the input graph G. ACMin consistently outperforms all competitors in terms of result quality measured against ground-truth labels, while being up to orders of magnitude faster. In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, ACMin outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finish within 3 days.
arXiv Detail & Related papers (2021-02-07T15:50:28Z)
Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs. Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.