Efficient High-Quality Clustering for Large Bipartite Graphs
- URL: http://arxiv.org/abs/2312.16926v1
- Date: Thu, 28 Dec 2023 09:50:56 GMT
- Title: Efficient High-Quality Clustering for Large Bipartite Graphs
- Authors: Renchi Yang and Jieming Shi
- Abstract summary: k-Bipartite Graph Clustering (k-BGC) is to partition the target vertices set in a bipartite graph into k disjoint clusters.
The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics.
This paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs.
- Score: 7.533043289759316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A bipartite graph contains inter-set edges between two disjoint vertex sets,
and is widely used to model real-world data, such as user-item purchase
records, author-article publications, and biological interactions between drugs
and proteins. k-Bipartite Graph Clustering (k-BGC) is to partition the target
vertex set in a bipartite graph into k disjoint clusters. The clustering
quality is important to the utility of k-BGC in various applications like
social network analysis, recommendation systems, text mining, and
bioinformatics, to name a few. Existing approaches to k-BGC either output
clustering results with compromised quality due to inadequate exploitation of
high-order information between vertices, or fail to handle sizable bipartite
graphs with billions of edges.
Motivated by this, this paper presents two efficient k-BGC solutions, HOPE
and HOPE+, which achieve state-of-the-art performance on large-scale bipartite
graphs. HOPE obtains high scalability and effectiveness through a new k-BGC
problem formulation based on the novel notion of high-order perspective (HOP)
vectors and an efficient technique for low-rank approximation of HOP vectors.
HOPE+ further elevates the k-BGC performance to another level with a judicious
problem transformation and a highly efficient two-stage optimization framework.
Two variants, HOPE+ (FNEM) and HOPE+ (SNEM) are designed when either the
Frobenius norm or spectral norm is applied in the transformation. Extensive
experiments, comparing HOPE and HOPE+ against 13 competitors on 10 real-world
datasets, exhibit that our solutions, especially HOPE+, are superior to
existing methods in terms of result quality, while being up to orders of
magnitude faster. On the largest dataset MAG with 1.1 billion edges, HOPE+ is
able to produce clusters with the highest clustering accuracy within 31
minutes, which is unmatched by any existing solution for k-BGC.
Related papers
- A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC.
It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters.
GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z) - Spectral Subspace Clustering for Attributed Graphs [3.974852803981998]
Subspace clustering seeks to identify subspaces that segment a set of n data points into k (kn) groups.
This paper presents two effective and efficient algorithms, S2CAG and M-S2CAG, for SCAG computation.
arXiv Detail & Related papers (2024-11-17T13:22:15Z) - Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition [49.41718583061147]
Graph condensation is a data-centric solution to replace the large graph with a small yet informative condensed graph.
Existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources and training time.
We propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC)
CGC condenses the Ogbn-products graph within 30 seconds, achieving a speedup ranging from $102$X to $104$X and increasing accuracy by up to 4.2%.
arXiv Detail & Related papers (2024-05-22T14:57:09Z) - Effective Clustering on Large Attributed Bipartite Graphs [10.701751248623863]
Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes.
Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains.
However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately.
We propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets.
arXiv Detail & Related papers (2024-05-20T09:58:27Z) - Cluster-based Graph Collaborative Filtering [55.929052969825825]
Graph Convolution Networks (GCNs) have succeeded in learning user and item representations for recommendation systems.
Most existing GCN-based methods overlook the multiple interests of users while performing high-order graph convolution.
We propose a novel GCN-based recommendation model, termed Cluster-based Graph Collaborative Filtering (ClusterGCF)
arXiv Detail & Related papers (2024-04-16T07:05:16Z) - GLCC: A General Framework for Graph-level Clustering [5.069852282550117]
This paper studies the problem of graph-level clustering, which is a novel yet challenging task.
We propose a general graph-level clustering framework named Graph-Level Contrastive Clustering (GLCC)
Experiments on a range of well-known datasets demonstrate the superiority of our proposed GLCC over competitive baselines.
arXiv Detail & Related papers (2022-10-21T11:08:10Z) - Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with
Heterophily [58.76759997223951]
We propose a new metric based on von Neumann entropy to re-examine the heterophily problem of GNNs.
We also propose a Conv-Agnostic GNN framework (CAGNNs) to enhance the performance of most GNNs on heterophily datasets.
arXiv Detail & Related papers (2022-03-19T14:26:43Z) - Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z) - Effective and Scalable Clustering on Massive Attributed Graphs [25.161619807810215]
We propose ACMin, an effective approach to k-AGC that yields high-quality clusters with cost linear to the size of the input graph G.
ACMin consistently outperforms all competitors in terms of result quality measured against ground-truth labels, while being up to orders of magnitude faster.
In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, ACMin outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finish within 3 days.
arXiv Detail & Related papers (2021-02-07T15:50:28Z) - Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.