Effective Clustering on Large Attributed Bipartite Graphs
- URL: http://arxiv.org/abs/2405.11922v1
- Date: Mon, 20 May 2024 09:58:27 GMT
- Title: Effective Clustering on Large Attributed Bipartite Graphs
- Authors: Renchi Yang, Yidu Wu, Xiaoyang Lin, Qichen Wang, Tsz Nam Chan, Jieming Shi,
- Abstract summary: Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes.
Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains.
However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately.
We propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets.
- Score: 10.701751248623863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes that are associated with rich attributes, such as customer-product purchase networks and author-paper authorship graphs. Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains, including social network analysis, recommendation systems, information retrieval, and bioinformatics. However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately, engendering severely compromised result quality. The severity of these issues is accentuated in real ABGs, which often encompass millions of nodes and a sheer volume of attribute data, rendering effective k-ABGC over such graphs highly challenging. In this paper, we propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets. TPO obtains high clustering quality through two major contributions: (i) a novel formulation and transformation of the k-ABGC problem based on multi-scale attribute affinity specialized for capturing attribute affinities between nodes with the consideration of their multi-hop connections in ABGs, and (ii) a highly efficient solver that includes a suite of carefully-crafted optimizations for sidestepping explicit affinity matrix construction and facilitating faster convergence. Extensive experiments, comparing TPO against 19 baselines over 5 real ABGs, showcase the superior clustering quality of TPO measured against ground-truth labels. Moreover, compared to the state of the arts, TPO is often more than 40x faster over both small and large ABGs.
Related papers
- Enhancing Missing Data Imputation through Combined Bipartite Graph and Complete Directed Graph [18.06658040186476]
We introduce a novel framework named the Bipartite and Complete Directed Graph Neural Network (BCGNN)
Within BCGNN, observations and features are differentiated as two distinct node types, and the values of observed features are converted into attributed edges linking them.
In parallel, the complete directed graph segment adeptly outlines and communicates the complex interdependencies among features.
arXiv Detail & Related papers (2024-11-07T17:48:37Z) - A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor Augmentation [14.327262299413789]
We develop ANCKA as a versatile attributed network clustering framework, capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and attributed hypergraph clustering (AHC)
We have conducted extensive experiments to compare our methods with 19 competitors on 8 attributed hypergraphs, 16 competitors on 6 attributed graphs, and 16 competitors on 3 attributed multiplex graphs, all demonstrating the superb clustering quality and efficiency of our methods.
arXiv Detail & Related papers (2024-08-10T06:59:51Z) - A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - Efficient High-Quality Clustering for Large Bipartite Graphs [7.533043289759316]
k-Bipartite Graph Clustering (k-BGC) is to partition the target vertices set in a bipartite graph into k disjoint clusters.
The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics.
This paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs.
arXiv Detail & Related papers (2023-12-28T09:50:56Z) - Efficient Multi-View Graph Clustering with Local and Global Structure
Preservation [59.49018175496533]
We propose a novel anchor-based multi-view graph clustering framework termed Efficient Multi-View Graph Clustering with Local and Global Structure Preservation (EMVGC-LG)
Specifically, EMVGC-LG jointly optimize anchor construction and graph learning to enhance the clustering quality.
In addition, EMVGC-LG inherits the linear complexity of existing AMVGC methods respecting the sample number.
arXiv Detail & Related papers (2023-08-31T12:12:30Z) - Graph Representation Learning via Contrasting Cluster Assignments [57.87743170674533]
We propose a novel unsupervised graph representation model by contrasting cluster assignments, called as GRCCA.
It is motivated to make good use of local and global information synthetically through combining clustering algorithms and contrastive learning.
GRCCA has strong competitiveness in most tasks.
arXiv Detail & Related papers (2021-12-15T07:28:58Z) - Attention-driven Graph Clustering Network [49.040136530379094]
We propose a novel deep clustering method named Attention-driven Graph Clustering Network (AGCN)
AGCN exploits a heterogeneous-wise fusion module to dynamically fuse the node attribute feature and the topological graph feature.
AGCN can jointly perform feature learning and cluster assignment in an unsupervised fashion.
arXiv Detail & Related papers (2021-08-12T02:30:38Z) - Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z) - Effective and Scalable Clustering on Massive Attributed Graphs [25.161619807810215]
We propose ACMin, an effective approach to k-AGC that yields high-quality clusters with cost linear to the size of the input graph G.
ACMin consistently outperforms all competitors in terms of result quality measured against ground-truth labels, while being up to orders of magnitude faster.
In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, ACMin outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finish within 3 days.
arXiv Detail & Related papers (2021-02-07T15:50:28Z) - Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.