arXiv4TGC: Large-Scale Datasets for Temporal Graph Clustering
- URL: http://arxiv.org/abs/2306.04962v1
- Date: Thu, 8 Jun 2023 06:37:04 GMT
- Title: arXiv4TGC: Large-Scale Datasets for Temporal Graph Clustering
- Authors: Meng Liu, Ke Liang, Yue Liu, Siwei Wang, Sihang Zhou, Xinwang Liu
- Abstract summary: We build arXiv4TGC, a set of novel academic datasets for temporal graph clustering.
In particular, the largest dataset, arXivLarge, contains 1.3 million available labeled nodes and 10 million temporal edges.
Clustering performance on arXiv4TGC distinguishes different models more clearly.
- Score: 52.63652741011945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal graph clustering (TGC) is a crucial task in temporal graph learning.
Its focus is on node clustering on temporal graphs, and it offers greater
flexibility for large-scale graph structures due to the mechanism of temporal
graph methods. However, the development of TGC is currently constrained by a
significant problem: the lack of suitable and reliable large-scale temporal
graph datasets to evaluate clustering performance. In other words, most
existing temporal graph datasets are small, and even large-scale
datasets contain only a limited number of available node labels. This makes
evaluating models for large-scale temporal graph clustering challenging. To
address this challenge, we build arXiv4TGC, a set of novel academic datasets
(including arXivAI, arXivCS, arXivMath, arXivPhy, and arXivLarge) for
large-scale temporal graph clustering. In particular, the largest dataset,
arXivLarge, contains 1.3 million available labeled nodes and 10 million
temporal edges. We further compare the clustering performance of typical
temporal graph learning models on both previous classic temporal graph datasets
and the new datasets proposed in this paper. Clustering performance on
arXiv4TGC distinguishes different models more clearly, yielding
higher clustering confidence and making the datasets more suitable for
large-scale temporal graph clustering. The arXiv4TGC datasets are publicly available at:
https://github.com/MGitHubL/arXiv4TGC.
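As a rough illustration of how a temporal graph of this kind is typically consumed, the sketch below stores interactions as (source, target, timestamp) triples and iterates over them in chronological batches. The toy edges, the `TemporalEdge` type, and the `chronological_batches` helper are hypothetical; they are not taken from the arXiv4TGC repository, whose actual file layout may differ.

```python
from typing import Iterator, List, NamedTuple


class TemporalEdge(NamedTuple):
    """One interaction: source node, target node, timestamp (hypothetical schema)."""
    src: int
    dst: int
    time: float


def chronological_batches(
    edges: List[TemporalEdge], batch_size: int
) -> Iterator[List[TemporalEdge]]:
    """Yield fixed-size batches of edges in order of increasing timestamp."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Sort once by timestamp so every batch respects temporal order.
    ordered = sorted(edges, key=lambda e: e.time)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Toy data for illustration only; not drawn from any real dataset.
toy_edges = [
    TemporalEdge(0, 1, 3.0),
    TemporalEdge(1, 2, 1.0),
    TemporalEdge(2, 3, 2.0),
    TemporalEdge(0, 3, 4.0),
    TemporalEdge(3, 4, 5.0),
]

batches = list(chronological_batches(toy_edges, batch_size=2))
```

Each batch holds only a slice of the interaction sequence, so a model trained this way never needs the full adjacency matrix in memory at once. This is a commonly cited reason why temporal graph methods can suit large graphs.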
Related papers
- Spectral Greedy Coresets for Graph Neural Networks [61.24300262316091]
The ubiquity of large-scale graphs in node-classification tasks hinders the real-world applications of Graph Neural Networks (GNNs).
This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs based on their spectral embeddings.
Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs.
arXiv Detail & Related papers (2024-05-27T17:52:12Z)
- LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering [59.89626219328127]
Graph clustering is a fundamental problem in machine learning.
Deep learning methods have achieved state-of-the-art results in recent years, but they still cannot work without predefined cluster numbers.
We propose to address this problem from a fresh perspective of graph information theory.
arXiv Detail & Related papers (2024-05-20T05:46:41Z)
- Deep Temporal Graph Clustering [77.02070768950145]
We propose a general framework for deep Temporal Graph Clustering (TGC).
TGC introduces deep clustering techniques to suit the interaction sequence-based batch-processing pattern of temporal graphs.
Our framework can effectively improve the performance of existing temporal graph learning methods.
arXiv Detail & Related papers (2023-05-18T06:17:50Z)
- Clustering of Time-Varying Graphs Based on Temporal Label Smoothness [28.025212175496964]
We propose a node clustering method for time-varying graphs based on the assumption that cluster labels change smoothly over time.
Experiments on synthetic and real-world time-varying graphs are performed to validate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-05-11T05:20:41Z)
- Simplified Graph Convolution with Heterophily [25.7577503312319]
We show that Simple Graph Convolution (SGC) is ineffective for heterophilous (i.e., non-homophilous) graphs.
We propose Adaptive Simple Graph Convolution (ASGC), which we show can adapt to both homophilous and heterophilous graph structure.
arXiv Detail & Related papers (2022-02-08T20:52:08Z)
- AnchorGAE: General Data Clustering via $O(n)$ Bipartite Graph Convolution [79.44066256794187]
We show how to convert a non-graph dataset into a graph by introducing the generative graph model, which is used to build graph convolution networks (GCNs).
A bipartite graph constructed by anchors is updated dynamically to exploit the high-level information behind data.
We theoretically prove that the simple update leads to degeneration, and we design a specific strategy accordingly.
arXiv Detail & Related papers (2021-11-12T07:08:13Z)
- Weighted Graph Nodes Clustering via Gumbel Softmax [0.0]
We present some ongoing research results on graph clustering algorithms for clustering weighted graph datasets.
We name our algorithm Weighted Graph Node Clustering via Gumbel Softmax (WGCGS).
arXiv Detail & Related papers (2021-02-22T05:05:35Z)
- Adaptive Graph Auto-Encoder for General Data Clustering [90.8576971748142]
Graph-based clustering plays an important role in the clustering area.
Recent studies on graph convolution neural networks have achieved impressive success on graph-structured data.
We propose a graph auto-encoder for general data clustering, which constructs the graph adaptively according to the generative perspective of graphs.
arXiv Detail & Related papers (2020-02-20T10:11:28Z)