CITE: A Comprehensive Benchmark for Heterogeneous Text-Attributed Graphs on Catalytic Materials
- URL: http://arxiv.org/abs/2508.15392v1
- Date: Thu, 21 Aug 2025 09:28:19 GMT
- Title: CITE: A Comprehensive Benchmark for Heterogeneous Text-Attributed Graphs on Catalytic Materials
- Authors: Chenghao Zhang, Qingqing Long, Ludi Wang, Wenjuan Cui, Jianjun Yu, Yi Du,
- Abstract summary: We introduce CITE, the first and largest heterogeneous text-attributed citation graph benchmark for catalytic materials. CITE comprises over 438K nodes and 1.2M edges, spanning four relation types. We compare four classes of learning paradigms, including homogeneous graph models, heterogeneous graph models, LLM (Large Language Model)-centric models, and LLM+Graph models.
- Score: 7.08105954189442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-attributed graphs (TAGs) are pervasive in real-world systems, where each node carries its own textual features. In many cases these graphs are inherently heterogeneous, containing multiple node types and diverse edge types. Despite the ubiquity of such heterogeneous TAGs, there remains a lack of large-scale benchmark datasets. This shortage has become a critical bottleneck, hindering the development and fair comparison of representation learning methods on heterogeneous text-attributed graphs. In this paper, we introduce CITE (Catalytic Information Textual Entities Graph), the first and largest heterogeneous text-attributed citation graph benchmark for catalytic materials. CITE comprises over 438K nodes and 1.2M edges, spanning four relation types. In addition, we establish standardized evaluation procedures and conduct extensive benchmarking on the node classification task, as well as ablation experiments on the heterogeneous and textual properties of CITE. We compare four classes of learning paradigms, including homogeneous graph models, heterogeneous graph models, LLM (Large Language Model)-centric models, and LLM+Graph models. In a nutshell, we provide (i) an overview of the CITE dataset, (ii) standardized evaluation protocols, and (iii) baseline and ablation experiments across diverse modeling paradigms.
Related papers
- H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs [6.601515580215021]
We introduce H$^2$GFM, a novel framework designed to generalize across both HoTAGs and HeTAGs. Our model projects diverse meta-relations among graphs under a unified textual space. We employ a mixture of CGT experts to capture the heterogeneity in structural patterns among graph types.
arXiv Detail & Related papers (2025-06-10T00:03:56Z)
- HeTGB: A Comprehensive Benchmark for Heterophilic Text-Attributed Graphs [38.79574338268996]
Graph neural networks (GNNs) have demonstrated success in modeling relational data under the assumption of homophily. Many real-world graphs exhibit heterophily, where linked nodes belong to different categories or possess diverse attributes. We introduce the Heterophilic Text-attributed Graph Benchmark (HeTGB), a novel benchmark comprising five real-world heterophilic graph datasets from diverse domains.
arXiv Detail & Related papers (2025-03-05T02:00:32Z)
- When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark [18.253578434782103]
We introduce H2GB, a large-scale node-classification graph benchmark. It brings together the complexities of both the heterophily and heterogeneity properties of real-world graphs. We also present a new variant of the model, H2G-former, that excels at this challenging benchmark.
arXiv Detail & Related papers (2024-07-15T17:18:42Z)
- The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges [101.83124435649358]
The homophily principle states that nodes with the same labels or similar attributes are more likely to be connected.
Recent work has identified a non-trivial set of datasets on which the performance of GNNs, compared to that of NNs, is not satisfactory.
arXiv Detail & Related papers (2024-07-12T18:04:32Z)
- A Survey on Learning from Graphs with Heterophily: Recent Advances and Future Directions [35.544281678888986]
Heterophilic graphs, where linked nodes tend to have different labels or dissimilar features, have recently attracted significant attention.
Various graph heterophily measures, benchmark datasets, and learning paradigms are emerging rapidly.
arXiv Detail & Related papers (2024-01-18T07:36:38Z)
- Geometry Contrastive Learning on Heterogeneous Graphs [50.58523799455101]
This paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL).
GCL views a heterogeneous graph from Euclidean and hyperbolic perspectives simultaneously, aiming to combine the ability to model rich semantics with the ability to model complex structures.
Extensive experiments on four benchmark data sets show that the proposed approach outperforms the strong baselines.
arXiv Detail & Related papers (2022-06-25T03:54:53Z)
- Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views guided by node attributes and graph topologies, respectively.
In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z)
- Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on graph neural networks (GNNs), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose Semantic Graph Convolutional Networks (SGCN), which explore the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
- Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, which calls for completing them automatically.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, resort to the text of graph triples and triple-level contextualized representations.
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.