Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains
- URL: http://arxiv.org/abs/2412.08937v1
- Date: Thu, 12 Dec 2024 04:58:32 GMT
- Title: Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains
- Authors: Yunhui Liu, Qizhuo Xie, Jinwei Shi, Jiaxu Shen, Tieke He,
- Abstract summary: We introduce a collection of challenging and diverse benchmark datasets for realistic and reproducible evaluation of machine learning models on HTAGs.
Our HTAG datasets are multi-scale, span years in duration, and cover a wide range of domains, including movie, community question answering, academic, literature, and patent networks.
All source data, dataset construction code, processed HTAGs, data loaders, benchmark code, and evaluation setup are publicly available at GitHub and Hugging Face.
- Score: 25.61868709829681
- License:
- Abstract: Heterogeneous Text-Attributed Graphs (HTAGs), where different types of entities are not only associated with texts but also connected by diverse relationships, have gained widespread popularity and application across various domains. However, current research on text-attributed graph learning predominantly focuses on homogeneous graphs, which feature a single node and edge type, thus leaving a gap in understanding how methods perform on HTAGs. One crucial reason is the lack of comprehensive HTAG datasets that offer original textual content and span multiple domains of varying sizes. To this end, we introduce a collection of challenging and diverse benchmark datasets for realistic and reproducible evaluation of machine learning models on HTAGs. Our HTAG datasets are multi-scale, span years in duration, and cover a wide range of domains, including movie, community question answering, academic, literature, and patent networks. We further conduct benchmark experiments on these datasets with various graph neural networks. All source data, dataset construction code, processed HTAGs, data loaders, benchmark code, and evaluation setup are publicly available at GitHub and Hugging Face.
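To make the HTAG definition concrete (typed nodes that each carry raw text, connected by typed relations), here is a minimal sketch of such a container. It is not the data loader released with the paper: the class name `HTAG`, its methods, and the toy movie/actor/director graph are hypothetical, chosen only to mirror the movie domain mentioned in the abstract.

```python
from dataclasses import dataclass, field

# Hypothetical HTAG container for illustration only; NOT the paper's released loader.
# An edge type is a (source node type, relation, target node type) triple.
EdgeType = tuple[str, str, str]


@dataclass
class HTAG:
    # Raw text per node, grouped by node type: node_texts["movie"][i] is movie i's text.
    node_texts: dict[str, list[str]] = field(default_factory=dict)
    # Edge list per typed relation, as (source index, target index) pairs.
    edges: dict[EdgeType, list[tuple[int, int]]] = field(default_factory=dict)

    def add_nodes(self, node_type: str, texts: list[str]) -> None:
        self.node_texts.setdefault(node_type, []).extend(texts)

    def add_edges(self, edge_type: EdgeType, pairs: list[tuple[int, int]]) -> None:
        src_type, _, dst_type = edge_type
        n_src = len(self.node_texts.get(src_type, []))
        n_dst = len(self.node_texts.get(dst_type, []))
        # Validate every pair first so a bad batch leaves the graph unchanged.
        for s, d in pairs:
            if not (0 <= s < n_src and 0 <= d < n_dst):
                raise IndexError(f"edge {(s, d)} out of range for {edge_type}")
        self.edges.setdefault(edge_type, []).extend(pairs)

    def neighbors(self, edge_type: EdgeType, src: int) -> list[int]:
        # Targets reachable from `src` along one specific relation only.
        return [d for s, d in self.edges.get(edge_type, []) if s == src]

    def num_nodes(self) -> dict[str, int]:
        return {t: len(texts) for t, texts in self.node_texts.items()}


# Toy graph with made-up texts (hypothetical data).
g = HTAG()
g.add_nodes("movie", ["A heist thriller.", "A documentary about coral reefs."])
g.add_nodes("actor", ["Biography of actor 0.", "Biography of actor 1."])
g.add_nodes("director", ["Biography of director 0."])
g.add_edges(("movie", "has_actor", "actor"), [(0, 0), (0, 1), (1, 1)])
g.add_edges(("movie", "directed_by", "director"), [(0, 0)])
print(g.num_nodes())  # {'movie': 2, 'actor': 2, 'director': 1}
print(g.neighbors(("movie", "has_actor", "actor"), 0))  # [0, 1]
```

Keying edges by the full (source, relation, target) triple keeps relations with the same name but different endpoint types apart, which is the property that distinguishes an HTAG from a homogeneous text-attributed graph with a single node and edge type.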
Related papers
- TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models [25.16561980988102]
TAGLAS is an atlas of text-attributed graph (TAG) datasets and benchmarks.
We collect and integrate more than 23 TAG datasets with domains ranging from citation graphs to molecule graphs.
We provide a standardized, efficient, and simplified way to load all datasets and tasks.
Date: 2024-06-20T19:11:35Z
- Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information.
Our method achieves state-of-the-art performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
Date: 2024-06-18T13:35:25Z
- DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs [28.340416573162898]
Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios.
Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs.
We introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs.
Date: 2024-06-17T20:16:12Z
- Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights [44.11628188443046]
A Graph Foundation Model (GFM) can work well across different graphs and tasks with a unified backbone.
Inspired by multi-modal models that align different modalities with natural language, text has recently been adopted to provide a unified feature space for diverse graphs.
Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems.
Date: 2024-06-15T19:56:21Z
- TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs [14.437863803271808]
Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections.
Existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes.
To address this gap, we introduce Textual-Edge Graphs datasets featuring rich textual descriptions on nodes and edges.
Date: 2024-06-14T06:22:47Z
- Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs.
In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations.
We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
Date: 2023-10-10T14:59:22Z
- One for All: Towards Training One Graph Model for All Classification Tasks [61.656962278497225]
A unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain.
We propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges.
OFA performs well across different tasks, making it the first general-purpose cross-domain classification model on graphs.
Date: 2023-09-29T21:15:26Z
- Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer [140.72439827136085]
We propose a graph reasoning and transfer learning framework named "Graphonomy".
It incorporates human knowledge and label taxonomy into the intermediate graph representation learning beyond local convolutions.
It learns global and structured semantic coherence across multiple domains via semantic-aware graph reasoning and transfer.
Date: 2021-01-26T08:19:03Z
- Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
Date: 2020-08-03T15:00:31Z
- Open Graph Benchmark: Datasets for Machine Learning on Graphs [86.96887552203479]
We present the Open Graph Benchmark (OGB) to facilitate scalable, robust, and reproducible graph machine learning (ML) research.
OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains.
For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics.
Date: 2020-05-02T03:09:50Z
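One kind of application-specific split that OGB uses for some datasets is a time-based split: older nodes train the model and newer ones evaluate it, rather than sampling at random. The sketch below is a hedged illustration only; the function name `temporal_split`, the cutoff years, and the toy year list are hypothetical and are not taken from OGB's code or from the HTAG benchmark's evaluation setup.

```python
# Hypothetical helper: split node indices by year (not OGB's or the HTAG paper's API).
def temporal_split(years: list[int], train_until: int, valid_until: int):
    """Return (train, valid, test) index lists: year <= train_until goes to train,
    train_until < year <= valid_until goes to valid, later years go to test."""
    if train_until > valid_until:
        raise ValueError("train_until must not exceed valid_until")
    train, valid, test = [], [], []
    for idx, year in enumerate(years):
        if year <= train_until:
            train.append(idx)
        elif year <= valid_until:
            valid.append(idx)
        else:
            test.append(idx)
    return train, valid, test


# Toy example with made-up publication years.
years = [2015, 2016, 2017, 2018, 2019, 2020]
train, valid, test = temporal_split(years, train_until=2017, valid_until=2018)
print(train, valid, test)  # [0, 1, 2] [3] [4, 5]
```

Splitting by time rather than at random keeps evaluation closer to deployment, where a model must predict on entities that appear after training; datasets that span years, like the HTAGs above, make such a split possible.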