Related papers: Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains

URL: http://arxiv.org/abs/2412.08937v1
Date: Thu, 12 Dec 2024 04:58:32 GMT
Title: Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains
Authors: Yunhui Liu, Qizhuo Xie, Jinwei Shi, Jiaxu Shen, Tieke He
Abstract summary: We introduce a collection of challenging and diverse benchmark datasets for realistic and reproducible evaluation of machine learning models on HTAGs. Our HTAG datasets are multi-scale, span years in duration, and cover a wide range of domains, including movie, community question answering, academic, literature, and patent networks. All source data, dataset construction codes, processed HTAGs, data loaders, benchmark codes, and evaluation setup are publicly available at GitHub and Hugging Face.
Score: 25.61868709829681
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Heterogeneous Text-Attributed Graphs (HTAGs), where different types of entities are not only associated with texts but also connected by diverse relationships, have gained widespread popularity and application across various domains. However, current research on text-attributed graph learning predominantly focuses on homogeneous graphs, which feature a single node and edge type, thus leaving a gap in understanding how methods perform on HTAGs. One crucial reason is the lack of comprehensive HTAG datasets that offer original textual content and span multiple domains of varying sizes. To this end, we introduce a collection of challenging and diverse benchmark datasets for realistic and reproducible evaluation of machine learning models on HTAGs. Our HTAG datasets are multi-scale, span years in duration, and cover a wide range of domains, including movie, community question answering, academic, literature, and patent networks. We further conduct benchmark experiments on these datasets with various graph neural networks. All source data, dataset construction codes, processed HTAGs, data loaders, benchmark codes, and evaluation setup are publicly available at GitHub and Hugging Face.

Related papers

H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs [6.601515580215021]
We introduce H$^2$GFM, a novel framework designed to generalize across both HoTAGs and HeTAGs. Our model projects diverse meta-relations among graphs under a unified textual space. We employ a mixture of CGT experts to capture the heterogeneity in structural patterns among graph types.
arXiv Detail & Related papers (2025-06-10T00:03:56Z)
HeTGB: A Comprehensive Benchmark for Heterophilic Text-Attributed Graphs [38.79574338268996]
Graph neural networks (GNNs) have demonstrated success in modeling relational data under the assumption of homophily. Many real-world graphs exhibit heterophily, where linked nodes belong to different categories or possess diverse attributes. We introduce the Heterophilic Text-attributed Graph Benchmark (HeTGB), a novel benchmark comprising five real-world heterophilic graph datasets from diverse domains.
arXiv Detail & Related papers (2025-03-05T02:00:32Z)
TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models [25.16561980988102]
TAGLAS is an atlas of text-attributed graph (TAG) datasets and benchmarks. We collect and integrate more than 23 TAG datasets with domains ranging from citation graphs to molecule graphs. We provide a standardized, efficient, and simplified way to load all datasets and tasks.
arXiv Detail & Related papers (2024-06-20T19:11:35Z)
Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information. Our method achieves state-of-the-art performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
arXiv Detail & Related papers (2024-06-18T13:35:25Z)
DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs [28.340416573162898]
Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs. We introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs.
arXiv Detail & Related papers (2024-06-17T20:16:12Z)
TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs [14.437863803271808]
Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections. Existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. To address this gap, we introduce Textual-Edge Graphs datasets featuring rich textual descriptions on nodes and edges.
arXiv Detail & Related papers (2024-06-14T06:22:47Z)
Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs. In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations. We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
arXiv Detail & Related papers (2023-10-10T14:59:22Z)
One for All: Towards Training One Graph Model for All Classification Tasks [61.656962278497225]
A unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain. We propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges. OFA performs well across different tasks, making it the first general-purpose across-domains classification model on graphs.
arXiv Detail & Related papers (2023-09-29T21:15:26Z)
Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer [140.72439827136085]
We propose a graph reasoning and transfer learning framework named "Graphonomy". It incorporates human knowledge and label taxonomy into the intermediate graph representation learning beyond local convolutions. It learns the global and structured semantic coherency in multiple domains via semantic-aware graph reasoning and transfer.
arXiv Detail & Related papers (2021-01-26T08:19:03Z)
Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation. We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z)
Open Graph Benchmark: Datasets for Machine Learning on Graphs [86.96887552203479]
We present the Open Graph Benchmark (OGB) to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains. For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics.
arXiv Detail & Related papers (2020-05-02T03:09:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.