When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark
- URL: http://arxiv.org/abs/2407.10916v2
- Date: Tue, 03 Jun 2025 03:01:36 GMT
- Title: When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark
- Authors: Junhong Lin, Xiaojie Guo, Shuaicheng Zhang, Yada Zhu, Julian Shun,
- Abstract summary: We introduce H2GB, a large-scale node-classification graph benchmark. It brings together the complexities of both the heterophily and heterogeneity properties of real-world graphs. We also present a new variant of the model, H2G-former, that excels at this challenging benchmark.
- Score: 18.253578434782103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph mining has become crucial in fields such as social science, finance, and cybersecurity. Many large-scale real-world networks exhibit both heterogeneity, where multiple node and edge types exist in the graph, and heterophily, where connected nodes may have dissimilar labels and attributes. However, existing benchmarks primarily focus on either heterophilic homogeneous graphs or homophilic heterogeneous graphs, leaving a significant gap in understanding how models perform on graphs with both heterogeneity and heterophily. To bridge this gap, we introduce H2GB, a large-scale node-classification graph benchmark that brings together the complexities of both the heterophily and heterogeneity properties of real-world graphs. H2GB encompasses 9 real-world datasets spanning 5 diverse domains, 28 baseline models, and a unified benchmarking library with a standardized data loader, evaluator, unified modeling framework, and an extensible framework for reproducibility. We establish a standardized workflow supporting both model selection and development, enabling researchers to easily benchmark graph learning methods. Extensive experiments across 28 baselines reveal that current methods struggle with heterophilic and heterogeneous graphs, underscoring the need for improved approaches. Finally, we present a new variant of the model, H2G-former, developed following our standardized workflow, that excels at this challenging benchmark. Both the benchmark and the framework are publicly available at Github and PyPI, with documentation hosted at https://junhongmit.github.io/H2GB.
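To make the two properties named in the abstract concrete, here is a minimal sketch that measures them on a toy graph. It is not taken from H2GB or its library. It assumes the common edge-homophily definition (the fraction of edges whose two endpoints share a label), and the toy graph and the name `edge_homophily` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy graph: node id -> (node type, class label or None).
nodes = {
    0: ("paper", "ml"),
    1: ("paper", "db"),
    2: ("paper", "ml"),
    3: ("author", None),  # unlabeled node of a second type
}

# Typed edges: (source, target, edge type).
edges = [
    (0, 1, "cites"),
    (1, 2, "cites"),
    (0, 2, "cites"),
    (3, 0, "writes"),
    (3, 1, "writes"),
]


def edge_homophily(nodes, edges):
    """Fraction of edges whose labeled endpoints share the same label.

    Edges with an unlabeled endpoint are skipped (an assumption made
    for this sketch, not a rule from the paper).
    """
    same = total = 0
    for u, v, _ in edges:
        label_u, label_v = nodes[u][1], nodes[v][1]
        if label_u is None or label_v is None:
            continue
        total += 1
        same += label_u == label_v
    return same / total if total else float("nan")


# Heterogeneity: more than one node type and more than one edge type.
node_types = Counter(node_type for node_type, _ in nodes.values())
edge_types = Counter(edge_type for _, _, edge_type in edges)

print("node types:", dict(node_types))
print("edge types:", dict(edge_types))
print("edge homophily:", edge_homophily(nodes, edges))
```

In this toy graph only one of the three labeled edges joins nodes with the same label, so the ratio is low (heterophily), while the two node types and two edge types make the graph heterogeneous.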
Related papers
- H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs [6.601515580215021]
We introduce H$^2$GFM, a novel framework designed to generalize across both HoTAGs and HeTAGs. Our model projects diverse meta-relations among graphs under a unified textual space. We employ a mixture of CGT experts to capture the heterogeneity in structural patterns among graph types.
arXiv Detail & Related papers (2025-06-10T00:03:56Z)
- Scale Invariance of Graph Neural Networks [4.002604752467421]
We address two fundamental challenges in Graph Neural Networks (GNNs). We propose ScaleNet, a unified network architecture that achieves state-of-the-art performance across four homophilic and two heterophilic benchmark datasets. For another popular GNN approach to digraphs, we demonstrate the equivalence between Hermitian Laplacian methods and GraphSAGE with incidence normalization.
arXiv Detail & Related papers (2024-11-28T22:06:06Z)
- Addressing Graph Heterogeneity and Heterophily from A Spectral Perspective [46.37860909753809]
Heterogeneity refers to a graph with multiple types of nodes or edges, while heterophily refers to the fact that connected nodes are more likely to have dissimilar attributes or labels.
We propose a Heterogeneous Heterophilic Spectral Graph Neural Network (H2SGNN), which employs two modules: local independent filtering and global hybrid filtering.
Extensive experiments are conducted on four datasets to validate the effectiveness of the proposed H2SGNN.
arXiv Detail & Related papers (2024-10-17T09:23:53Z)
- When Heterophily Meets Heterogeneous Graphs: Latent Graphs Guided Unsupervised Representation Learning [6.2167203720326025]
Unsupervised heterogeneous graph representation learning (UHGRL) has gained increasing attention due to its significance in handling practical graphs without labels.
We define semantic heterophily and propose an innovative framework called Latent Graphs Guided Unsupervised Representation Learning (LatGRL) to handle this problem.
arXiv Detail & Related papers (2024-09-01T10:25:06Z)
- The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges [101.83124435649358]
The homophily principle states that nodes with the same labels or similar attributes are more likely to be connected.
Recent work has identified a non-trivial set of datasets where the performance of GNNs is not satisfactory compared with that of NNs.
arXiv Detail & Related papers (2024-07-12T18:04:32Z)
- HeteGraph-Mamba: Heterogeneous Graph Learning via Selective State Space Model [4.679586996508103]
We propose a heterogeneous graph mamba network (HGMN) as the first exploration in leveraging the selective state space models (SSSMs) for heterogeneous graph learning.
Compared with the literature, our HGMN overcomes two major challenges: (i) capturing long-range dependencies among heterogeneous nodes and (ii) adapting SSSMs to heterogeneous graph data.
arXiv Detail & Related papers (2024-05-22T18:41:11Z)
- Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
Homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs.
We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon.
To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe).
arXiv Detail & Related papers (2024-03-15T14:26:53Z)
- Hetero$^2$Net: Heterophily-aware Representation Learning on Heterogenerous Graphs [38.858702539146385]
We present Hetero$^2$Net, a heterophily-aware HGNN that incorporates both masked metapath prediction and masked label prediction tasks.
We evaluate the performance of Hetero$^2$Net on five real-world heterogeneous graph benchmarks with varying levels of heterophily.
arXiv Detail & Related papers (2023-10-18T02:19:12Z)
- Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning [78.49090351193269]
We propose a novel graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis.
Specifically, we formulate the WSI as a heterogeneous graph with "nucleus-type" attribute to each node and a semantic attribute similarity to each edge.
Our framework outperforms the state-of-the-art methods with considerable margins on various tasks.
arXiv Detail & Related papers (2023-07-09T14:43:40Z)
- Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs [27.24150788635981]
We introduce the concept of hybrid graphs and present the Hybrid Graph Benchmark (HGB).
HGB contains 23 real-world hybrid graph datasets across various domains such as biology, social media, and e-commerce.
We provide an evaluation framework and a supporting framework to facilitate the training and evaluation of Graph Neural Networks (GNNs) on HGB.
arXiv Detail & Related papers (2023-06-08T11:15:34Z)
- Beyond Homophily: Reconstructing Structure for Graph-agnostic Clustering [15.764819403555512]
It is impossible to first identify a graph as homophilic or heterophilic before a suitable GNN model can be found.
We propose a novel graph clustering method, which contains three key components: graph reconstruction, a mixed filter, and dual graph clustering network.
Our method dominates others on heterophilic graphs.
arXiv Detail & Related papers (2023-05-03T01:49:01Z)
- Single-Pass Contrastive Learning Can Work for Both Homophilic and Heterophilic Graph [60.28340453547902]
Graph contrastive learning (GCL) techniques typically require two forward passes for a single instance to construct the contrastive loss.
Existing GCL approaches fail to provide strong performance guarantees.
We implement the Single-Pass Graph Contrastive Learning method (SP-GCL).
Empirically, the features learned by the SP-GCL can match or outperform existing strong baselines with significantly less computational overhead.
arXiv Detail & Related papers (2022-11-20T07:18:56Z)
- Geometry Contrastive Learning on Heterogeneous Graphs [50.58523799455101]
This paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL).
GCL views a heterogeneous graph from Euclidean and hyperbolic perspectives simultaneously, aiming to combine the ability to model rich semantics with the ability to model complex structures.
Extensive experiments on four benchmark datasets show that the proposed approach outperforms strong baselines.
arXiv Detail & Related papers (2022-06-25T03:54:53Z)
- Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views on respective guidance of node attributes and graph topologies.
In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z)
- Simplified Graph Convolution with Heterophily [25.7577503312319]
We show that Simple Graph Convolution (SGC) is ineffective for heterophilous (i.e., non-homophilous) graphs.
We propose Adaptive Simple Graph Convolution (ASGC), which we show can adapt to both homophilous and heterophilous graph structure.
arXiv Detail & Related papers (2022-02-08T20:52:08Z)
- Heterogeneous Graph Transformer [49.675064816860505]
We present the Heterogeneous Graph Transformer (HGT) architecture for modeling Web-scale heterogeneous graphs.
To handle dynamic heterogeneous graphs, we introduce the relative temporal encoding technique into HGT.
To handle Web-scale graph data, we design the heterogeneous mini-batch graph sampling algorithm, HGSampling, for efficient and scalable training.
arXiv Detail & Related papers (2020-03-03T04:49:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.