Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning
- URL: http://arxiv.org/abs/2406.16321v2
- Date: Sun, 30 Mar 2025 06:11:30 GMT
- Title: Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning
- Authors: Jing Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra,
- Abstract summary: We introduce the Multimodal Graph Benchmark (MM-GRAPH), a pioneering benchmark that incorporates both visual and textual information into graph learning tasks.<n>MM-GRAPH extends beyond existing text-attributed graph benchmarks, offering a more comprehensive evaluation framework for multimodal graph learning.<n>This study offers valuable insights into the challenges and opportunities of integrating visual data into graph learning.
- Score: 36.75510196380185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph machine learning has made significant strides in recent years, yet the integration of visual information with graph structure and its potential for improving performance in downstream tasks remains an underexplored area. To address this critical gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), a pioneering benchmark that incorporates both visual and textual information into graph learning tasks. MM-GRAPH extends beyond existing text-attributed graph benchmarks, offering a more comprehensive evaluation framework for multimodal graph learning Our benchmark comprises seven diverse datasets of varying scales (ranging from thousands to millions of edges), designed to assess algorithms across different tasks in real-world scenarios. These datasets feature rich multimodal node attributes, including visual data, which enables a more holistic evaluation of various graph learning frameworks in complex, multimodal environments. To support advancements in this emerging field, we provide an extensive empirical study on various graph learning frameworks when presented with features from multiple modalities, particularly emphasizing the impact of visual information. This study offers valuable insights into the challenges and opportunities of integrating visual data into graph learning.
Related papers
- UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs [34.48393396390799]
We propose a novel cross-domain graph foundation model that enables general representation learning on multimodal graphs.
UniGraph2 employs modality-specific encoders alongside a graph neural network (GNN) to learn a unified low-dimensional embedding space.
We show that UniGraph2 significantly outperforms state-of-the-art models in tasks such as representation learning, transfer learning, and multimodal generative tasks.
arXiv Detail & Related papers (2025-02-02T14:04:53Z) - When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning [36.6581535146878]
Multimodal attributed graphs (MAGs) are prevalent in various real-world scenarios and generally contain two kinds of knowledge.
Recent advancements in Pre-trained Language/Vision models (PLMs/PVMs) and Graph neural networks (GNNs) have facilitated effective learning on MAGs.
We propose Multimodal Attribute Graph Benchmark (MAGB), a comprehensive and diverse collection of challenging benchmark datasets for MAGs.
arXiv Detail & Related papers (2024-10-11T13:24:57Z) - Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies [7.067145619709089]
This study investigates the impact of graph visualisations on Large Language Models (LLMs) performance.
Our experiments compare the effectiveness of multimodal approaches against purely textual graph representations.
arXiv Detail & Related papers (2024-09-13T14:26:58Z) - Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information.
Our method achieves state-of-theart performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
arXiv Detail & Related papers (2024-06-18T13:35:25Z) - Representation learning in multiplex graphs: Where and how to fuse
information? [5.0235828656754915]
Multiplex graphs possess richer information, provide better modeling capabilities and integrate more detailed data from potentially different sources.
In this paper, we tackle the problem of learning representations for nodes in multiplex networks in an unsupervised or self-supervised manner.
We propose improvements in how to construct GNN architectures that deal with multiplex graphs.
arXiv Detail & Related papers (2024-02-27T21:47:06Z) - Learning on Multimodal Graphs: A Survey [6.362513821299131]
Multimodal data pervades various domains, including healthcare, social media, and transportation.
multimodal graph learning (MGL) is essential for successful artificial intelligence (AI) applications.
arXiv Detail & Related papers (2024-02-07T23:50:00Z) - When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding
and Reasoning [54.84870836443311]
The paper presents a new paradigm for understanding and reasoning about graph data by integrating image encoding and multimodal technologies.
This approach enables the comprehension of graph data through an instruction-response format, utilizing GPT-4V's advanced capabilities.
The study evaluates this paradigm on various graph types, highlighting the model's strengths and weaknesses, particularly in Chinese OCR performance and complex reasoning tasks.
arXiv Detail & Related papers (2023-12-16T08:14:11Z) - GPT4Graph: Can Large Language Models Understand Graph Structured Data ?
An Empirical Evaluation and Benchmarking [17.7473474499538]
Large language models like ChatGPT have become indispensable to artificial general intelligence.
In this study, we conduct an investigation to assess the proficiency of LLMs in comprehending graph data.
Our findings contribute valuable insights towards bridging the gap between language models and graph understanding.
arXiv Detail & Related papers (2023-05-24T11:53:19Z) - Cross-view Graph Contrastive Representation Learning on Partially
Aligned Multi-view Data [52.491074276133325]
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields.
We propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations.
Experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on the clustering and classification tasks.
arXiv Detail & Related papers (2022-11-08T09:19:32Z) - MMGA: Multimodal Learning with Graph Alignment [8.349066399479938]
We propose MMGA, a novel multimodal pre-training framework to incorporate information from graph (social network), image and text modalities on social media.
In MMGA, a multi-step graph alignment mechanism is proposed to add the self-supervision from graph modality to optimize the image and text encoders.
We release our dataset, the first social media multimodal dataset with graph, of 60,000 users labeled with specific topics based on 2 million posts to facilitate future research.
arXiv Detail & Related papers (2022-10-18T15:50:31Z) - Graph Pooling for Graph Neural Networks: Progress, Challenges, and
Opportunities [128.55790219377315]
Graph neural networks have emerged as a leading architecture for many graph-level tasks.
graph pooling is indispensable for obtaining a holistic graph-level representation of the whole graph.
arXiv Detail & Related papers (2022-04-15T04:02:06Z) - Group Contrastive Self-Supervised Learning on Graphs [101.45974132613293]
We study self-supervised learning on graphs using contrastive methods.
We argue that contrasting graphs in multiple subspaces enables graph encoders to capture more abundant characteristics.
arXiv Detail & Related papers (2021-07-20T22:09:21Z) - Model-Agnostic Graph Regularization for Few-Shot Learning [60.64531995451357]
We present a comprehensive study on graph embedded few-shot learning.
We introduce a graph regularization approach that allows a deeper understanding of the impact of incorporating graph information between labels.
Our approach improves the performance of strong base learners by up to 2% on Mini-ImageNet and 6.7% on ImageNet-FS.
arXiv Detail & Related papers (2021-02-14T05:28:13Z) - Multi-view Graph Learning by Joint Modeling of Consistency and
Inconsistency [65.76554214664101]
Graph learning has emerged as a promising technique for multi-view clustering with its ability to learn a unified and robust graph from multiple views.
We propose a new multi-view graph learning framework, which for the first time simultaneously models multi-view consistency and multi-view inconsistency in a unified objective function.
Experiments on twelve multi-view datasets have demonstrated the robustness and efficiency of the proposed approach.
arXiv Detail & Related papers (2020-08-24T06:11:29Z) - GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training [62.73470368851127]
Graph representation learning has emerged as a powerful technique for addressing real-world problems.
We design Graph Contrastive Coding -- a self-supervised graph neural network pre-training framework.
We conduct experiments on three graph learning tasks and ten graph datasets.
arXiv Detail & Related papers (2020-06-17T16:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.