When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning
- URL: http://arxiv.org/abs/2410.09132v1
- Date: Fri, 11 Oct 2024 13:24:57 GMT
- Title: When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning
- Authors: Hao Yan, Chaozhuo Li, Zhigang Yu, Jun Yin, Ruochen Liu, Peiyan Zhang, Weihao Han, Mingzheng Li, Zhengxin Zeng, Hao Sun, Weiwei Deng, Feng Sun, Qi Zhang, Senzhang Wang
- Abstract summary: Multimodal attributed graphs (MAGs) are prevalent in various real-world scenarios and generally contain two kinds of knowledge.
Recent advancements in Pre-trained Language/Vision models (PLMs/PVMs) and Graph neural networks (GNNs) have facilitated effective learning on MAGs.
We propose Multimodal Attribute Graph Benchmark (MAGB), a comprehensive and diverse collection of challenging benchmark datasets for MAGs.
- Score: 36.6581535146878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal attributed graphs (MAGs) are prevalent in various real-world scenarios and generally contain two kinds of knowledge: (a) Attribute knowledge is mainly supported by the attributes of different modalities contained in nodes (entities) themselves, such as texts and images. (b) Topology knowledge, on the other hand, is provided by the complex interactions between nodes. The cornerstone of MAG representation learning lies in the seamless integration of multimodal attributes and topology. Recent advancements in Pre-trained Language/Vision models (PLMs/PVMs) and Graph neural networks (GNNs) have facilitated effective learning on MAGs, garnering increased research interest. However, the absence of meaningful benchmark datasets and standardized evaluation procedures for MAG representation learning has impeded progress in this field. In this paper, we propose the Multimodal Attribute Graph Benchmark (MAGB), a comprehensive and diverse collection of challenging benchmark datasets for MAGs. The MAGB datasets are notably large in scale and encompass a wide range of domains, spanning from e-commerce networks to social networks. In addition to the brand-new datasets, we conduct extensive benchmark experiments over MAGB with various learning paradigms, ranging from GNN-based to PLM-based methods, to explore the necessity and feasibility of integrating multimodal attributes and graph topology. In a nutshell, we provide an overview of the MAG datasets, standardized evaluation procedures, and present baseline experiments. The entire MAGB project is publicly accessible at https://github.com/sktsherlock/ATG.
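To make the GNN-based learning paradigm above concrete, the sketch below shows the common recipe the abstract refers to: frozen PLM/PVM embeddings serve as per-node attribute features, a linear layer fuses the modalities, and graph convolutions propagate the fused features over the topology for node classification. This is a minimal illustration, not the MAGB reference implementation; the model class, feature dimensions, and toy graph are assumptions made for the example, and GCNConv is just one representative GNN layer.

```python
# Minimal sketch of MAG node classification: PLM/PVM attribute embeddings + GNN over topology.
# Assumes PyTorch and PyTorch Geometric are installed; all dimensions and the toy graph are illustrative.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


class MAGNodeClassifier(nn.Module):
    def __init__(self, text_dim, image_dim, hidden_dim, num_classes):
        super().__init__()
        # Project frozen PLM/PVM embeddings into a shared space before message passing.
        self.fuse = nn.Linear(text_dim + image_dim, hidden_dim)
        self.conv1 = GCNConv(hidden_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, text_emb, image_emb, edge_index):
        x = self.fuse(torch.cat([text_emb, image_emb], dim=-1)).relu()
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)


# Toy example: 4 nodes whose random vectors stand in for PLM/PVM outputs.
text_emb = torch.randn(4, 768)    # e.g., a sentence embedding per node
image_emb = torch.randn(4, 512)   # e.g., a CLIP-style image embedding per node
edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])  # toy edges in COO format
model = MAGNodeClassifier(768, 512, 128, num_classes=3)
logits = model(text_emb, image_emb, edge_index)  # [4, 3] class scores per node
```

In a PLM-based paradigm, by contrast, the attribute encoders would typically be fine-tuned on the graph task rather than kept frozen as feature extractors.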
Related papers
- UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs [34.48393396390799]
We propose a novel cross-domain graph foundation model that enables general representation learning on multimodal graphs.
UniGraph2 employs modality-specific encoders alongside a graph neural network (GNN) to learn a unified low-dimensional embedding space.
We show that UniGraph2 significantly outperforms state-of-the-art models in tasks such as representation learning, transfer learning, and multimodal generative tasks.
arXiv Detail & Related papers (2025-02-02T14:04:53Z) - MINIMA: Modality Invariant Image Matching [52.505282811925454]
We present MINIMA, a unified image matching framework for multiple cross-modal cases.
We scale up the modalities from cheap but rich RGB-only matching data, by means of generative models.
With MD-syn, we can directly train any advanced matching pipeline on randomly selected modality pairs to obtain cross-modal ability.
arXiv Detail & Related papers (2024-12-27T02:39:50Z) - Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation [59.4356484322228]
Graph Neural Networks (GNNs) have shown promising performance in this domain.
We propose GNNs with Modality-Independent Receptive Fields, which employ separate GNNs with independent receptive fields.
Our results indicate that the optimal $K$ for certain modalities on specific datasets can be as low as 1 or 2, and such small receptive fields may restrict the GNNs' capacity to capture global information.
arXiv Detail & Related papers (2024-12-18T16:12:26Z) - Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? [62.12375949429938]
Building transferable Graph Neural Networks (GNNs) with the CLIP pipeline is challenging because of three fundamental issues.
We leverage multi-modal prompt learning to effectively adapt pre-trained GNN to downstream tasks and data.
Our new paradigm embeds the graphs directly in the same space as the Large Language Models (LLMs) by learning both graph prompts and text prompts simultaneously.
arXiv Detail & Related papers (2024-12-11T08:03:35Z) - MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z) - Multimodal Graph Benchmark [36.75510196380185]
Multimodal Graph Benchmark (MM-GRAPH) is the first comprehensive multi-modal graph benchmark that incorporates both textual and visual information.
MM-GRAPH consists of five graph learning datasets of various scales that are appropriate for different learning tasks.
MM-GRAPH aims to foster research on multimodal graph learning and drive the development of more advanced and robust graph learning algorithms.
arXiv Detail & Related papers (2024-06-24T05:14:09Z) - Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights [44.11628188443046]
A Graph Foundation Model (GFM) can work well across different graphs and tasks with a unified backbone.
Inspired by multi-modal models that align different modalities with natural language, the text has recently been adopted to provide a unified feature space for diverse graphs.
Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems.
arXiv Detail & Related papers (2024-06-15T19:56:21Z) - MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion [51.80447197290866]
We introduce MyGO to process, fuse, and augment the fine-grained modality information from MMKGs.
MyGO tokenizes multi-modal raw data as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder.
Experiments on standard MMKGC benchmarks reveal that our method surpasses 20 of the latest models.
arXiv Detail & Related papers (2024-04-15T05:40:41Z) - NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z) - Learning on Multimodal Graphs: A Survey [6.362513821299131]
Multimodal data pervades various domains, including healthcare, social media, and transportation.
Multimodal graph learning (MGL) is essential for successful artificial intelligence (AI) applications.
arXiv Detail & Related papers (2024-02-07T23:50:00Z) - ADAMM: Anomaly Detection of Attributed Multi-graphs with Metadata: A Unified Neural Network Approach [39.211176955683285]
We propose ADAMM, a novel graph neural network model that handles directed multi-graphs.
ADAMM fuses metadata and graph-level representation learning through an unsupervised anomaly detection objective.
arXiv Detail & Related papers (2023-11-13T14:19:36Z) - Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z) - Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z) - MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields [26.450463943664822]
We propose a multimodal classification benchmark MuG with eight datasets that allows researchers to evaluate and improve their models.
We conduct multi-aspect data analysis to provide insights into the benchmark, including label balance ratios, percentages of missing features, distributions of data within each modality, and the correlations between labels and input modalities.
arXiv Detail & Related papers (2023-02-06T18:09:06Z) - Learnable Graph Convolutional Network and Feature Fusion for Multi-view Learning [30.74535386745822]
This paper proposes a joint deep learning framework called Learnable Graph Convolutional Network and Feature Fusion (LGCN-FF).
It consists of two stages: feature fusion network and learnable graph convolutional network.
The proposed LGCN-FF is validated to be superior to various state-of-the-art methods in multi-view semi-supervised classification.
arXiv Detail & Related papers (2022-11-16T19:07:12Z) - MMGA: Multimodal Learning with Graph Alignment [8.349066399479938]
We propose MMGA, a novel multimodal pre-training framework to incorporate information from graph (social network), image and text modalities on social media.
In MMGA, a multi-step graph alignment mechanism is proposed to add the self-supervision from graph modality to optimize the image and text encoders.
We release our dataset, the first social media multimodal dataset with graph structure, comprising 60,000 users labeled with specific topics based on 2 million posts, to facilitate future research.
arXiv Detail & Related papers (2022-10-18T15:50:31Z) - Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
arXiv Detail & Related papers (2022-05-04T23:40:04Z) - Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction [123.20238648121445]
We propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT).
GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information.
We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2021-10-29T19:55:12Z) - More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification [43.35966675372692]
We show different fusion strategies as well as how to train deep networks and build the network architecture.
Our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2020-08-12T17:45:25Z) - A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT.
We first represent the input sentence and image using a unified multi-modal graph.
We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
arXiv Detail & Related papers (2020-07-17T04:06:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.