Learning on Multimodal Graphs: A Survey
- URL: http://arxiv.org/abs/2402.05322v1
- Date: Wed, 7 Feb 2024 23:50:00 GMT
- Title: Learning on Multimodal Graphs: A Survey
- Authors: Ciyuan Peng, Jiayuan He and Feng Xia
- Abstract summary: Multimodal data pervades various domains, including healthcare, social media, and transportation.
Multimodal graph learning (MGL) is essential for successful artificial intelligence (AI) applications.
- Score: 6.362513821299131
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Multimodal data pervades various domains, including healthcare, social media,
and transportation, where multimodal graphs play a pivotal role. Machine
learning on multimodal graphs, referred to as multimodal graph learning (MGL),
is essential for successful artificial intelligence (AI) applications. The
burgeoning research in this field encompasses diverse graph data types and
modalities, learning techniques, and application scenarios. This survey paper
conducts a comparative analysis of existing works in multimodal graph learning,
elucidating how multimodal learning is achieved across different graph types
and exploring the characteristics of prevalent learning techniques.
Additionally, we delineate significant applications of multimodal graph
learning and offer insights into future directions in this domain.
Consequently, this paper serves as a foundational resource for researchers
seeking to comprehend existing MGL techniques and their applicability across
diverse scenarios.
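To make the core idea of MGL concrete, the following is a minimal, hypothetical sketch (not from the survey) of one multimodal graph learning step: each node carries a text feature vector and an image feature vector, the two modalities are fused by concatenation, and fused features are then aggregated over graph neighbors in a GNN-style message-passing step. All names and the toy graph are illustrative assumptions.

```python
# Hypothetical sketch: feature-level fusion + one mean-aggregation step.

def fuse(text_feat, image_feat):
    """Early (feature-level) fusion: concatenate per-node modality vectors."""
    return text_feat + image_feat  # list concatenation

def aggregate(features, adjacency):
    """One GNN-style message-passing step: mean of self + neighbor features."""
    out = {}
    for node, neighbors in adjacency.items():
        group = [features[node]] + [features[n] for n in neighbors]
        dim = len(features[node])
        out[node] = [sum(vec[i] for vec in group) / len(group)
                     for i in range(dim)]
    return out

# Toy graph: two connected nodes, each with 2-d text and 2-d image features.
text = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
image = {"a": [0.5, 0.5], "b": [1.0, 1.0]}
adjacency = {"a": ["b"], "b": ["a"]}

fused = {n: fuse(text[n], image[n]) for n in adjacency}
embedded = aggregate(fused, adjacency)
```

Real MGL systems replace the concatenation with learned cross-modal encoders and the mean with trainable attention or convolution operators, but the fuse-then-propagate pattern above is the common skeleton.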
Related papers
- When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning [36.6581535146878]
Multimodal attributed graphs (MAGs) are prevalent in various real-world scenarios and generally contain two kinds of knowledge.
Recent advancements in Pre-trained Language/Vision models (PLMs/PVMs) and Graph neural networks (GNNs) have facilitated effective learning on MAGs.
We propose Multimodal Attribute Graph Benchmark (MAGB), a comprehensive and diverse collection of challenging benchmark datasets for MAGs.
arXiv Detail & Related papers (2024-10-11T13:24:57Z) - MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z) - Multimodal Graph Benchmark [36.75510196380185]
Multimodal Graph Benchmark (MM-GRAPH) is the first comprehensive multimodal graph benchmark that incorporates both textual and visual information.
MM-GRAPH consists of five graph learning datasets of various scales that are appropriate for different learning tasks.
MM-GRAPH aims to foster research on multimodal graph learning and drive the development of more advanced and robust graph learning algorithms.
arXiv Detail & Related papers (2024-06-24T05:14:09Z) - Multimodal Large Language Models: A Survey [36.06016060015404]
Multimodal language models integrate multiple data types, such as images, text, audio, and other heterogeneous modalities.
This paper begins by defining the concept of multimodal and examining the historical development of multimodal algorithms.
A practical guide is provided, offering insights into the technical aspects of multimodal models.
Lastly, we explore the applications of multimodal models and discuss the challenges associated with their development.
arXiv Detail & Related papers (2023-11-22T05:15:12Z) - Multimodal Graph Learning for Generative Tasks [89.44810441463652]
Multimodal learning combines multiple data modalities, broadening the types and complexity of data our models can utilize.
We propose Multimodal Graph Learning (MMGL), a framework for capturing information from multiple multimodal neighbors with relational structures among them.
arXiv Detail & Related papers (2023-10-11T13:25:03Z) - Domain Generalization for Mammographic Image Analysis with Contrastive
Learning [62.25104935889111]
Training an efficacious deep learning model requires large datasets with diverse styles and qualities.
A novel contrastive learning scheme is developed to equip deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z) - Multimodality Representation Learning: A Survey on Evolution,
Pretraining and Its Applications [47.501121601856795]
Multimodality Representation Learning is a technique of learning to embed information from different modalities and their correlations.
Cross-modal interaction and complementary information from different modalities are crucial for advanced models to perform any multimodal task.
This survey presents the literature on the evolution and enhancement of deep learning multimodal architectures.
arXiv Detail & Related papers (2023-02-01T11:48:34Z) - Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
Multimodal machine learning, which incorporates data from various sources, has become an increasingly popular research area.
We analyze the commonness and uniqueness of each data format, mainly covering vision, audio, text, and motion.
We investigate the existing literature on multimodal learning from both the representation learning and downstream application levels.
arXiv Detail & Related papers (2022-10-05T13:14:57Z) - Geometric multimodal representation learning [13.159512679346687]
Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies.
We put forward an algorithmic blueprint for multimodal graph learning based on this categorization.
This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
arXiv Detail & Related papers (2022-09-07T16:59:03Z) - Multimodal Image Synthesis and Editing: The Generative AI Era [131.9569600472503]
Multimodal image synthesis and editing has become a hot research topic in recent years.
We comprehensively contextualize recent advances in multimodal image synthesis and editing.
We describe benchmark datasets and evaluation metrics as well as corresponding experimental results.
arXiv Detail & Related papers (2021-12-27T10:00:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.