MMGA: Multimodal Learning with Graph Alignment
- URL: http://arxiv.org/abs/2210.09946v1
- Date: Tue, 18 Oct 2022 15:50:31 GMT
- Title: MMGA: Multimodal Learning with Graph Alignment
- Authors: Xuan Yang, Yang Yang
- Abstract summary: We propose MMGA, a novel multimodal pre-training framework to incorporate information from graph (social network), image and text modalities on social media.
In MMGA, a multi-step graph alignment mechanism is proposed to add self-supervision from the graph modality to optimize the image and text encoders.
We release our dataset, the first multimodal social media dataset with graph structure, comprising 60,000 users labeled with specific topics based on 2 million posts, to facilitate future research.
- Score: 8.349066399479938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal pre-training breaks down the modality barriers and allows the
individual modalities to be mutually augmented with information, resulting in
significant advances in representation learning. However, the graph modality,
though a very general and important form of data, cannot easily interact with
other modalities because of its non-regular nature. In this paper, we propose MMGA
(Multimodal learning with Graph Alignment), a novel multimodal pre-training
framework to incorporate information from graph (social network), image and
text modalities on social media to enhance user representation learning. In
MMGA, a multi-step graph alignment mechanism is proposed to add
self-supervision from the graph modality to optimize the image and text encoders,
while using the information from the image and text modalities to guide the
graph encoder learning. We conduct experiments on a dataset crawled from
Instagram. The experimental results show that MMGA performs well on this
dataset and improves performance on the fans prediction task. We release our
dataset, the first multimodal social media dataset with graph structure,
comprising 60,000 users labeled with specific topics based on 2 million posts,
to facilitate future research.
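The abstract does not spell out the alignment objective. As a rough illustration of what a contrastive alignment step between graph-node embeddings and fused image/text embeddings could look like, here is a minimal PyTorch sketch; the function name, the symmetric InfoNCE form, and the temperature are assumptions for exposition, not the authors' actual multi-step mechanism.

```python
# Illustrative sketch only: a CLIP-style symmetric InfoNCE loss aligning
# per-user graph embeddings with per-user fused image/text embeddings.
# MMGA's actual multi-step alignment is more involved than this.
import torch
import torch.nn.functional as F

def alignment_loss(graph_emb, content_emb, temperature=0.07):
    """graph_emb, content_emb: (N, d) embeddings for the same N users."""
    g = F.normalize(graph_emb, dim=-1)
    c = F.normalize(content_emb, dim=-1)
    logits = g @ c.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(g.size(0), device=g.device)
    # Matching user pairs sit on the diagonal; all other pairs are negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# graph_emb would come from a GNN over the social graph, content_emb from the
# image/text encoders being optimized; random tensors stand in here.
loss = alignment_loss(torch.randn(32, 256), torch.randn(32, 256))
```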
Related papers
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
- Multimodal Graph Benchmark [36.75510196380185]
Multimodal Graph Benchmark (MM-GRAPH) is the first comprehensive multimodal graph benchmark that incorporates both textual and visual information.
MM-GRAPH consists of five graph learning datasets of various scales that are appropriate for different learning tasks.
MM-GRAPH aims to foster research on multimodal graph learning and drive the development of more advanced and robust graph learning algorithms.
arXiv Detail & Related papers (2024-06-24T05:14:09Z)
- MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction [8.592259720470697]
We propose MM-GTUNets, an end-to-end, graph-transformer-based multimodal graph deep learning framework for brain disorder prediction.
We introduce Modality Reward Representation Learning (MRRL), which adaptively constructs population graphs using a reward system.
We also propose Adaptive Cross-Modal Graph Learning (ACMGL), which captures critical modality-specific and modality-shared features.
arXiv Detail & Related papers (2024-06-20T16:14:43Z)
- When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding and Reasoning [54.84870836443311]
The paper presents a new paradigm for understanding and reasoning about graph data by integrating image encoding and multimodal technologies.
This approach enables the comprehension of graph data through an instruction-response format, utilizing GPT-4V's advanced capabilities.
The study evaluates this paradigm on various graph types, highlighting the model's strengths and weaknesses, particularly in Chinese OCR performance and complex reasoning tasks.
arXiv Detail & Related papers (2023-12-16T08:14:11Z)
- Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models [14.251972223585765]
This paper introduces a new approach that encodes a graph with diverse modalities, such as text, image, and motif, and uses prompts to approximate a graph's global connectivity.
The study also presents GraphTMI, a novel benchmark for evaluating Large Language Models (LLMs) in graph structure analysis.
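For the text modality mentioned above, one hypothetical way to encode a graph for an LLM is to serialize it into an edge-list prompt. The helper below is a minimal sketch under that assumption, not the GraphTMI code.

```python
# Hypothetical sketch of the "text" modality only: serialize a graph into an
# edge-list prompt that an LLM can reason over. Not the GraphTMI implementation.
import networkx as nx

def graph_to_text_prompt(graph: nx.Graph, question: str) -> str:
    edges = ", ".join(f"({u}, {v})" for u, v in graph.edges())
    return (f"You are given an undirected graph with nodes "
            f"{sorted(graph.nodes())} and edges: {edges}.\n"
            f"Question: {question}")

g = nx.path_graph(4)  # 0-1-2-3
print(graph_to_text_prompt(g, "Is the graph connected?"))
```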
arXiv Detail & Related papers (2023-11-16T12:45:41Z)
- Multimodal Graph Transformer for Multimodal Question Answering [9.292566397511763]
We propose a novel Multimodal Graph Transformer for question answering tasks that require reasoning across multiple modalities.
We introduce a graph-involved plug-and-play quasi-attention mechanism to incorporate multimodal graph information.
We validate the effectiveness of Multimodal Graph Transformer over its Transformer baselines on GQA, VQAv2, and MultiModalQA datasets.
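The summary does not define the quasi-attention mechanism; one plausible reading is attention whose logits are masked by a graph adjacency prior. The sketch below illustrates that reading only and is not the paper's formulation.

```python
# Sketch of one plausible reading of graph-involved attention: mask the
# attention logits with a graph adjacency so non-adjacent token pairs are
# suppressed. An assumption for illustration, not the paper's quasi-attention.
import torch

def graph_masked_attention(q, k, v, adj):
    """q, k, v: (N, d) token embeddings; adj: (N, N) 0/1 adjacency matrix."""
    scores = q @ k.t() / q.size(-1) ** 0.5    # (N, N) raw attention logits
    scores = scores.masked_fill(adj == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(5, 64)
adj = (torch.rand(5, 5) > 0.5).long()
adj.fill_diagonal_(1)                          # self-edges keep every row valid
out = graph_masked_attention(q, k, v, adj)
```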
arXiv Detail & Related papers (2023-04-30T21:22:35Z)
- Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings.
Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework.
Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
arXiv Detail & Related papers (2022-12-08T11:53:12Z)
- Multi-modal Graph Learning for Disease Prediction [35.156975779372836]
We propose an end-to-end Multi-modal Graph Learning framework (MMGL) for disease prediction from multi-modal data.
Instead of defining the graph manually, the latent graph structure is captured through an effective way of adaptive graph learning.
Extensive experiments on two disease prediction tasks demonstrate that the proposed MMGL achieves more favorable performance.
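As a minimal sketch of what adaptive graph learning can mean in practice, one can learn a soft adjacency from feature similarity under a trainable projection; the module below is hypothetical, not MMGL's exact construction.

```python
# Minimal sketch of adaptive (latent) graph learning: learn a soft adjacency
# from feature similarity under a trainable projection instead of defining the
# graph by hand. The module is hypothetical, not MMGL's exact construction.
import torch
import torch.nn as nn

class LatentGraphLearner(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden_dim)

    def forward(self, x):                      # x: (N, in_dim) sample features
        h = torch.tanh(self.proj(x))           # trainable embedding space
        sim = h @ h.t()                        # pairwise similarity logits
        return torch.softmax(sim, dim=-1)      # row-normalized soft adjacency

adj = LatentGraphLearner(in_dim=128, hidden_dim=64)(torch.randn(100, 128))
```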
arXiv Detail & Related papers (2022-03-11T12:33:20Z)
- Data Augmentation for Deep Graph Learning: A Survey [66.04015540536027]
We first propose a taxonomy for graph data augmentation and then provide a structured review by categorizing the related work based on the augmented information modalities.
Focusing on the two challenging problems in DGL (i.e., optimal graph learning and low-resource graph learning), we also discuss and review the existing learning paradigms which are based on graph data augmentation.
arXiv Detail & Related papers (2022-02-16T18:30:33Z)
- GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training [62.73470368851127]
Graph representation learning has emerged as a powerful technique for addressing real-world problems.
We design Graph Contrastive Coding -- a self-supervised graph neural network pre-training framework.
We conduct experiments on three graph learning tasks and ten graph datasets.
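GCC's instance discrimination can be illustrated with a standard InfoNCE loss over two augmented views of the same subgraph, in the same family as the alignment sketch above; the encoder, the subgraph augmentations, and the details below are assumptions.

```python
# Assumed sketch of GCC-style instance discrimination: embeddings of two
# augmented views of the same subgraph are positives, other subgraphs in the
# batch are negatives. Encoder and subgraph augmentations are elided.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.07):
    """z1, z2: (N, d) embeddings of two views of the same N subgraphs."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(64, 128), torch.randn(64, 128))
```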
arXiv Detail & Related papers (2020-06-17T16:18:35Z)
- Graph Representation Learning via Graphical Mutual Information Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
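A GMI-style mutual-information objective can be sketched as a bilinear discriminator trained to separate true (input-feature, representation) pairs from shuffled ones; this is an illustrative lower-bound estimator in the DGI tradition, not the paper's exact GMI definition.

```python
# Illustrative DGI/GMI-style estimator: a bilinear discriminator scores
# (input-feature, hidden-representation) pairs, trained to separate true pairs
# from shuffled ones. An assumed lower-bound sketch, not the paper's exact GMI.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MIDiscriminator(nn.Module):
    def __init__(self, feat_dim, hid_dim):
        super().__init__()
        self.bilinear = nn.Bilinear(feat_dim, hid_dim, 1)

    def forward(self, x, h):
        return self.bilinear(x, h).squeeze(-1)   # (N,) pair scores

def gmi_loss(disc, x, h):
    pos = disc(x, h)                             # true (feature, repr) pairs
    neg = disc(x[torch.randperm(x.size(0))], h)  # shuffled negative pairs
    return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos)) +
            F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))

disc = MIDiscriminator(feat_dim=128, hid_dim=64)
loss = gmi_loss(disc, torch.randn(32, 128), torch.randn(32, 64))
```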
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.