Geometric multimodal representation learning
- URL: http://arxiv.org/abs/2209.03299v1
- Date: Wed, 7 Sep 2022 16:59:03 GMT
- Title: Geometric multimodal representation learning
- Authors: Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, Marinka
Zitnik
- Abstract summary: Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge.
We put forward an algorithmic blueprint for multimodal graph learning based on this categorization.
This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
- Score: 13.159512679346687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph-centric artificial intelligence (graph AI) has achieved remarkable
success in modeling interacting systems prevalent in nature, from dynamical
systems in biology to particle physics. The increasing heterogeneity of data
calls for graph neural architectures that can combine multiple inductive
biases. However, combining data from various sources is challenging because
appropriate inductive bias may vary by data modality. Multimodal learning
methods fuse multiple data modalities while leveraging cross-modal dependencies
to address this challenge. Here, we survey 140 studies in graph-centric AI and
realize that diverse data types are increasingly brought together using graphs
and fed into sophisticated multimodal models. These models stratify into
image-, language-, and knowledge-grounded multimodal learning. We put forward
an algorithmic blueprint for multimodal graph learning based on this
categorization. The blueprint serves as a way to group state-of-the-art
architectures that treat multimodal data by appropriately choosing four
different components. This effort can pave the way for standardizing the design
of sophisticated multimodal architectures for highly complex real-world
problems.
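As an illustration only, the sketch below shows one way such a four-component pipeline could be assembled: per-modality entity encoders, a learned graph topology, neighborhood message passing, and a final cross-modal mixing step. The component names, shapes, and the plain-NumPy implementation are assumptions made for this sketch, not the blueprint's actual formulation or API.

```python
import numpy as np

# Minimal sketch of a four-component multimodal graph learning pipeline.
# Component names (encode, build_topology, propagate, mix) are illustrative
# assumptions, not terminology or code from the surveyed paper.

rng = np.random.default_rng(0)

def encode(modality_features, dim=16):
    """Component 1: project each modality's entities into a shared space."""
    encoded = []
    for x in modality_features:                      # x: (num_entities, feat_dim)
        w = rng.normal(scale=0.1, size=(x.shape[1], dim))
        encoded.append(np.tanh(x @ w))
    return np.vstack(encoded)                        # (total_entities, dim)

def build_topology(h, threshold=0.5):
    """Component 2: infer a graph over all entities (here, cosine similarity)."""
    norm = h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-8)
    adj = (norm @ norm.T > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def propagate(h, adj, steps=2):
    """Component 3: mean-aggregation message passing over the graph."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-8
    for _ in range(steps):
        h = np.tanh((adj @ h) / deg + h)             # aggregate neighbors + self
    return h

def mix(h):
    """Component 4: pool node states into one multimodal representation."""
    return h.mean(axis=0)

# Toy usage: two modalities with different feature dimensions.
image_feats = rng.normal(size=(5, 32))               # e.g. image-derived entities
text_feats = rng.normal(size=(3, 64))                # e.g. text-derived entities
h = encode([image_feats, text_feats])
z = mix(propagate(h, build_topology(h)))
print(z.shape)                                       # (16,)
```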
Related papers
- Multimodal Representation Learning using Adaptive Graph Construction [0.5221459608786241]
Multimodal contrastive learning trains neural networks by leveraging data from heterogeneous sources such as images and text.
We propose AutoBIND, a novel contrastive learning framework that can learn representations from an arbitrary number of modalities.
We show that AutoBIND outperforms previous methods on this task, highlighting the generalizability of the approach.
arXiv Detail & Related papers (2024-10-08T21:57:46Z) - MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z) - Bond Graphs for multi-physics informed Neural Networks for multi-variate time series [6.775534755081169]
Existing methods are not adapted to tasks with complex multi-physical and multi-domain phenomena.
We propose a Neural Bond graph (NBgE) that produces multi-physics-informed representations that can be fed into any task-specific model.
arXiv Detail & Related papers (2024-05-22T12:30:25Z) - HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data [10.774128925670183]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture.
We conduct multimodal survival analysis on Whole Slide Images and Multi-omic data on four cancer datasets from The Cancer Genome Atlas (TCGA).
HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
arXiv Detail & Related papers (2023-11-15T17:06:26Z) - Multimodal Graph Learning for Generative Tasks [89.44810441463652]
Multimodal learning combines multiple data modalities, broadening the types and complexity of data our models can utilize.
We propose Multimodal Graph Learning (MMGL), a framework for capturing information from multiple multimodal neighbors with relational structures among them.
arXiv Detail & Related papers (2023-10-11T13:25:03Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized
Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - Convolutional Learning on Multigraphs [153.20329791008095]
We develop convolutional information processing on multigraphs and introduce convolutional multigraph neural networks (MGNNs).
To capture the complex dynamics of information diffusion within and across each of the multigraph's classes of edges, we formalize a convolutional signal processing model.
We develop a multigraph learning architecture, including a sampling procedure to reduce computational complexity.
The introduced architecture is applied towards optimal wireless resource allocation and a hate speech localization task, offering improved performance over traditional graph neural networks.
arXiv Detail & Related papers (2022-09-23T00:33:04Z) - Geometric Multimodal Deep Learning with Multi-Scaled Graph Wavelet
Convolutional Network [21.06669693699965]
Multimodal data provide information of a natural phenomenon by integrating data from various domains with very different statistical properties.
Capturing the intra-modality and cross-modality information of multimodal data is the essential capability of multimodal learning methods.
Generalizing deep learning methods to the non-Euclidean domains is an emerging research field.
arXiv Detail & Related papers (2021-11-26T08:41:51Z) - Multi-Robot Deep Reinforcement Learning for Mobile Navigation [82.62621210336881]
We propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt).
At training time, HInt learns separate perception and dynamics models, and at test time, HInt integrates the two models in a hierarchical manner and plans actions with the integrated model.
Our mobile navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.
arXiv Detail & Related papers (2021-06-24T19:07:40Z) - Relating by Contrasting: A Data-efficient Framework for Multimodal
Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately distinguish related samples from unrelated ones, making it possible to exploit plentiful unlabeled, unpaired multimodal data (a generic contrastive-loss sketch follows this list).
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
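Several entries above (e.g., AutoBIND and "Relating by Contrasting") build on contrastive objectives over paired multimodal data. The following is a minimal, generic InfoNCE-style sketch of such an objective for two modalities; it is an assumption-laden illustration, not the exact loss used in any of the listed papers.

```python
import numpy as np

def multimodal_info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE-style loss for paired embeddings from two modalities.

    z_a, z_b: (batch, dim) arrays; row i of z_a is paired with row i of z_b.
    Generic formulation for illustration, not any listed paper's exact objective.
    """
    # L2-normalize so the dot product is cosine similarity.
    z_a = z_a / (np.linalg.norm(z_a, axis=1, keepdims=True) + 1e-8)
    z_b = z_b / (np.linalg.norm(z_b, axis=1, keepdims=True) + 1e-8)
    logits = (z_a @ z_b.T) / temperature            # (batch, batch)

    # Cross-entropy with the matching pair on the diagonal as the target.
    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)        # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Symmetrize: modality A retrieves B and vice versa.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy usage with random "image" and "text" embeddings.
rng = np.random.default_rng(1)
loss = multimodal_info_nce(rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
print(float(loss))
```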
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.