Multimodal Representation Learning using Adaptive Graph Construction
- URL: http://arxiv.org/abs/2410.06395v1
- Date: Tue, 8 Oct 2024 21:57:46 GMT
- Title: Multimodal Representation Learning using Adaptive Graph Construction
- Authors: Weichen Huang
- Abstract summary: Multimodal contrastive learning trains neural networks by leveraging data from heterogeneous sources such as images and text.
We propose AutoBIND, a novel contrastive learning framework that can learn representations from an arbitrary number of modalities.
We show that AutoBIND outperforms previous methods on this task, highlighting the generalizability of the approach.
- Score: 0.5221459608786241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal contrastive learning trains neural networks by leveraging data from heterogeneous sources such as images and text. Yet, many current multimodal learning architectures cannot generalize to an arbitrary number of modalities and need to be hand-constructed. We propose AutoBIND, a novel contrastive learning framework that can learn representations from an arbitrary number of modalities through graph optimization. We evaluate AutoBIND on Alzheimer's disease detection because it has real-world medical applicability and involves a broad range of data modalities. We show that AutoBIND outperforms previous methods on this task, highlighting the generalizability of the approach.
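Below is a minimal illustrative sketch of the general idea described in the abstract: modality-specific encoders map heterogeneous inputs into a shared embedding space, a similarity graph over the modalities is re-estimated on the fly, and an InfoNCE-style contrastive loss is applied only along the graph's edges. All names here (ModalityEncoder, build_graph, info_nce, the 0.5 threshold, the toy dimensions) are hypothetical choices for illustration, not the paper's actual implementation.

```python
# Illustrative sketch only: pairwise multimodal contrastive learning where the
# modality pairs to align are chosen from an adaptively re-estimated graph.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Projects one modality's features into a shared, L2-normalized embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def info_nce(za, zb, temperature: float = 0.07):
    """Symmetric InfoNCE between two batches of aligned embeddings."""
    logits = za @ zb.t() / temperature
    targets = torch.arange(za.size(0), device=za.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def build_graph(embeddings: dict, threshold: float = 0.5):
    """Adaptive graph construction (assumed scheme): connect modality pairs whose
    current batch embeddings are, on average, sufficiently similar."""
    edges = []
    for a, b in itertools.combinations(embeddings, 2):
        sim = F.cosine_similarity(embeddings[a], embeddings[b], dim=-1).mean()
        if sim.item() > threshold:
            edges.append((a, b))
    # Fall back to a fully connected graph so training never stalls.
    return edges or list(itertools.combinations(embeddings, 2))

# Toy setup with three modalities of different input dimensionality.
dims = {"image": 512, "text": 300, "tabular": 40}
encoders = nn.ModuleDict({m: ModalityEncoder(d) for m, d in dims.items()})
optimizer = torch.optim.Adam(encoders.parameters(), lr=1e-4)

batch = {m: torch.randn(32, d) for m, d in dims.items()}  # stand-in for real paired data
embeddings = {m: encoders[m](x) for m, x in batch.items()}
loss = sum(info_nce(embeddings[a], embeddings[b]) for a, b in build_graph(embeddings))
loss.backward()
optimizer.step()
```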
Related papers
- GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation [68.63955715643974]
We propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o).
arXiv Detail & Related papers (2024-07-08T01:06:13Z) - Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing? [37.73329106465031]
We propose GTI-MM, a text-to-image framework that enhances data efficiency and model robustness against a missing visual modality.
Our findings reveal that synthetic images improve training data efficiency when visual data are missing during training, and improve model robustness when visual data are missing during both training and testing.
arXiv Detail & Related papers (2024-02-14T09:21:00Z) - HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data [10.774128925670183]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture.
We conduct multimodal survival analysis on Whole Slide Images and multi-omic data on four cancer datasets from The Cancer Genome Atlas (TCGA).
HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
arXiv Detail & Related papers (2023-11-15T17:06:26Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
Training an efficacious deep learning model requires a large amount of data with diverse styles and qualities.
A novel contrastive learning scheme is developed to equip deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z) - Multi-modal Multi-kernel Graph Learning for Autism Prediction and Biomarker Discovery [29.790200009136825]
We propose a novel method to offset the negative impact between modalities in the process of multi-modal integration and extract heterogeneous information from graphs.
Our method is evaluated on the benchmark Autism Brain Imaging Data Exchange (ABIDE) dataset and outperforms the state-of-the-art methods.
In addition, discriminative brain regions associated with autism are identified by our model, providing guidance for the study of autism pathology.
arXiv Detail & Related papers (2023-03-03T07:09:17Z) - Learning Multimodal Data Augmentation in Feature Space [65.54623807628536]
LeMDA is an easy-to-use method that automatically learns to jointly augment multimodal data in feature space.
We show that LeMDA can profoundly improve the performance of multimodal deep learning architectures.
arXiv Detail & Related papers (2022-12-29T20:39:36Z) - Convolutional Learning on Multigraphs [153.20329791008095]
We develop convolutional information processing on multigraphs and introduce convolutional multigraph neural networks (MGNNs).
To capture the complex dynamics of information diffusion within and across each of the multigraph's classes of edges, we formalize a convolutional signal processing model.
We develop a multigraph learning architecture, including a sampling procedure to reduce computational complexity.
The introduced architecture is applied towards optimal wireless resource allocation and a hate speech localization task, offering improved performance over traditional graph neural networks.
arXiv Detail & Related papers (2022-09-23T00:33:04Z) - Geometric multimodal representation learning [13.159512679346687]
Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies.
We put forward an algorithmic blueprint for multimodal graph learning based on this categorization.
This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
arXiv Detail & Related papers (2022-09-07T16:59:03Z) - Multi-modal Graph Learning for Disease Prediction [35.4310911850558]
We propose an end-to-end Multimodal Graph Learning framework (MMGL) for disease prediction.
Instead of defining the adjacency matrix manually, as existing methods do, the latent graph structure is captured through adaptive graph learning (illustrated in the sketch below).
arXiv Detail & Related papers (2021-07-01T03:59:22Z)
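As a rough illustration of the adaptive graph learning idea mentioned in the last entry (and central to the paper above), the sketch below predicts a soft adjacency matrix from node features with a learnable similarity metric and uses it for one round of message passing, rather than relying on a hand-defined adjacency matrix. The layer name, metric parameterization, and dimensions are assumptions for illustration, not the MMGL or AutoBIND implementation.

```python
# Minimal sketch of adaptive graph learning: a soft adjacency matrix is
# predicted from node features and used for message passing, instead of
# being defined by hand. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.metric = nn.Linear(in_dim, in_dim, bias=False)  # learnable similarity metric
        self.update = nn.Linear(in_dim, out_dim)              # node feature update

    def forward(self, x):
        # x: (num_nodes, in_dim), e.g. one node per patient/sample.
        scores = self.metric(x) @ x.t()        # pairwise affinities between nodes
        adjacency = F.softmax(scores, dim=-1)  # soft, row-normalized adjacency
        return F.relu(self.update(adjacency @ x)), adjacency

# Toy usage: 100 nodes with 64-dimensional fused multimodal features.
layer = AdaptiveGraphLayer(64, 32)
features = torch.randn(100, 64)
hidden, learned_adjacency = layer(features)
print(hidden.shape, learned_adjacency.shape)  # torch.Size([100, 32]) torch.Size([100, 100])
```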
This list is automatically generated from the titles and abstracts of the papers on this site.