S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture
- URL: http://arxiv.org/abs/2307.00226v1
- Date: Sat, 1 Jul 2023 05:02:46 GMT
- Title: S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture
- Authors: Ye Xue, Diego Klabjan, Jean Utke
- Abstract summary: Multimodal multitask learning has attracted an increasing interest in recent years.
Many methods have been proposed to learn on a specific type of multimodal data, such as vision and language data.
We extend and improve Omninet, an architecture that is capable of handling multiple modalities and tasks at a time.
- Score: 19.927662512903915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal multitask learning has attracted increasing interest in recent
years. Single-modal models have been advancing rapidly and have achieved
astonishing results on various tasks across multiple domains. Multimodal
learning offers opportunities for further improvements by integrating data from
multiple modalities. Many methods have been proposed to learn on a specific type of
multimodal data, such as vision and language data. A few of them are designed
to handle several modalities and tasks at a time. In this work, we extend and
improve Omninet, an architecture that is capable of handling multiple
modalities and tasks at a time, by introducing cross-cache attention,
integrating patch embeddings for vision inputs, and supporting structured data.
The proposed Structured-data-enhanced Omninet (S-Omninet) is a universal model
that is capable of learning effectively from structured data of various dimensions
together with unstructured data through cross-cache attention, which enables
interactions among spatial, temporal, and structured features. We also enhance
spatial representations in a spatial cache with patch embeddings. We evaluate
the proposed model on several multimodal datasets and demonstrate a significant
improvement over the baseline, Omninet.
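The abstract does not include code, but the core idea of cross-cache attention, letting features in one cache (e.g. spatial patch embeddings) attend to features in another (e.g. embedded structured data), can be sketched as standard scaled dot-product attention between two caches. All names, shapes, and dimensions below are illustrative assumptions, not from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_cache_attention(query_cache, context_cache, d_k):
    """Scaled dot-product attention where one cache queries another.

    query_cache:   (n_q, d) features, e.g. a spatial cache of patch embeddings
    context_cache: (n_c, d) features, e.g. embedded structured (tabular) fields
    Returns an updated (n_q, d) query cache enriched with context features.
    """
    scores = query_cache @ context_cache.T / np.sqrt(d_k)  # (n_q, n_c)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ context_cache                          # (n_q, d)

# Toy caches, both projected to a shared dimension d (hypothetical sizes).
rng = np.random.default_rng(0)
d = 8
spatial = rng.normal(size=(16, d))     # e.g. 16 patch embeddings
structured = rng.normal(size=(5, d))   # e.g. 5 embedded tabular fields

fused = cross_cache_attention(spatial, structured, d)
print(fused.shape)  # (16, 8)
```

In the full model this would be one direction of the interaction among spatial, temporal, and structured caches; learned query/key/value projections and multi-head splitting are omitted here for brevity.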
Related papers
- SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [73.49799596304418]
This paper introduces a new task called Multi-Modal Datasets and Multi-Task Object Detection (M2Det) for remote sensing.
It is designed to accurately detect horizontal or oriented objects from any sensor modality.
This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z) - Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy [2.294223504228228]
Multimodal learning, a rapidly evolving field in artificial intelligence, seeks to construct more versatile and robust systems.
Inspired by the human ability to assimilate information through many senses, this method enables applications such as text-to-video conversion, visual question answering, and image captioning.
Recent developments in datasets that support multimodal large language models (MLLMs) are highlighted in this overview.
arXiv Detail & Related papers (2024-12-23T18:15:19Z) - Noise-powered Multi-modal Knowledge Graph Representation Framework [52.95468915728721]
The rise of Multi-modal Pre-training highlights the necessity for a unified Multi-Modal Knowledge Graph representation learning framework.
We propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking.
Our approach achieves SOTA performance across a total of ten datasets, demonstrating its versatility.
arXiv Detail & Related papers (2024-03-11T15:48:43Z) - Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment [11.897888221717245]
This paper proposes a novel CLIP-guided contrastive-learning-based architecture to perform multi-modal feature alignment.
Our model is simple to implement without using task-specific external knowledge, and thus can easily migrate to other multi-modal tasks.
arXiv Detail & Related papers (2024-03-11T01:07:36Z) - Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants [65.47222691674074]
Muffin framework employs pre-trained vision-language models to act as providers of visual signals.
UniMM-Chat dataset explores the complementarities of datasets to generate 1.1M high-quality and diverse multimodal instructions.
arXiv Detail & Related papers (2023-10-01T12:35:18Z) - Learning Multimodal Data Augmentation in Feature Space [65.54623807628536]
LeMDA is an easy-to-use method that automatically learns to jointly augment multimodal data in feature space.
We show that LeMDA can profoundly improve the performance of multimodal deep learning architectures.
arXiv Detail & Related papers (2022-12-29T20:39:36Z) - Learning Sequential Latent Variable Models from Multimodal Time Series Data [6.107812768939553]
We present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data.
We demonstrate that our approach leads to significant improvements in prediction and representation quality.
arXiv Detail & Related papers (2022-04-21T21:59:24Z) - Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space.
We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities.
The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
arXiv Detail & Related papers (2021-04-26T15:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.