OmniVec: Learning robust representations with cross modal sharing
- URL: http://arxiv.org/abs/2311.05709v1
- Date: Tue, 7 Nov 2023 14:00:09 GMT
- Title: OmniVec: Learning robust representations with cross modal sharing
- Authors: Siddharth Srivastava, Gaurav Sharma
- Abstract summary: We present an approach to learn multiple tasks, in multiple modalities, with a unified architecture.
The proposed network is composed of task-specific encoders, a common trunk in the middle, followed by task-specific prediction heads.
We train the network on all major modalities, e.g. visual, audio, text and 3D, and report results on 22 diverse and challenging public benchmarks.
- Score: 28.023214572340336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The majority of research in learning-based methods has been directed towards designing and training networks for specific tasks. However, many learning-based tasks, across modalities, share commonalities and could potentially be tackled in a joint framework. We present an approach in this direction, to learn multiple tasks, in multiple modalities, with a unified architecture. The proposed network is composed of task-specific encoders, a common trunk in the middle, followed by task-specific prediction heads. We first pre-train it with self-supervised masked training, followed by sequential training for the different tasks. We train the network on all major modalities, e.g. visual, audio, text and 3D, and report results on 22 diverse and challenging public benchmarks. We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, which allows us to achieve state-of-the-art results on most of the benchmarks. We also show that the trained network generalizes to cross-modal tasks as well as unseen datasets and tasks.
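To make the encoder-trunk-head layout described above concrete, here is a minimal PyTorch sketch. The module names, dimensions, pooling, and the choice of a small transformer trunk are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the task-specific encoder -> shared trunk -> task-specific
# head layout described in the abstract. All sizes and the transformer trunk
# are assumptions for illustration, not the OmniVec implementation.
import torch
import torch.nn as nn

class OmniVecSketch(nn.Module):
    def __init__(self, modality_dims: dict, task_out_dims: dict, d_model: int = 256):
        super().__init__()
        # One encoder per modality (visual, audio, text, 3D, ...), projecting
        # each input into a shared embedding space.
        self.encoders = nn.ModuleDict({
            m: nn.Linear(dim, d_model) for m, dim in modality_dims.items()
        })
        # Common trunk shared across all tasks and modalities.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        # One prediction head per task.
        self.heads = nn.ModuleDict({
            t: nn.Linear(d_model, out) for t, out in task_out_dims.items()
        })

    def forward(self, x: torch.Tensor, modality: str, task: str) -> torch.Tensor:
        h = self.encoders[modality](x)           # (batch, seq, d_model)
        h = self.trunk(h)                        # shared cross-modal representation
        return self.heads[task](h.mean(dim=1))   # pool over sequence, then task head

model = OmniVecSketch(modality_dims={"visual": 768, "audio": 128},
                      task_out_dims={"classification": 10})
logits = model(torch.randn(2, 16, 768), modality="visual", task="classification")
print(logits.shape)  # torch.Size([2, 10])
```

The training recipe from the abstract (self-supervised masked pre-training followed by sequential per-task training) would drive such a module; that outer loop is omitted here.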
Related papers
- YOLOR-Based Multi-Task Learning [12.5920336941241]
Multi-task learning (MTL) aims to learn multiple tasks using a single model and jointly improve all of them assuming generalization and shared semantics.
We propose building on You Only Learn One Representation (YOLOR), a network architecture specifically designed for multitasking.
We find that our method achieves competitive performance on all tasks while maintaining a low parameter count and without any pre-training.
arXiv Detail & Related papers (2023-09-29T01:42:21Z)
- Modular Approach to Machine Reading Comprehension: Mixture of Task-Aware Experts [0.5801044612920815]
We present a Mixture of Task-Aware Experts Network for Machine Reading Comprehension on a relatively small dataset.
We focus on the issue of common-sense learning, enforcing the use of common-ground knowledge.
We take inspiration from the recent advancements of multitask and transfer learning.
arXiv Detail & Related papers (2022-10-04T17:13:41Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts (see the sketch after this list).
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
- PaRT: Parallel Learning Towards Robust and Transparent AI [4.160969852186451]
This paper takes a parallel learning approach for robust and transparent AI.
A deep neural network is trained in parallel on multiple tasks, where each task is trained only on a subset of the network resources.
We show that the network does indeed use learned knowledge from some tasks in other tasks, through shared representations.
arXiv Detail & Related papers (2022-01-24T09:03:28Z)
- Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This allows not only robust training with noisy video data, but also scaling up the size of the capsule network compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm.
We deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems.
We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
arXiv Detail & Related papers (2021-05-08T22:26:52Z)
- Multi-Task Learning with Deep Neural Networks: A Survey [0.0]
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are simultaneously learned by a shared model.
We give an overview of multi-task learning methods for deep neural networks, with the aim of summarizing both the well-established and most recent directions within the field.
arXiv Detail & Related papers (2020-09-10T19:31:04Z)
- Deep Multimodal Neural Architecture Search [178.35131768344246]
We devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks.
Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone.
On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks.
arXiv Detail & Related papers (2020-04-25T07:00:32Z)
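As a companion to the Sparsely Activated Mixture-of-Experts entry above, here is a minimal sketch of task-aware gating: a learned per-task score routes each example to a small subset of experts. The layer sizes, the embedding-based gate, and the top-1 routing are assumptions for illustration, not the paper's exact gating functions.

```python
# Illustrative task-aware sparse Mixture-of-Experts layer: a per-task gate
# routes each example to its top-k experts, so only a fraction of the
# parameters is active per example. Sizes and top-k are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskAwareMoE(nn.Module):
    def __init__(self, d_model: int, num_experts: int, num_tasks: int, top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, d_model))
            for _ in range(num_experts)
        )
        # Task-aware gate: a learned embedding per task scores the experts.
        self.task_gate = nn.Embedding(num_tasks, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model); task_id: (batch,) integer task indices
        gate_logits = self.task_gate(task_id)               # (batch, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                       # examples routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

layer = TaskAwareMoE(d_model=64, num_experts=4, num_tasks=3, top_k=1)
y = layer(torch.randn(8, 64), task_id=torch.randint(0, 3, (8,)))
print(y.shape)  # torch.Size([8, 64])
```

Because only the selected experts run per example, the parameter count grows with the number of experts while per-example compute stays close to a single dense feed-forward block, matching the trade-off described in that entry.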