Deep Multimodal Neural Architecture Search
- URL: http://arxiv.org/abs/2004.12070v2
- Date: Sun, 11 Oct 2020 03:28:08 GMT
- Title: Deep Multimodal Neural Architecture Search
- Authors: Zhou Yu, Yuhao Cui, Jun Yu, Meng Wang, Dacheng Tao, Qi Tian
- Abstract summary: We devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks.
Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone.
On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks.
- Score: 178.35131768344246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing effective neural networks is fundamentally important in deep
multimodal learning. Most existing works focus on a single task and design
neural architectures manually, which are highly task-specific and hard to
generalize to different tasks. In this paper, we devise a generalized deep
multimodal neural architecture search (MMnas) framework for various multimodal
learning tasks. Given multimodal input, we first define a set of primitive
operations, and then construct a deep encoder-decoder based unified backbone,
where each encoder or decoder block corresponds to an operation searched from a
predefined operation pool. On top of the unified backbone, we attach
task-specific heads to tackle different multimodal learning tasks. By using a
gradient-based NAS algorithm, the optimal architectures for different tasks are
learned efficiently. Extensive ablation studies, comprehensive analysis, and
comparative experimental results show that the obtained MMnasNet significantly
outperforms existing state-of-the-art approaches across three multimodal
learning tasks (over five datasets), including visual question answering,
image-text matching, and visual grounding.
Related papers
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z) - Multi-modal Semantic Understanding with Contrastive Cross-modal Feature
Alignment [11.897888221717245]
This paper proposes a novel CLIP-guided contrastive-learning-based architecture to perform multi-modal feature alignment.
Our model is simple to implement without using task-specific external knowledge, and thus can easily migrate to other multi-modal tasks.
arXiv Detail & Related papers (2024-03-11T01:07:36Z) - OmniVec: Learning robust representations with cross modal sharing [28.023214572340336]
We present an approach to learn multiple tasks, in multiple modalities, with a unified architecture.
The proposed network is composed of task specific encoders, a common trunk in the middle, followed by task specific prediction heads.
We train the network on all major modalities, e.g. visual, audio, text and 3D, and report results on $22$ diverse and challenging public benchmarks.
arXiv Detail & Related papers (2023-11-07T14:00:09Z) - Dynamic Neural Network for Multi-Task Learning Searching across Diverse
Network Topologies [14.574399133024594]
We present a new MTL framework that searches for optimized structures for multiple tasks with diverse graph topologies.
We design a restricted DAG-based central network with read-in/read-out layers to build topologically diverse task-adaptive structures.
arXiv Detail & Related papers (2023-03-13T05:01:50Z) - Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
We propose a new suite of benchmark aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z) - NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural
Network Synthesis [53.106414896248246]
We present a framework that allows analysts to effectively build the solution sub-graph space and guide the network search by injecting their domain knowledge.
Applying this technique in an iterative manner allows analysts to converge to the best performing neural network architecture for a given application.
arXiv Detail & Related papers (2020-09-28T01:48:45Z) - Multi-Task Learning with Deep Neural Networks: A Survey [0.0]
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are simultaneously learned by a shared model.
We give an overview of multi-task learning methods for deep neural networks, with the aim of summarizing both the well-established and most recent directions within the field.
arXiv Detail & Related papers (2020-09-10T19:31:04Z) - Automated Search for Resource-Efficient Branched Multi-Task Networks [81.48051635183916]
We propose a principled approach, rooted in differentiable neural architecture search, to automatically define branching structures in a multi-task neural network.
We show that our approach consistently finds high-performing branching structures within limited resource budgets.
arXiv Detail & Related papers (2020-08-24T09:49:19Z) - Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results w.r.t. performance, computations and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.