Channel Exchanging Networks for Multimodal and Multitask Dense Image
Prediction
- URL: http://arxiv.org/abs/2112.02252v1
- Date: Sat, 4 Dec 2021 05:47:54 GMT
- Title: Channel Exchanging Networks for Multimodal and Multitask Dense Image
Prediction
- Authors: Yikai Wang, Wenbing Huang, Fuchun Sun, Fengxiang He, Dacheng Tao
- Abstract summary: We propose the Channel-Exchanging-Network (CEN), which is self-adaptive, parameter-free, and, more importantly, applicable to both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between subnetworks of different modalities.
For dense image prediction, the validity of CEN is tested in four different scenarios.
- Score: 125.18248926508045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal fusion and multitask learning are two vital topics in machine
learning. Despite the fruitful progress, existing methods for both problems are
still brittle to the same challenge: it remains a dilemma to integrate the
common information across modalities (resp. tasks) while preserving the
specific patterns of each modality (resp. task). Besides, although the two problems
are closely related, multimodal fusion and multitask learning have rarely been
explored within the same methodological framework.
In this paper, we propose the Channel-Exchanging-Network (CEN), which is
self-adaptive, parameter-free, and, more importantly, applicable to both
multimodal fusion and multitask learning. At its core, CEN dynamically
exchanges channels between subnetworks of different modalities. Specifically,
the channel exchanging process is self-guided by individual channel importance,
measured by the magnitude of the Batch-Normalization (BN) scaling factor
during training. For the application of dense image prediction, the validity of
CEN is tested in four different scenarios: multimodal fusion, cycle multimodal
fusion, multitask learning, and multimodal multitask learning. Extensive
experiments on semantic segmentation via RGB-D data and image translation
through multi-domain input verify the effectiveness of our CEN compared to
current state-of-the-art methods. Detailed ablation studies have also been
carried out, which affirm the advantage of each component we propose.
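To make the exchanging rule concrete, below is a minimal PyTorch-style sketch of BN-guided channel exchanging between two modality subnetworks. It is an illustrative assumption based only on the abstract, not the authors' released implementation; in particular, the function name `exchange_channels` and the fixed threshold are hypothetical.

```python
import torch
import torch.nn as nn

def exchange_channels(x_a, x_b, bn_a, bn_b, threshold=1e-2):
    """Hypothetical sketch of BN-guided channel exchanging between two modalities.

    x_a, x_b : feature maps of shape (N, C, H, W) from the two modality subnetworks.
    bn_a, bn_b : the nn.BatchNorm2d layers that produced them; the magnitude of
                 their scaling factors (|gamma|) serves as per-channel importance.
    """
    # Channels whose BN scaling factor is close to zero are treated as
    # unimportant for their own modality ...
    weak_a = bn_a.weight.detach().abs() < threshold
    weak_b = bn_b.weight.detach().abs() < threshold
    out_a, out_b = x_a.clone(), x_b.clone()
    # ... and are replaced by the corresponding channels of the other modality.
    out_a[:, weak_a] = x_b[:, weak_a]
    out_b[:, weak_b] = x_a[:, weak_b]
    return out_a, out_b
```

In the paper the exchange is applied inside the subnetworks after BN layers; the threshold value and placement above are chosen only for illustration.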
Related papers
- Multimodal Information Interaction for Medical Image Segmentation [24.024848382458767]
We introduce an innovative Multimodal Information Cross Transformer (MicFormer).
It queries features from one modality and retrieves corresponding responses from another, facilitating effective communication between bimodal features.
Compared to other multimodal segmentation techniques, our method outperforms them by margins of 2.83 and 4.23, respectively.
arXiv Detail & Related papers (2024-04-25T07:21:14Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment
Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z) - Multi-Task Learning for Visual Scene Understanding [7.191593674138455]
This thesis is concerned with multi-task learning in the context of computer vision.
We propose several methods that tackle important aspects of multi-task learning.
The results show several advances in the state-of-the-art of multi-task learning.
arXiv Detail & Related papers (2022-03-28T16:57:58Z) - Revisit Multimodal Meta-Learning through the Lens of Multi-Task Learning [33.19179706038397]
Multimodal meta-learning is a recent problem that extends conventional few-shot meta-learning by generalizing its setup to diverse multimodal task distributions.
Previous work claims that a single meta-learner trained on a multimodal distribution can sometimes outperform multiple specialized meta-learners trained on individual unimodal distributions.
Our work makes two contributions to multimodal meta-learning. First, we propose a method to quantify knowledge transfer between tasks of different modes at a micro-level.
Second, inspired by hard parameter sharing in multi-task learning and a new interpretation of related work, we propose a new multimodal meta-learner.
arXiv Detail & Related papers (2021-10-27T06:23:45Z) - Deep Multimodal Fusion by Channel Exchanging [87.40768169300898]
This paper proposes a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities.
The validity of such an exchanging process is also guaranteed by sharing convolutional filters yet keeping separate BN layers across modalities, which, as an added benefit, allows the multimodal architecture to be almost as compact as a unimodal network (see the sketch after this list).
arXiv Detail & Related papers (2020-11-10T09:53:20Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
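The design summarized in the "Deep Multimodal Fusion by Channel Exchanging" entry above, shared convolutional filters with private per-modality BN layers, can be sketched as follows. This is an illustrative assumption for two modalities and a single 3x3 convolution block; the class name is hypothetical and does not come from either paper.

```python
import torch
import torch.nn as nn

class SharedConvPrivateBN(nn.Module):
    """Hypothetical block: one convolution shared by all modalities,
    with a private BatchNorm per modality (names are illustrative)."""

    def __init__(self, in_channels, out_channels, num_modalities=2):
        super().__init__()
        # The convolutional filters are shared across modalities ...
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              padding=1, bias=False)
        # ... while each modality keeps its own BN statistics and scaling factors.
        self.bns = nn.ModuleList([nn.BatchNorm2d(out_channels)
                                  for _ in range(num_modalities)])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, xs):
        # xs: list of per-modality feature maps, each of shape (N, C, H, W)
        return [self.relu(self.bns[i](self.conv(x))) for i, x in enumerate(xs)]
```

Keeping BN private is what makes the per-channel scaling factors modality-specific, so they can serve as the importance signal for the exchanging rule sketched earlier, while filter sharing keeps the parameter count close to that of a unimodal network.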