Multimodal Knowledge Expansion
- URL: http://arxiv.org/abs/2103.14431v1
- Date: Fri, 26 Mar 2021 12:32:07 GMT
- Title: Multimodal Knowledge Expansion
- Authors: Zihui Xue, Sucheng Ren, Zhengqi Gao and Hang Zhao
- Abstract summary: We propose a knowledge distillation-based framework to utilize multimodal data without requiring labels.
We show that a multimodal student model consistently denoises pseudo labels and generalizes better than its teacher.
- Score: 14.332957885505547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The popularity of multimodal sensors and the accessibility of the Internet
have brought us a massive amount of unlabeled multimodal data. Since existing
datasets and well-trained models are primarily unimodal, the modality gap
between a unimodal network and unlabeled multimodal data poses an interesting
problem: how to transfer a pre-trained unimodal network to perform the same
task on unlabeled multimodal data? In this work, we propose multimodal
knowledge expansion (MKE), a knowledge distillation-based framework to
effectively utilize multimodal data without requiring labels. In contrast to
traditional knowledge distillation, where the student is designed to be
lightweight and inferior to the teacher, we observe that a multimodal student
model consistently denoises pseudo labels and generalizes better than its
teacher. Extensive experiments on four tasks and different modalities verify
this finding. Furthermore, we connect the mechanism of MKE to semi-supervised
learning and offer both empirical and theoretical explanations to understand
the denoising capability of a multimodal student.
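To make the recipe in the abstract concrete (a frozen unimodal teacher producing pseudo labels for unlabeled multimodal data, and a multimodal student trained on those labels), here is a minimal PyTorch sketch. The module names, the RGB+depth modality pair, and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of pseudo-label distillation from a unimodal teacher to a
# multimodal student, assuming an RGB+depth classification task.
# Names and hyperparameters are illustrative, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnimodalTeacher(nn.Module):
    """Pre-trained network that sees only one modality (here: RGB features)."""
    def __init__(self, rgb_dim=128, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(rgb_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_classes))

    def forward(self, rgb):
        return self.net(rgb)

class MultimodalStudent(nn.Module):
    """Student that fuses both modalities (RGB + depth features)."""
    def __init__(self, rgb_dim=128, depth_dim=32, num_classes=10):
        super().__init__()
        self.rgb_enc = nn.Linear(rgb_dim, 64)
        self.depth_enc = nn.Linear(depth_dim, 64)
        self.head = nn.Linear(128, num_classes)

    def forward(self, rgb, depth):
        z = torch.cat([F.relu(self.rgb_enc(rgb)),
                       F.relu(self.depth_enc(depth))], dim=-1)
        return self.head(z)

def expand_knowledge(teacher, student, unlabeled_loader, epochs=10, lr=1e-3):
    """Train the multimodal student on soft pseudo labels from the frozen teacher."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for rgb, depth in unlabeled_loader:              # no ground-truth labels
            with torch.no_grad():
                pseudo = teacher(rgb).softmax(dim=-1)    # pseudo labels from one modality
            logits = student(rgb, depth)                 # student sees both modalities
            loss = F.kl_div(logits.log_softmax(dim=-1), pseudo, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```

The student is supervised only by the teacher's noisy pseudo labels, yet it conditions on both modalities; this is the setting in which the abstract reports that the student denoises the pseudo labels and generalizes better than its teacher.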
Related papers
- Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances from multiple modalities when only a few labeled examples are available.
We propose a Generative Transfer Learning framework consisting of two stages: the first involves training on abundant unimodal data, and the second focuses on transfer learning to adapt to novel data.
Our findings demonstrate that GTL achieves superior performance compared to state-of-the-art methods across four distinct multimodal datasets.
arXiv Detail & Related papers (2024-10-14T16:09:38Z) - Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis [25.66434557076494]
We propose a novel meta uni-label generation (MUG) framework to address the above problem.
We first design a contrastive-based projection module to bridge the gap between unimodal and multimodal representations.
We then propose unimodal and multimodal denoising tasks to train MUCN with explicit supervision via a bi-level optimization strategy.
arXiv Detail & Related papers (2024-08-28T03:43:01Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - MultiDelete for Multimodal Machine Unlearning [14.755831733659699]
MultiDelete is designed to decouple associations between unimodal data points during unlearning.
It can maintain the multimodal and unimodal knowledge of the original model post unlearning.
It can provide better protection to unlearned data against adversarial attacks.
arXiv Detail & Related papers (2023-11-18T08:30:38Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities; a minimal sketch of this alternating scheme appears after this list.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Unlock the Power: Competitive Distillation for Multi-Modal Large
Language Models [17.25135606956287]
The Competitive Multi-modal Distillation framework (CoMD) captures bidirectional feedback between teacher and student models.
Our experimental analysis of diverse datasets shows that our knowledge transfer method consistently improves the capabilities of the student model.
arXiv Detail & Related papers (2023-11-14T14:49:46Z) - Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds on the amount of multimodal interaction.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z) - Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning
with Hierarchical Aggregation [16.308470947384134]
HA-Fedformer is a novel transformer-based model that enables each client to train with only a unimodal dataset.
We develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling.
Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models.
arXiv Detail & Related papers (2023-03-27T07:07:33Z) - Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This not only enables robust training with noisy video data, but also allows the capsule network to be scaled up in size compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z) - What Makes Multimodal Learning Better than Single (Provably) [28.793128982222438]
We show that learning with multiple modalities achieves a smaller population risk than using only a subset of the modalities.
This is the first theoretical treatment to capture important qualitative phenomena observed in real multimodal applications.
arXiv Detail & Related papers (2021-06-08T17:20:02Z) - Relating by Contrasting: A Data-efficient Framework for Multimodal
Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
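The MLA entry above describes alternating over modalities while a shared head is continuously optimized across all of them. The sketch below shows one way such an alternating unimodal training loop could look; the module names and the simple round-robin schedule over modality-specific loaders are assumptions for illustration, not the paper's exact recipe.

```python
# Illustrative sketch of alternating unimodal adaptation with a shared head.
# Module names and the round-robin schedule are assumptions, not MLA's exact recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlternatingUnimodalModel(nn.Module):
    def __init__(self, modality_dims, hidden=64, num_classes=10):
        super().__init__()
        # One encoder per modality; a single head shared by all of them.
        self.encoders = nn.ModuleDict({name: nn.Linear(dim, hidden)
                                       for name, dim in modality_dims.items()})
        self.shared_head = nn.Linear(hidden, num_classes)

    def forward(self, modality, x):
        return self.shared_head(F.relu(self.encoders[modality](x)))

def train_alternating(model, loaders, epochs=5, lr=1e-3):
    """loaders: dict mapping modality name -> iterable of (features, label) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for modality, loader in loaders.items():   # alternate: one modality at a time
            for x, y in loader:
                logits = model(modality, x)        # unimodal forward through the shared head
                loss = F.cross_entropy(logits, y)
                opt.zero_grad()
                loss.backward()
                opt.step()
    return model

# Example usage with random features for two hypothetical modalities:
# model = AlternatingUnimodalModel({"audio": 40, "video": 128})
# loaders = {"audio": [(torch.randn(8, 40), torch.randint(0, 10, (8,)))],
#            "video": [(torch.randn(8, 128), torch.randint(0, 10, (8,)))]}
# train_alternating(model, loaders, epochs=1)
```

Because the shared head receives gradients from every modality in turn, it is where cross-modal information accumulates even though each individual forward pass is unimodal.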
This list is automatically generated from the titles and abstracts of the papers on this site.