Enhanced Multimodal Representation Learning with Cross-modal KD
- URL: http://arxiv.org/abs/2306.07646v1
- Date: Tue, 13 Jun 2023 09:35:37 GMT
- Title: Enhanced Multimodal Representation Learning with Cross-modal KD
- Authors: Mengxi Chen, Linyu Xing, Yu Wang, Ya Zhang
- Abstract summary: This paper explores leveraging auxiliary modalities, which are only available during training, to enhance multimodal representation learning through cross-modal Knowledge Distillation (KD).
The widely adopted mutual information-based objective leads to a short-cut solution of the weak teacher, i.e., achieving the maximum mutual information by simply making the teacher model as weak as the student model.
To prevent such a weak solution, we introduce an additional objective term, i.e., the mutual information between the teacher and the auxiliary modality model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the task of leveraging auxiliary modalities which are
only available at training to enhance multimodal representation learning
through cross-modal Knowledge Distillation (KD). The widely adopted mutual
information maximization-based objective leads to a short-cut solution of the
weak teacher, i.e., achieving the maximum mutual information by simply making
the teacher model as weak as the student model. To prevent such a weak
solution, we introduce an additional objective term, i.e., the mutual
information between the teacher and the auxiliary modality model. Besides, to
narrow down the information gap between the student and teacher, we further
propose to minimize the conditional entropy of the teacher given the student.
Novel training schemes based on contrastive learning and adversarial learning
are designed to optimize the mutual information and the conditional entropy,
respectively. Experimental results on three popular multimodal benchmark
datasets have shown that the proposed method outperforms a range of
state-of-the-art approaches for video recognition, video retrieval and emotion
classification.
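The contrastive part of this objective can be illustrated with the standard InfoNCE estimator of mutual information. This is a minimal NumPy sketch, not the authors' implementation: the function names, the `temperature` and `lam` parameters, and the one-directional form of each term are assumptions, and the adversarial conditional-entropy term is omitted. The second term shows, roughly, why the extra objective blocks the weak-teacher shortcut: a teacher that merely copies the student can keep the first term low, but it then carries little information about the auxiliary modality, which the second term penalizes.

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE loss for a batch of paired embeddings of shape (N, D).

    Row i of `anchor` and row i of `positive` are a positive pair; the other
    N - 1 rows of `positive` act as negatives. log(N) - loss is the usual
    InfoNCE lower-bound estimate of I(anchor; positive).
    """
    # L2-normalize so that dot products are cosine similarities
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature                        # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index (the diagonal) as the target
    return float(-np.mean(np.diag(log_softmax)))

def kd_contrastive_loss(z_student, z_teacher, z_aux, lam=1.0, temperature=0.1):
    """Hypothetical combined objective (an assumption, not the paper's code).

    Term 1 maximizes I(student; teacher). Term 2 maximizes
    I(teacher; auxiliary), penalizing a teacher that discards the
    auxiliary-modality information in order to match a weak student.
    """
    student_teacher = info_nce(z_student, z_teacher, temperature)
    teacher_aux = info_nce(z_teacher, z_aux, temperature)
    return student_teacher + lam * teacher_aux
```

For unrelated random embeddings `x` and `y` of shape `(8, 16)`, `info_nce(x, x)` is much smaller than `info_nce(x, y)`, so a teacher whose embeddings drift away from the auxiliary-modality embeddings raises the second term even when it matches the student perfectly.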
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
(2024-10-15T06:51:25Z)
- Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment
We propose Ranking Loss based Knowledge Distillation (RLKD), which encourages consistency of peak predictions between the teacher and student models.
Our method enables the student model to better learn the multi-modal distributions of the teacher model, leading to a significant performance improvement in various downstream tasks.
(2024-09-19T08:06:42Z)
- Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning
Cross-modal coherence modeling is essential for intelligent systems to help them organize and structure information.
Previous work on cross-modal coherence modeling attempted to leverage the order information from another modality to help recover the coherence of the target modality.
This paper explores a new way to take advantage of cross-modal guidance without gold labels on coherency.
(2024-08-01T06:04:44Z)
- Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning
We study the role and interactions of multiple modalities in mitigating forgetting in deep neural networks (DNNs).
Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations.
We propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality.
(2024-05-04T22:02:58Z)
- Competitive Ensembling Teacher-Student Framework for Semi-Supervised Left Atrium MRI Segmentation
Semi-supervised learning has greatly advanced medical image segmentation because it effectively alleviates the need to acquire abundant annotations from experts.
In this paper, we present a simple yet efficient competitive ensembling teacher-student framework for semi-supervised left atrium segmentation from 3D MR images.
(2023-10-21T09:23:34Z)
- VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning
Multimodal transfer learning aims to transform pretrained representations of diverse modalities into a common domain space for effective multimodal fusion.
We propose VideoAdviser, a video knowledge distillation method to transfer multimodal knowledge of video-enhanced prompts from a multimodal fundamental model to a specific modal fundamental model.
We evaluate our method in two challenging multimodal tasks: video-level sentiment analysis and audio-visual retrieval.
(2023-09-27T08:44:04Z)
- Learning Unseen Modality Interaction
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
(2023-06-22T10:53:10Z)
- Modality-specific Distillation
We propose modality-specific distillation (MSD) to effectively transfer knowledge from a teacher on multimodal datasets.
Our idea aims at mimicking a teacher's modality-specific predictions by introducing an auxiliary loss term for each modality.
Because each modality has different importance for predictions, we also propose weighting approaches for the auxiliary losses.
(2021-01-06T05:45:07Z)
- Dual Policy Distillation
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
(2020-06-07T06:49:47Z)
- Heterogeneous Knowledge Distillation using Information Flow Modeling
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming limitations of existing KD methods by using an appropriate supervision scheme during the different phases of the training process.
(2020-05-02T06:56:56Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to models trained on less imbalanced subsets of the data as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method achieves superior performance compared to state-of-the-art methods.
(2020-01-06T12:57:36Z)
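The interleaved sampling described in the Speculative Knowledge Distillation entry can be sketched as a token-level loop. This is a minimal illustration under stated assumptions, not the SKD authors' code. It assumes a proposal counts as "poorly ranked" when it falls outside the teacher's top-`k` tokens. It represents both models as hypothetical callables `student_dist(prefix)` and `teacher_dist(prefix)` that return a next-token probability vector. It covers only the data-generation step, not the distillation that follows.

```python
import numpy as np

def skd_generate(student_dist, teacher_dist, length, k=5, seed=0):
    """Generate `length` tokens with student proposals and teacher fallback.

    student_dist / teacher_dist: hypothetical callables mapping the current
    token prefix (a list of ints) to a probability vector over the vocabulary.
    Returns the token list and a parallel list of flags marking which
    positions the teacher replaced.
    """
    rng = np.random.default_rng(seed)
    tokens, replaced = [], []
    for _ in range(length):
        p_student = student_dist(tokens)
        p_teacher = teacher_dist(tokens)
        # The student proposes the next token
        proposal = int(rng.choice(len(p_student), p=p_student))
        # Assumed acceptance rule: keep the proposal if it is in the teacher's top-k
        teacher_top_k = set(np.argsort(p_teacher)[-k:].tolist())
        if proposal in teacher_top_k:
            tokens.append(proposal)
            replaced.append(False)
        else:
            # Otherwise resample this position from the teacher's distribution
            tokens.append(int(rng.choice(len(p_teacher), p=p_teacher)))
            replaced.append(True)
    return tokens, replaced
```

Setting `k` equal to the vocabulary size accepts every student proposal, which reduces to plain student sampling; a smaller `k` hands more positions to the teacher.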