A Dimensional Structure based Knowledge Distillation Method for
Cross-Modal Learning
- URL: http://arxiv.org/abs/2306.15977v1
- Date: Wed, 28 Jun 2023 07:29:26 GMT
- Title: A Dimensional Structure based Knowledge Distillation Method for
Cross-Modal Learning
- Authors: Lingyu Si, Hongwei Dong, Wenwen Qiang, Junzhi Yu, Wenlong Zhai,
Changwen Zheng, Fanjiang Xu, Fuchun Sun
- Abstract summary: We discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks.
We propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance.
The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy.
- Score: 15.544134849816528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to limitations in data quality, some essential visual tasks are difficult
to perform independently. Introducing previously unavailable information to
transfer informative dark knowledge has been a common way to solve such hard
tasks. However, why the transferred knowledge works has not been extensively
explored. To address this issue, in this paper, we discover the
correlation between feature discriminability and dimensional structure (DS) by
analyzing and observing features extracted from simple and hard tasks. On this
basis, we express DS using deep channel-wise correlation and intermediate
spatial distribution, and propose a novel cross-modal knowledge distillation
(CMKD) method for better supervised cross-modal learning (CML) performance. The
proposed method enforces output features to be channel-wise independent and
intermediate ones to be uniformly distributed, thereby learning semantically
irrelevant features from the hard task to boost its accuracy. This is
especially useful in specific applications where the performance gap between
dual modalities is relatively large. Furthermore, we collect a real-world CML
dataset to promote community development. The dataset contains more than 10,000
paired optical and radar images and is continuously being updated. Experimental
results on real-world and benchmark datasets validate the effectiveness of the
proposed method.
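The abstract describes two constraints: output features are pushed toward channel-wise independence, and intermediate features toward a uniform spatial distribution. The paper's exact loss formulation is not given here, so the following is a minimal NumPy sketch under two illustrative assumptions: independence is penalized via off-diagonal entries of the channel correlation matrix, and spatial uniformity via the KL divergence between normalized per-position energy and the uniform distribution.

```python
import numpy as np

def channel_independence_loss(features):
    """Penalize off-diagonal entries of the channel correlation matrix.

    features: (N, C) array of output features (N samples, C channels).
    Illustrative reading of the channel-wise independence constraint,
    not the paper's exact formulation.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(features) - 1)
    std = np.sqrt(np.diag(cov)) + 1e-8
    corr = cov / np.outer(std, std)
    off_diag = corr - np.diag(np.diag(corr))  # zero out the diagonal
    return float(np.mean(off_diag ** 2))

def spatial_uniformity_loss(feature_map):
    """Penalize deviation of spatial activation energy from uniform.

    feature_map: (C, H, W) array of intermediate features.
    Uses KL(p || uniform) = log(HW) - H(p), which is zero iff the
    per-position energy is perfectly uniform (again, an assumption).
    """
    energy = (feature_map ** 2).sum(axis=0).ravel()  # per-position energy
    p = energy / (energy.sum() + 1e-8)
    return float(np.log(p.size) + np.sum(p * np.log(p + 1e-12)))
```

Under this sketch, duplicated channels drive the first loss toward its maximum, while a single activation spike drives the second toward log(HW); a distillation objective would add weighted sums of such terms to the task loss.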
Related papers
- DAAL: Density-Aware Adaptive Line Margin Loss for Multi-Modal Deep Metric Learning [1.9472493183927981]
We propose a novel loss function called Density-Aware Adaptive Margin Loss (DAAL).
DAAL preserves the density distribution of embeddings while encouraging the formation of adaptive sub-clusters within each class.
Experiments on benchmark fine-grained datasets demonstrate the superior performance of DAAL.
arXiv Detail & Related papers (2024-10-07T19:04:24Z)
- SGW-based Multi-Task Learning in Vision Tasks [8.459976488960269]
As the scale of datasets expands and the complexity of tasks increases, knowledge sharing becomes increasingly challenging.
We propose an information bottleneck knowledge extraction module (KEM).
This module aims to reduce inter-task interference by constraining the flow of information, thereby reducing computational complexity.
arXiv Detail & Related papers (2024-10-03T13:56:50Z)
- Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory [53.37473225728298]
The rapid evolution of deep learning and large language models has led to an exponential growth in the demand for training data.
Matching Training Trajectories (MTT) has been a prominent approach, which replicates the training trajectory of an expert network on real data with a synthetic dataset.
We introduce a novel method called Matching Convexified Trajectory (MCT), which aims to provide better guidance for the student trajectory.
arXiv Detail & Related papers (2024-06-28T11:06:46Z)
- A Generalization Theory of Cross-Modality Distillation with Contrastive Learning [49.35244441141323]
Cross-modality distillation arises as an important topic for data modalities containing limited knowledge.
We formulate a general framework of cross-modality contrastive distillation (CMCD), built upon contrastive learning.
Our algorithm outperforms existing algorithms consistently by a margin of 2-3% across diverse modalities and tasks.
arXiv Detail & Related papers (2024-05-06T11:05:13Z)
- Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching.
An anchor branch is first trained to provide insights into the data properties.
A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
arXiv Detail & Related papers (2024-04-28T08:44:28Z)
- Hyperspectral Image Analysis in Single-Modal and Multimodal setting using Deep Learning Techniques [1.2328446298523066]
Hyperspectral imaging provides precise classification for land use and cover due to its exceptional spectral resolution.
However, the challenges of high dimensionality and limited spatial resolution hinder its effectiveness.
This study addresses these challenges by employing deep learning techniques to efficiently process, extract features, and classify data in an integrated manner.
arXiv Detail & Related papers (2024-03-03T15:47:43Z)
- Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and CIFAR-10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- CLIP-Driven Fine-grained Text-Image Person Re-identification [50.94827165464813]
TIReID aims to retrieve the image corresponding to the given text query from a pool of candidate images.
We propose a CLIP-driven Fine-grained information excavation framework (CFine) to fully utilize the powerful knowledge of CLIP for TIReID.
arXiv Detail & Related papers (2022-10-19T03:43:12Z)
- Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer [53.413305467674434]
We introduce open-source RGB data to support spike depth estimation, leveraging its annotations and spatial information.
We propose a cross-modality cross-domain (BiCross) framework to realize unsupervised spike depth estimation.
Our method achieves state-of-the-art (SOTA) performances, compared with RGB-oriented unsupervised depth estimation methods.
arXiv Detail & Related papers (2022-08-26T09:35:20Z)
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.