CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
- URL: http://arxiv.org/abs/2208.12448v3
- Date: Thu, 25 May 2023 14:19:43 GMT
- Title: CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
- Authors: Yunyao Mao, Wengang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li
- Abstract summary: In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
- Score: 130.08432609780374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In 3D action recognition, rich complementary information exists between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Unlike classic distillation solutions, which transfer the knowledge of a fixed, pre-trained teacher to the student, here the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where such relational information is naturally suited to contrastive frameworks. On the other hand, asymmetrical configurations are used for the teacher and the student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we show that the cross-modal positive mining in previous works can be regarded as a degenerate version of CMD. We perform extensive experiments on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD
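As a rough illustration of the abstract's two designs, the sketch below (PyTorch; tensor names, temperature values, and the memory-bank interface are our own assumptions, not the authors' implementation) models each modality's knowledge as a softmax distribution of similarities to a bank of neighbor anchors and distills it bidirectionally via KL divergence, with a sharper, gradient-detached teacher on each side:

    import torch.nn.functional as F

    def neighbor_logits(z, anchors, tau):
        # Cosine similarities between embeddings z (B, D) and a bank of
        # K neighbor anchors (K, D), scaled by a temperature tau.
        z = F.normalize(z, dim=-1)
        anchors = F.normalize(anchors, dim=-1)
        return z @ anchors.t() / tau

    def cmd_loss(z_a, z_b, bank_a, bank_b, tau_t=0.05, tau_s=0.1):
        # Bidirectional mutual distillation between two skeleton modalities.
        # bank_a and bank_b are assumed to hold embeddings of the SAME K
        # anchor samples in each modality, so the two neighbor-similarity
        # distributions are comparable. The teacher side is detached and
        # uses a lower temperature (asymmetric configuration).
        p_a = F.softmax(neighbor_logits(z_a.detach(), bank_a, tau_t), dim=-1)
        log_q_b = F.log_softmax(neighbor_logits(z_b, bank_b, tau_s), dim=-1)
        loss_ab = F.kl_div(log_q_b, p_a, reduction="batchmean")  # A teaches B

        p_b = F.softmax(neighbor_logits(z_b.detach(), bank_b, tau_t), dim=-1)
        log_q_a = F.log_softmax(neighbor_logits(z_a, bank_a, tau_s), dim=-1)
        loss_ba = F.kl_div(log_q_a, p_b, reduction="batchmean")  # B teaches A

        return loss_ab + loss_ba

In the paper's setting, each modality's contrastive branch (e.g., momentum key encoders and queues) would supply the embeddings and anchor banks; they are left abstract here.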
Related papers
- DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning [3.763772992906958]
Cross-modal knowledge distillation (CMKD) refers to the scenario in which a learning framework must handle training and test data that exhibit a modality mismatch.
DisCoM-KD (Disentanglement-learning based Cross-Modal Knowledge Distillation) explicitly models different types of per-modality information.
arXiv Detail & Related papers (2024-08-05T13:44:15Z)
- Learning to Maximize Mutual Information for Chain-of-Thought Distillation [13.660167848386806]
Distilling Step-by-Step (DSS) has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts.
However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction.
We propose a learning-based variational approach that solves this problem by maximizing the mutual information between the two tasks.
arXiv Detail & Related papers (2024-03-05T22:21:45Z)
- I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation [147.2183428328396]
We introduce a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework.
In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process.
To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy.
arXiv Detail & Related papers (2023-10-24T07:22:17Z)
- Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision [40.16465314639641]
Self-supervised representation learning for human action recognition has developed rapidly in recent years.
Most existing works are based on skeleton data, often in a multi-modality setup.
We first propose an Implicit Knowledge Exchange Module which alleviates the propagation of erroneous knowledge between low-performance modalities.
arXiv Detail & Related papers (2023-09-21T12:27:43Z)
- Lightweight Self-Knowledge Distillation with Multi-source Information Fusion [3.107478665474057]
Knowledge Distillation (KD) is a powerful technique for transferring knowledge between neural network models.
We propose a lightweight SKD framework that utilizes multi-source information to construct a more informative teacher.
We validate the proposed DRG and DSR strategies, individually and in combination, through comprehensive experiments on various datasets and models.
arXiv Detail & Related papers (2023-05-16T05:46:31Z)
- SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection [56.24700754048067]
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging.
We propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy.
Our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment.
arXiv Detail & Related papers (2023-03-29T16:08:59Z)
- SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z)
- A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training [73.7507857547549]
We propose to unify knowledge discovery and multi-modal pre-training in a continuous learning framework.
For knowledge discovery, a pre-trained model is used to identify cross-modal links on a graph.
For model pre-training, the knowledge graph is used as the external knowledge to guide the model updating.
arXiv Detail & Related papers (2022-06-11T16:05:06Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
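The parameter-sharing idea in the last entry can be sketched briefly: all convolutional kernels are reused across CT and MRI, while small modality-specific layers absorb the statistical gap between the two modalities. Using per-modality batch normalization for that role is an assumption on our part; the paper itself pairs parameter sharing with knowledge distillation:

    import torch.nn as nn

    class SharedConvBlock(nn.Module):
        # One shared 3x3 convolution whose kernels are reused by both
        # modalities, paired with modality-specific batch normalization.
        def __init__(self, c_in, c_out, n_modalities=2):
            super().__init__()
            self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
            self.norms = nn.ModuleList(
                [nn.BatchNorm2d(c_out) for _ in range(n_modalities)]
            )
            self.act = nn.ReLU(inplace=True)

        def forward(self, x, modality):
            # modality: 0 for CT, 1 for MRI (illustrative convention)
            return self.act(self.norms[modality](self.conv(x)))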
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.