Elevating Skeleton-Based Action Recognition with Efficient
Multi-Modality Self-Supervision
- URL: http://arxiv.org/abs/2309.12009v2
- Date: Thu, 11 Jan 2024 02:44:27 GMT
- Title: Elevating Skeleton-Based Action Recognition with Efficient
Multi-Modality Self-Supervision
- Authors: Yiping Wei, Kunyu Peng, Alina Roitberg, Jiaming Zhang, Junwei Zheng,
Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen
- Abstract summary: Self-supervised representation learning for human action recognition has developed rapidly in recent years.
Most of the existing works are based on skeleton data in a multi-modality setup.
We first propose an Implicit Knowledge Exchange Module which alleviates the propagation of erroneous knowledge between low-performance modalities.
- Score: 40.16465314639641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised representation learning for human action recognition has
developed rapidly in recent years. Most of the existing works are based on
skeleton data in a multi-modality setup. These works overlook the performance
differences among modalities, which leads to the propagation of erroneous
knowledge between modalities, and they rely on only the three fundamental
modalities, i.e., joints, bones, and motions, leaving additional modalities
unexplored.
In this work, we first propose an Implicit Knowledge Exchange Module (IKEM)
that alleviates the propagation of erroneous knowledge between low-performance
modalities. Then, we propose three new modalities to enrich the complementary
information across modalities. Finally, to maintain efficiency when
introducing new modalities, we propose a novel teacher-student framework,
named relational cross-modality knowledge distillation, which distills
knowledge from the secondary modalities into the mandatory modalities under
the relational constraints imposed by anchors, positives, and negatives. The
experimental results demonstrate the effectiveness of our approach, unlocking
the efficient use of skeleton-based multi-modality data. Source code will be
made publicly available at https://github.com/desehuileng0o0/IKEM.
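For intuition, the sketch below illustrates the relational cross-modality distillation idea in PyTorch: a student embedding from a mandatory modality (e.g., joints) is trained so that its similarity distribution over a shared bank of anchors, positives, and negatives matches that of a teacher embedding from a secondary modality. This is a minimal illustration under assumed names, feature dimensions, and temperatures, not the authors' released implementation (see the repository above for that).

```python
# Minimal sketch (not the authors' code) of relational cross-modality
# knowledge distillation: the student's relations to a shared bank of
# anchors/positives/negatives are aligned with the teacher's relations.
# All names, dimensions, and temperatures are illustrative assumptions.
import torch
import torch.nn.functional as F

def relational_distillation_loss(student_feat, teacher_feat, bank,
                                 tau_s=0.1, tau_t=0.05):
    """KL-align the student's similarity distribution with the teacher's.

    student_feat: (B, D) embeddings from the mandatory-modality encoder
    teacher_feat: (B, D) embeddings from the secondary-modality encoder
    bank:         (K, D) shared memory bank of positives/negatives
    """
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat, dim=1)
    bank = F.normalize(bank, dim=1)

    # Similarities of each anchor to every bank entry define the "relations"
    # that are transferred across modalities.
    sim_s = s @ bank.T / tau_s          # (B, K)
    sim_t = t @ bank.T / tau_t          # (B, K)

    # Teacher relations serve as soft targets (no gradient flows to them).
    p_t = F.softmax(sim_t, dim=1).detach()
    log_p_s = F.log_softmax(sim_s, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")

# Usage sketch: distill from a secondary-modality teacher into the joint stream.
if __name__ == "__main__":
    B, D, K = 8, 128, 1024
    student_feat = torch.randn(B, D, requires_grad=True)  # joint-encoder output
    teacher_feat = torch.randn(B, D)                       # secondary-modality encoder output
    bank = torch.randn(K, D)                               # shared memory bank
    loss = relational_distillation_loss(student_feat, teacher_feat, bank)
    loss.backward()
    print(float(loss))
```

Using different temperatures for teacher and student is a common distillation choice (a sharper teacher distribution emphasizes the most confident relations); whether the paper does the same is not stated in the abstract.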
Related papers
- ReconBoost: Boosting Can Achieve Modality Reconcilement [89.4377895465204]
We study the modality-alternating learning paradigm to achieve reconcilement.
We propose a new method called ReconBoost to update a fixed modality each time.
We show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others.
arXiv Detail & Related papers (2024-05-15T13:22:39Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z)
- Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition [1.869225486385596]
We explore the hypothesis that leveraging multiple modalities can lead to better recognition.
We extend a number of recent contrastive self-supervised approaches for the task of Human Activity Recognition.
We propose a flexible, general-purpose framework for performing multimodal self-supervised learning.
arXiv Detail & Related papers (2022-05-20T10:39:16Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.