Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition
- URL: http://arxiv.org/abs/2205.10071v1
- Date: Fri, 20 May 2022 10:39:16 GMT
- Title: Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition
- Authors: Razvan Brinzea, Bulat Khaertdinov and Stylianos Asteriadis
- Abstract summary: We explore the hypothesis that leveraging multiple modalities can lead to better recognition.
We extend a number of recent contrastive self-supervised approaches for the task of Human Activity Recognition.
We propose a flexible, general-purpose framework for performing multimodal self-supervised learning.
- Score: 1.869225486385596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human Activity Recognition is a field of research where input data can take
many forms. Each of the possible input modalities describes human behaviour in
a different way, and each has its own strengths and weaknesses. We explore the
hypothesis that leveraging multiple modalities can lead to better recognition.
Since manual annotation of input data is expensive and time-consuming, the
emphasis is made on self-supervised methods which can learn useful feature
representations without any ground truth labels. We extend a number of recent
contrastive self-supervised approaches for the task of Human Activity
Recognition, leveraging inertial and skeleton data. Furthermore, we propose a
flexible, general-purpose framework for performing multimodal self-supervised
learning, named Contrastive Multiview Coding with Cross-Modal Knowledge Mining
(CMC-CMKM). This framework exploits modality-specific knowledge in order to
mitigate the limitations of typical self-supervised frameworks. The extensive
experiments on two widely-used datasets demonstrate that the suggested
framework significantly outperforms contrastive unimodal and multimodal
baselines on different scenarios, including fully-supervised fine-tuning,
activity retrieval and semi-supervised learning. Furthermore, it achieves
performance that is competitive even with fully supervised methods.
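As an illustration of the contrastive multiview idea behind such frameworks, the sketch below implements a symmetric cross-modal InfoNCE loss between inertial and skeleton embeddings. The tensor names and the use of in-batch negatives are assumptions for illustration; the cross-modal knowledge mining step of CMC-CMKM itself is not reproduced here.

```python
# Minimal sketch of a symmetric cross-modal InfoNCE objective in the style of
# Contrastive Multiview Coding, assuming two modality-specific projection
# heads produce embeddings of the same clips (hypothetical setup).
import torch
import torch.nn.functional as F

def cross_modal_info_nce(z_inertial: torch.Tensor,
                         z_skeleton: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """z_inertial, z_skeleton: (batch, dim) embeddings of the same samples."""
    z_a = F.normalize(z_inertial, dim=1)
    z_b = F.normalize(z_skeleton, dim=1)
    logits = z_a @ z_b.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Positive pairs sit on the diagonal; other in-batch samples act as negatives.
    loss_ab = F.cross_entropy(logits, targets)
    loss_ba = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_ab + loss_ba)

# Usage with random stand-in embeddings:
z_i = torch.randn(32, 128)   # e.g. inertial projection head output
z_s = torch.randn(32, 128)   # e.g. skeleton projection head output
print(cross_modal_info_nce(z_i, z_s).item())
```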
Related papers
- Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning [23.035725779568587]
We study the role and interactions of multiple modalities in mitigating forgetting in deep neural networks (DNNs).
Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations.
We propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality.
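One generic way to exploit such relational structural similarities is sketched below, under the assumption that both modalities embed the same batch of samples: pairwise similarity matrices are computed within each modality and encouraged to agree. This is an illustrative formulation, not necessarily the paper's exact objective.

```python
# Hedged sketch: align modalities through the relational structure of a batch.
import torch
import torch.nn.functional as F

def relational_alignment_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """feat_a, feat_b: (batch, dim_a) and (batch, dim_b) embeddings of the same samples."""
    sim_a = F.normalize(feat_a, dim=1) @ F.normalize(feat_a, dim=1).t()
    sim_b = F.normalize(feat_b, dim=1) @ F.normalize(feat_b, dim=1).t()
    # Matching the two relation matrices aligns how each modality "sees" the batch.
    return F.mse_loss(sim_a, sim_b)

# Usage with random stand-in features of different dimensionality:
print(relational_alignment_loss(torch.randn(16, 128), torch.randn(16, 64)).item())
```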
arXiv Detail & Related papers (2024-05-04T22:02:58Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
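A minimal sketch of an early-fusion, single-stream encoder in the spirit of the description above: per-modality inputs are projected into a shared token space, concatenated, and encoded jointly. The transformer backbone, layer sizes, and modality names are assumptions for illustration rather than UmURL's actual architecture.

```python
# Hedged sketch of early fusion with a single shared encoder stream.
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, dims: dict, d_model: int = 256, num_layers: int = 2):
        super().__init__()
        # One linear projection per modality into the shared token space.
        self.proj = nn.ModuleDict({m: nn.Linear(d, d_model) for m, d in dims.items()})
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, inputs: dict) -> torch.Tensor:
        # inputs[m]: (batch, seq_len_m, dims[m]); tokens are fused before encoding.
        tokens = torch.cat([self.proj[m](x) for m, x in inputs.items()], dim=1)
        encoded = self.encoder(tokens)            # single-stream joint encoding
        return encoded.mean(dim=1)                # pooled multi-modal representation

# Usage with toy joint/bone/motion streams (illustrative modality names):
model = EarlyFusionEncoder({"joint": 75, "bone": 75, "motion": 75})
feats = model({m: torch.randn(8, 50, 75) for m in ("joint", "bone", "motion")})
print(feats.shape)  # torch.Size([8, 256])
```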
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Reinforcement Learning Based Multi-modal Feature Fusion Network for
Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Single-Modal Entropy based Active Learning for Visual Question Answering [75.1682163844354]
We address Active Learning in the multi-modal setting of Visual Question Answering (VQA).
In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition.
Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks.
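As a hedged sketch of how entropy-based sample acquisition can look in practice, the example below ranks unlabeled samples by the predictive entropy of a single answer-classification head; the head, the answer vocabulary size, and the selection rule are stand-ins, and the paper's exact single-modal entropy criterion may differ.

```python
# Hedged sketch of entropy-based acquisition for active learning.
import torch
import torch.nn.functional as F

def select_by_entropy(logits: torch.Tensor, budget: int) -> torch.Tensor:
    """Return indices of the `budget` most uncertain samples.

    logits: (num_unlabeled, num_answers) predictions from one modality branch.
    """
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=1)
    return entropy.topk(budget).indices  # highest-entropy samples are queried for labels

# Usage with random stand-in logits for 1000 unlabeled samples:
idx = select_by_entropy(torch.randn(1000, 3000), budget=64)
print(idx.shape)  # torch.Size([64])
```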
arXiv Detail & Related papers (2021-10-21T05:38:45Z) - Multi-Modal Mutual Information (MuMMI) Training for Robust
Self-Supervised Deep Reinforcement Learning [13.937546816302715]
This work focuses on learning useful and robust deep world models using multiple, possibly unreliable, sensors.
We contribute a new multi-modal deep latent state-space model, trained using a mutual information lower-bound.
Experiments show our method significantly outperforms state-of-the-art deep reinforcement learning methods.
arXiv Detail & Related papers (2021-07-06T01:39:21Z) - Learning Modality-Specific Representations with Self-Supervised
Multi-Task Learning for Multimodal Sentiment Analysis [11.368438990334397]
We develop a self-supervised learning strategy to acquire independent unimodal supervisions.
We conduct extensive experiments on three public multimodal baseline datasets.
Our method achieves performance comparable to that obtained with human-annotated unimodal labels.
arXiv Detail & Related papers (2021-02-09T14:05:02Z) - Sense and Learn: Self-Supervision for Omnipresent Sensors [9.442811508809994]
We present a framework named Sense and Learn for representation or feature learning from raw sensory data.
It consists of several auxiliary tasks that can learn high-level and broadly useful features entirely from unannotated data without any human involvement in the tedious labeling process.
Our methodology achieves results competitive with supervised approaches and, in most cases, closes the gap by fine-tuning the network while learning the downstream tasks.
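As a sketch of one common style of auxiliary task on unlabeled sensor streams, the example below trains a small network to recognize which random signal transformation was applied to a window of inertial data. The transformation set and network are illustrative assumptions, not the paper's exact task suite.

```python
# Hedged sketch of a transformation-recognition auxiliary task on raw sensor data.
import torch
import torch.nn as nn

TRANSFORMS = [
    lambda x: x,                                  # identity
    lambda x: -x,                                 # negation
    lambda x: torch.flip(x, dims=[-1]),           # time reversal
    lambda x: x + 0.05 * torch.randn_like(x),     # jitter with Gaussian noise
]

class TransformClassifier(nn.Module):
    def __init__(self, channels: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, len(TRANSFORMS)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # logits over which transformation was applied

# Self-supervised labels come from the transformation index, not human annotation:
x = torch.randn(16, 6, 100)                       # raw accelerometer + gyroscope windows
labels = torch.randint(len(TRANSFORMS), (16,))
x_aug = torch.stack([TRANSFORMS[l](xi) for l, xi in zip(labels.tolist(), x)])
loss = nn.CrossEntropyLoss()(TransformClassifier()(x_aug), labels)
print(loss.item())
```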
arXiv Detail & Related papers (2020-09-28T11:57:43Z)