Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition
- URL: http://arxiv.org/abs/2009.00210v5
- Date: Thu, 27 May 2021 07:16:45 GMT
- Title: Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition
- Authors: Yang Liu, Keze Wang, Guanbin Li, Liang Lin
- Abstract summary: We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos). The SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
- Score: 131.6328804788164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing vision-based action recognition is susceptible to occlusion and appearance variations, while wearable sensors can alleviate these challenges by capturing human motion as one-dimensional time-series signals. For the same action, the knowledge learned from vision sensors and wearable sensors may be related and complementary. However, there are significant modality differences between action data captured by wearable sensors and vision sensors in data dimensionality, data distribution, and inherent information content. In this paper, we propose a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos) by adaptively transferring and distilling knowledge from multiple wearable sensors. The SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality. To preserve local temporal relationships and facilitate the use of visual deep learning models, we transform the one-dimensional time-series signals of wearable sensors into two-dimensional images with a Gramian Angular Field (GAF) based virtual image generation model. Then, we build a novel Similarity-Preserving Adaptive Multi-modal Fusion Module to adaptively fuse intermediate representation knowledge from the different teacher networks (minimal sketches of the GAF encoding and a similarity-preserving loss follow this abstract). Finally, to fully exploit and transfer the knowledge of multiple well-trained teacher networks to the student network, we propose a novel Graph-guided Semantically Discriminative Mapping loss, which uses graph-guided ablation analysis to produce visual explanations that highlight the important regions across modalities while preserving the interrelations of the original data. Experimental results on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets demonstrate the effectiveness of the proposed SAKDN.
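To make the two core mechanisms concrete, here is a minimal Python sketch of (a) a generic Gramian Angular Summation Field (GASF) encoding that turns a 1D sensor signal into a 2D virtual image, and (b) a generic similarity-preserving loss that matches batch-level pairwise similarity structure between teacher and student features. This is an illustration under those generic formulations, not the paper's exact virtual image generation model or fusion module; all function names are hypothetical.

```python
import numpy as np

def gramian_angular_field(x: np.ndarray) -> np.ndarray:
    """Encode a 1D sensor signal as a 2D GASF image (generic version)."""
    # Rescale to [-1, 1] so the polar-angle encoding is well defined.
    lo, hi = x.min(), x.max()
    x = np.clip((2.0 * x - hi - lo) / (hi - lo + 1e-8), -1.0, 1.0)
    phi = np.arccos(x)                            # angle per time step
    return np.cos(phi[:, None] + phi[None, :])    # GASF(i, j) = cos(phi_i + phi_j)

def similarity_preserving_loss(f_teacher: np.ndarray, f_student: np.ndarray) -> float:
    """Match batch-level pairwise similarities of teacher and student features.

    f_teacher, f_student: (batch, dim) feature matrices from one mini-batch;
    the feature dimensions need not match, only the batch sizes.
    """
    def norm_gram(f):
        g = f @ f.T                               # (batch, batch) similarity matrix
        return g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-8)
    diff = norm_gram(f_teacher) - norm_gram(f_student)
    return float((diff ** 2).mean())

# Example: one accelerometer channel -> virtual image; then compare features.
signal = np.sin(np.linspace(0, 4 * np.pi, 128))   # stand-in for sensor data
image = gramian_angular_field(signal)             # shape (128, 128)
loss = similarity_preserving_loss(np.random.randn(8, 64), np.random.randn(8, 32))
```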
Related papers
- TASKED: Transformer-based Adversarial learning for human activity recognition using wearable sensors via Self-KnowledgE Distillation [6.458496335718508]
We propose a novel Transformer-based Adversarial learning framework for human activity recognition using wearable sensors via Self-KnowledgE Distillation (TASKED).
In the proposed method, we adopt teacher-free self-knowledge distillation to improve the stability of the training procedure and the performance of human activity recognition (a sketch of this idea follows this entry).
arXiv Detail & Related papers (2022-09-14T11:08:48Z) - AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z) - Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation
for Scene Recognition [64.29650787243443]
We propose and analyse the use of a 2D frequency transform of the activation maps before transferring them.
This strategy enhances knowledge transferability in tasks such as scene recognition (a sketch follows this entry).
We publicly release the training and evaluation framework used in this paper at http://www.vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
arXiv Detail & Related papers (2022-05-04T11:05:18Z) - Deep Transfer Learning with Graph Neural Network for Sensor-Based Human
- Deep Transfer Learning with Graph Neural Network for Sensor-Based Human Activity Recognition [12.51766929898714]
We devise a graph-inspired deep learning approach to sensor-based HAR tasks.
We present a graph convolutional neural network with a multi-layer residual structure (ResGCNN) for sensor-based HAR tasks (a sketch of one such layer follows this entry).
Experimental results on the PAMAP2 and mHealth data sets demonstrate that our ResGCNN is effective at capturing the characteristics of actions.
arXiv Detail & Related papers (2022-03-14T07:57:32Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset (a late-fusion sketch follows this entry).
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Cross-modal Knowledge Distillation for Vision-to-Sensor Action
- Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition [12.682984063354748]
This study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework.
In this VSKD framework, only time-series data, i.e., accelerometer data, is needed from wearable devices during the testing phase.
This framework not only reduces the computational demands on edge devices, but also produces a learning model that closely matches the performance of the computationally expensive multi-modal approach (a sketch of the training-time loss follows this entry).
arXiv Detail & Related papers (2021-10-08T15:06:38Z) - Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z) - SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for
Autonomous Driving [6.810856082577402]
We propose a deep neural network, the Self-Supervised Thermal Network (SSTN), that learns a feature embedding to maximize the information shared between the visible and infrared spectrum domains via contrastive learning.
The proposed method is extensively evaluated on two publicly available datasets: the FLIR-ADAS dataset and the KAIST Multi-Spectral dataset (a minimal contrastive-loss sketch follows this entry).
arXiv Detail & Related papers (2021-03-04T16:42:49Z) - Visual Relationship Detection with Visual-Linguistic Knowledge from
- Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Relational Visual-Linguistic Bidirectional Encoder Representations from Transformers (RVL-BERT).
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z) - A Framework for Learning Invariant Physical Relations in Multimodal
Sensory Processing [0.0]
We design a novel neural network architecture capable of learning, in an unsupervised manner, relations among sensory cues.
We describe the core system functionality when learning arbitrary non-linear relations in low-dimensional sensory data.
We demonstrate this through a real-world learning problem, where, from standard RGB camera frames, the network learns the relations between physical quantities.
arXiv Detail & Related papers (2020-06-30T08:42:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.