Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition
- URL: http://arxiv.org/abs/2112.01849v1
- Date: Fri, 8 Oct 2021 15:06:38 GMT
- Title: Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition
- Authors: Jianyuan Ni, Raunak Sarbajna, Yang Liu, Anne H.H. Ngu and Yan Yan
- Abstract summary: This study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework.
In this VSKD framework, only time-series data, i.e., accelerometer data, is needed from wearable devices during the testing phase.
This framework will not only reduce the computational demands on edge devices, but also produce a learning model that closely matches the performance of the computationally expensive multi-modal approach.
- Score: 12.682984063354748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal approaches have recently been shown to improve the accuracy of human activity recognition (HAR). However, the restricted computational resources of wearable devices, e.g., smartwatches, cannot directly support such advanced methods. To tackle this issue, this study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework, in which only time-series data, i.e., accelerometer data, is needed from wearable devices during the testing phase. The framework therefore not only reduces the computational demands on edge devices, but also produces a learning model that closely matches the performance of the computationally expensive multi-modal approach. In order to retain the local temporal relationships and facilitate visual deep learning models, we first convert the time-series data to two-dimensional images by applying the Gramian Angular Field (GAF) based encoding method. We adopt a multi-scale TRN with BN-Inception as the teacher network and ResNet18 as the student network in this study. A novel loss function, named Distance and Angle-wise Semantic Knowledge loss (DASK), is proposed to mitigate the modality variations between the vision and the sensor domains. Extensive experimental results on the UTD-MHAD, MMAct, and Berkeley-MHAD datasets demonstrate the effectiveness and competitiveness of the proposed VSKD model, which can be deployed on wearable sensors.
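The abstract gives no code for the GAF step, but the transform itself is standard. Below is a minimal Python sketch of the Gramian Angular Summation Field encoding applied to a single accelerometer axis; the window length of 128 and the idea of stacking the three axes as image channels are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gramian_angular_field(x: np.ndarray) -> np.ndarray:
    """Encode a 1-D time series as a Gramian Angular Summation Field image."""
    # Rescale the series to [-1, 1] so that arccos is defined everywhere.
    x_min, x_max = x.min(), x.max()
    x_scaled = 2.0 * (x - x_min) / (x_max - x_min) - 1.0
    # Map each value to a polar angle; the clip guards against float overshoot.
    phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))
    # GASF entry (i, j) is cos(phi_i + phi_j), preserving local temporal
    # relationships as pairwise angular correlations.
    return np.cos(phi[:, None] + phi[None, :])

# Example: a 128-sample window becomes a 128x128 image with values in [-1, 1];
# the three accelerometer axes can be stacked as channels for a 2-D CNN.
window = np.sin(np.linspace(0, 4 * np.pi, 128))
image = gramian_angular_field(window)  # shape (128, 128)
```

This is how the sensor branch can reuse an off-the-shelf image backbone such as the ResNet18 student: each sensor window is turned into an image before entering the network.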
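The abstract names the DASK loss but does not spell out its formulation. Its name suggests distance-wise and angle-wise relational terms in the spirit of relational knowledge distillation, so the PyTorch sketch below pairs those two standard terms over teacher and student embeddings; the helper functions, the smooth-L1 penalty, and the weights w_dist and w_angle are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def pairwise_distances(e: torch.Tensor) -> torch.Tensor:
    # e: (batch, dim) embeddings -> (batch, batch) Euclidean distances,
    # normalized by the mean off-diagonal distance for scale invariance.
    d = torch.cdist(e, e, p=2)
    return d / (d[d > 0].mean() + 1e-8)

def angle_potentials(e: torch.Tensor) -> torch.Tensor:
    # Cosine of the angle at vertex j for every embedding triplet (i, j, k).
    diff = F.normalize(e.unsqueeze(0) - e.unsqueeze(1), p=2, dim=2)  # (b, b, d)
    return torch.einsum('ijd,kjd->ijk', diff, diff)                  # (b, b, b)

def dask_style_loss(student_emb, teacher_emb, w_dist=1.0, w_angle=2.0):
    # Distance-wise term: match pairwise-distance structure across modalities.
    l_dist = F.smooth_l1_loss(pairwise_distances(student_emb),
                              pairwise_distances(teacher_emb))
    # Angle-wise term: match triplet angular structure.
    l_angle = F.smooth_l1_loss(angle_potentials(student_emb),
                               angle_potentials(teacher_emb))
    return w_dist * l_dist + w_angle * l_angle

# Example: relational distillation between 16 paired embeddings.
s = torch.randn(16, 64)  # student (GAF-image branch) embeddings
t = torch.randn(16, 64)  # teacher (video branch) embeddings
loss = dask_style_loss(s, t)
```

In a full VSKD-style training step, a term like this would typically be added to the usual task loss (and, optionally, a softened-logit distillation term), so that the student matches the teacher's relational structure rather than its raw features.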
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the computational constraints of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data [15.326571438985466]
Topological features obtained by topological data analysis (TDA) have been suggested as a potential solution.
There are two significant obstacles to using topological features in deep learning.
We propose to use two teacher networks, one trained on the raw time-series data, and another trained on persistence images generated by TDA methods.
A robust student model is distilled, which uses only the time-series data as an input, while implicitly preserving topological features.
arXiv Detail & Related papers (2024-07-07T10:08:34Z)
- Simple 2D Convolutional Neural Network-based Approach for COVID-19 Detection [8.215897530386343]
This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images.
We propose an advanced Spatial-Slice Feature Learning (SSFL++) framework specifically tailored for CT scans.
It aims to filter out out-of-distribution (OOD) data within the entire CT scan, allowing essential spatial-slice features to be selected for analysis while reducing data redundancy by 70%.
arXiv Detail & Related papers (2024-03-17T14:34:51Z)
- Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory [64.11870454160614]
We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM)
ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm.
Our proposed method achieves competitive results with state-of-the-art on the HICO-DET and V-COCO datasets with much less training time.
arXiv Detail & Related papers (2023-09-07T13:10:06Z)
- Progressive Cross-modal Knowledge Distillation for Human Action Recognition [10.269019492921306]
We propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model for solving the wearable sensor-based HAR problem.
Specifically, we construct multiple teacher models using data from both teacher (human skeleton sequence) and student (time-series accelerometer data) modalities.
arXiv Detail & Related papers (2022-08-17T06:06:03Z)
- Evaluation and Comparison of Deep Learning Methods for Pavement Crack Identification with Visual Images [0.0]
Pavement crack identification from visual images via deep learning algorithms has the advantage of not being limited by the material of the object to be detected.
For patch-sample classification, the fine-tuned TL models can be equivalent to, or even slightly better than, the ED models in accuracy.
For accurate crack localization, both the ED and GAN algorithms can achieve pixel-level segmentation and are expected to run in real time on low-computing-power platforms.
arXiv Detail & Related papers (2021-12-20T08:23:43Z)
- EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation [61.33010904301476]
Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur.
We propose a novel approach, called EvDistill, to learn a student network on the unlabeled and unpaired event data.
We show that EvDistill achieves significantly better results than the prior works and KD with only events and APS frames.
arXiv Detail & Related papers (2021-11-24T08:48:16Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Modality Compensation Network: Cross-Modal Adaptation for Action Recognition [77.24983234113957]
We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities.
Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning.
Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
arXiv Detail & Related papers (2020-01-31T04:51:55Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)