Towards Improved Human Action Recognition Using Convolutional Neural
Networks and Multimodal Fusion of Depth and Inertial Sensor Data
- URL: http://arxiv.org/abs/2008.09747v1
- Date: Sat, 22 Aug 2020 03:41:34 GMT
- Title: Towards Improved Human Action Recognition Using Convolutional Neural
Networks and Multimodal Fusion of Depth and Inertial Sensor Data
- Authors: Zeeshan Ahmad and Naimul Khan
- Abstract summary: This paper attempts to improve the accuracy of Human Action Recognition (HAR) by fusing depth and inertial sensor data.
We transform the depth data into Sequential Front view Images (SFI) and fine-tune the pre-trained AlexNet on these images.
Inertial data is converted into Signal Images (SI), and another convolutional neural network (CNN) is trained on these images.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper attempts to improve the accuracy of Human Action Recognition
(HAR) by fusing depth and inertial sensor data. First, we transform the
depth data into Sequential Front view Images (SFI) and fine-tune the pre-trained
AlexNet on these images. Then, inertial data is converted into Signal Images
(SI) and another convolutional neural network (CNN) is trained on these images.
Finally, learned features are extracted from both CNNs, fused into a
shared feature layer, and fed to a classifier. We
experiment with two classifiers, a Support Vector Machine (SVM) and a
softmax classifier, and compare their performance. The recognition accuracy
of each individual modality, depth data alone and inertial data alone, is also
computed and compared with the fusion-based accuracy to show that fusing
modalities yields better results than either modality on its own. Experimental
results on the UTD-MHAD and Kinect 2D datasets show that the proposed method achieves
state-of-the-art results compared to other recently proposed
visual-inertial action recognition methods.
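A minimal sketch (PyTorch) of the pipeline the abstract describes, not the authors' released code: a fine-tuned AlexNet extracts features from Sequential Front view Images, a small CNN extracts features from Signal Images, the features are concatenated into a shared layer and passed to a classifier. The 64x64 Signal Image size, the 512-d inertial feature size, and the small CNN layout are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 27  # UTD-MHAD contains 27 actions

# Stream 1: pre-trained AlexNet, fine-tuned on Sequential Front view Images (SFI).
# Features are taken from the penultimate fully connected layer (4096-d).
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
depth_stream = nn.Sequential(
    alexnet.features, alexnet.avgpool, nn.Flatten(),
    *list(alexnet.classifier.children())[:-1],
)

# Stream 2: a small CNN trained from scratch on Signal Images (SI) built from
# inertial data (layer sizes are assumptions, not the paper's exact network).
class SignalImageCNN(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(64 * 4 * 4, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))

inertial_stream = SignalImageCNN()

# Fusion: concatenate the learned features into a shared feature layer and feed
# them to a classifier. A linear layer + cross-entropy gives the softmax
# classifier; alternatively, an SVM can be fit on the same fused features.
classifier = nn.Linear(4096 + 512, NUM_CLASSES)

sfi = torch.randn(8, 3, 224, 224)  # batch of Sequential Front view Images
si = torch.randn(8, 1, 64, 64)     # batch of Signal Images
fused = torch.cat([depth_stream(sfi), inertial_stream(si)], dim=1)
logits = classifier(fused)         # shape: (8, NUM_CLASSES)
```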
Related papers
- Research on Image Recognition Technology Based on Multimodal Deep Learning [24.259653149898167]
This project investigates a human multi-modal behavior identification algorithm based on deep neural networks.
The performance of the suggested algorithm was evaluated using the MSR3D data set.
arXiv Detail & Related papers (2024-05-06T01:05:21Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is utilized effectively across multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Inertial Sensor Data To Image Encoding For Human Action Recognition [0.0]
Convolutional Neural Networks (CNNs) are successful deep learning models in the field of computer vision.
In this paper, we use four spatial-domain methods to transform inertial sensor data into activity images (see the sketch after this entry).
To create a multimodal fusion framework, we make each type of activity image multimodal by convolving it with two spatial-domain filters.
arXiv Detail & Related papers (2021-05-28T01:22:52Z)
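As a generic illustration of the inertial-to-image idea referenced in the entry above (not the paper's four specific spatial-domain transforms), the sketch below stacks a normalized multi-channel IMU window row-wise into a square "activity image". The channel count, window length, and 64x64 size are assumptions.

```python
import numpy as np

def signal_to_image(window: np.ndarray, size: int = 64) -> np.ndarray:
    """window: (channels, timesteps) array of accelerometer/gyroscope samples."""
    channels, timesteps = window.shape
    # Normalize each channel to [0, 1] so the image has a consistent range.
    mins = window.min(axis=1, keepdims=True)
    maxs = window.max(axis=1, keepdims=True)
    norm = (window - mins) / np.maximum(maxs - mins, 1e-8)
    # Tile channels along the row axis until the image is `size` rows tall,
    # then resample columns to `size` by nearest-neighbour indexing.
    rows = np.tile(norm, (int(np.ceil(size / channels)), 1))[:size]
    cols = np.linspace(0, timesteps - 1, size).astype(int)
    return rows[:, cols]  # (size, size) activity image

# Example: a 2-second window of 6-axis IMU data sampled at 50 Hz (assumed rates).
imu_window = np.random.randn(6, 100)
image = signal_to_image(imu_window)
print(image.shape)  # (64, 64)
```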
- ScalingNet: extracting features from raw EEG data for emotion recognition [4.047737925426405]
We propose a novel convolutional layer that adaptively extracts effective, data-driven, spectrogram-like features from raw EEG signals.
The proposed neural network architecture based on this scaling layer, referred to as ScalingNet, achieves state-of-the-art results on the established DEAP benchmark dataset.
arXiv Detail & Related papers (2021-02-07T08:54:27Z)
- A Novel Multi-Stage Training Approach for Human Activity Recognition from Multimodal Wearable Sensor Data Using Deep Neural Network [11.946078871080836]
Deep neural networks are an effective choice for automatically recognizing human actions from data collected by various wearable sensors.
In this paper, we propose a novel multi-stage training approach that increases diversity in the feature extraction process.
arXiv Detail & Related papers (2021-01-03T20:48:56Z)
- CNN based Multistage Gated Average Fusion (MGAF) for Human Action Recognition Using Depth and Inertial Sensors [1.52292571922932]
Convolutional Neural Networks (CNNs) make it possible to extract and fuse features from all layers of their architecture.
We propose a novel Multistage Gated Average Fusion (MGAF) network that extracts and fuses features from all layers of the CNN (a generic gated fusion sketch follows this entry).
arXiv Detail & Related papers (2020-10-29T11:49:13Z)
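The exact MGAF architecture is defined in the paper above; the sketch below only illustrates a generic gated average fusion of two same-sized feature maps, where a learned sigmoid gate weights their element-wise average. The 1x1 gating convolution and channel sizes are assumptions, not the published module.

```python
import torch
import torch.nn as nn

class GatedAverageFusion(nn.Module):
    """Generic gated average fusion of two feature maps (illustrative only)."""
    def __init__(self, channels: int):
        super().__init__()
        # The gate is computed from both feature maps concatenated channel-wise.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([a, b], dim=1))
        return g * a + (1.0 - g) * b  # gated (weighted) average of the two streams

# Example: fuse same-sized feature maps from a depth CNN and an inertial CNN.
fuse = GatedAverageFusion(channels=64)
depth_feat = torch.randn(2, 64, 14, 14)
inertial_feat = torch.randn(2, 64, 14, 14)
fused = fuse(depth_feat, inertial_feat)  # shape: (2, 64, 14, 14)
```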
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Towards Reading Beyond Faces for Sparsity-Aware 4D Affect Recognition [55.15661254072032]
We present a sparsity-aware deep network for automatic 4D facial expression recognition (FER).
We first propose a novel augmentation method to combat the data limitation problem for deep learning.
We then present a sparsity-aware deep network to compute sparse representations of convolutional features over multiple views.
arXiv Detail & Related papers (2020-02-08T13:09:11Z)