A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors
Data for Automatic Multimodal Human Activity Recognition System
- URL: http://arxiv.org/abs/2306.15765v1
- Date: Tue, 27 Jun 2023 19:29:35 GMT
- Title: A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors
Data for Automatic Multimodal Human Activity Recognition System
- Authors: Santosh Kumar Yadav, Muhtashim Rafiqi, Egna Praneeth Gummana, Kamlesh
Tiwari, Hari Mohan Pandey, Shaik Ali Akbar
- Abstract summary: This paper presents a novel multimodal human activity recognition system.
It uses a two-stream decision-level fusion of vision and inertial sensors.
The system achieves accuracies of 96.9%, 97.6%, 98.7%, and 95.9% on the UP-Fall Detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD datasets, respectively.
- Score: 2.5214116139219787
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a novel multimodal human activity recognition system. It
uses a two-stream decision-level fusion of vision and inertial sensors. In the
first stream, raw RGB frames are passed to a part affinity field-based pose
estimation network to detect the user's keypoints. These keypoints are
pre-processed and fed in a sliding-window fashion to a specially designed
convolutional neural network for spatial feature extraction, followed by
regularized LSTMs that compute temporal features. The outputs of the LSTM
networks are then passed to fully connected layers for classification. In the
second stream, data from the inertial sensors are pre-processed and fed to
regularized LSTMs for feature extraction, followed by fully connected layers
for classification. Finally, the softmax scores of the two streams are fused
at the decision level to obtain the final prediction. Extensive experiments
are conducted on four standard multimodal benchmark datasets (UP-Fall
Detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD). The proposed system achieves
accuracies of 96.9%, 97.6%, 98.7%, and 95.9% on the UP-Fall Detection,
UTD-MHAD, Berkeley-MHAD, and C-MHAD datasets, respectively, far surpassing
current state-of-the-art methods.
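For concreteness, below is a minimal PyTorch sketch of the pipeline the abstract
describes: a CNN-plus-LSTM vision stream over pose keypoints, an LSTM inertial
stream, and a weighted average of the two softmax score vectors as the
decision-level fusion. All names (PoseStream, InertialStream,
decision_level_fusion), layer sizes, window lengths, channel counts, and the
fusion weight w are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseStream(nn.Module):
    """Vision stream: per-frame CNN over keypoints, then an LSTM over time."""
    def __init__(self, n_classes=10, hidden=128):
        super().__init__()
        # 1D convolutions over each frame's keypoints; the two input
        # channels hold the (x, y) coordinates.
        self.cnn = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (B, T, 2, K) keypoint windows
        b, t, c, k = x.shape
        f = self.cnn(x.reshape(b * t, c, k)).squeeze(-1)  # (B*T, 64)
        out, _ = self.lstm(f.reshape(b, t, -1))           # (B, T, hidden)
        return self.fc(out[:, -1])                        # class logits

class InertialStream(nn.Module):
    """Sensor stream: LSTM directly over pre-processed inertial samples."""
    def __init__(self, n_channels=6, n_classes=10, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (B, T, n_channels)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])

def decision_level_fusion(logits_v, logits_i, w=0.5):
    """Weighted average of per-stream softmax scores; w is an assumed weight."""
    return w * F.softmax(logits_v, dim=-1) + (1 - w) * F.softmax(logits_i, dim=-1)

# Usage on dummy tensors: 30-frame windows of 18 keypoints and 6 IMU channels.
pose = torch.randn(4, 30, 2, 18)
imu = torch.randn(4, 30, 6)
scores = decision_level_fusion(PoseStream()(pose), InertialStream()(imu))
pred = scores.argmax(dim=-1)                     # final activity prediction
```

A weighted average of softmax scores is one common decision-level rule; the
abstract does not specify the exact fusion function, and the dropout-based
regularization of the LSTMs is omitted here for brevity.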
Related papers
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z) - A Novel Approach For Analysis of Distributed Acoustic Sensing System
Based on Deep Transfer Learning [0.0]
Convolutional neural networks are highly capable tools for extracting spatial information.
Long short-term memory (LSTM) is an effective instrument for processing sequential data.
The VGG-16 architecture in our framework obtains 100% classification accuracy across 50 training runs.
arXiv Detail & Related papers (2022-06-24T19:56:01Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - Multiple Time Series Fusion Based on LSTM An Application to CAP A Phase
Classification Using EEG [56.155331323304]
This work carries out deep learning-based feature-level fusion of electroencephalogram channels.
Channel selection, fusion, and classification procedures were optimized by two optimization algorithms.
arXiv Detail & Related papers (2021-12-18T14:17:49Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, for combining features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z) - CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point
Clouds [51.47100091540298]
We present Cascaded Primitive Fitting Networks (CPFN), which rely on an adaptive patch sampling network to assemble the detection results of global and local primitive detection networks.
CPFN improves the state-of-the-art SPFN performance by 13-14% on high-resolution point cloud datasets and specifically improves the detection of fine-scale primitives by 20-22%.
arXiv Detail & Related papers (2021-08-31T23:27:33Z) - GEM: Glare or Gloom, I Can Still See You -- End-to-End Multimodal Object
Detector [11.161639542268015]
We propose sensor-aware multi-modal fusion strategies for 2D object detection in harsh lighting conditions.
Our network learns to estimate the measurement reliability of each sensor modality in the form of scalar weights and masks.
We show that the proposed strategies outperform existing state-of-the-art methods on the FLIR-Thermal dataset.
arXiv Detail & Related papers (2021-02-24T14:56:37Z) - Dermo-DOCTOR: A web application for detection and recognition of the
skin lesion using a deep convolutional neural network [3.7242808753092502]
This article proposes an end-to-end deep CNN-based multi-task web application, named Dermo-DOCTOR, for concurrent detection and recognition of skin lesions.
For the detection sub-network, the Fused Feature Map (FFM) is used for decoding to obtain the input resolution of the output lesion masks.
For the recognition sub-network, feature maps of two encoders and FFM are used for the aggregation to obtain a final lesion class.
arXiv Detail & Related papers (2021-02-03T01:14:52Z) - Towards Improved Human Action Recognition Using Convolutional Neural
Networks and Multimodal Fusion of Depth and Inertial Sensor Data [1.52292571922932]
This paper aims to improve the accuracy of Human Action Recognition (HAR) by fusing depth and inertial sensor data.
We transform the depth data into Sequential Front view Images (SFI) and fine-tune the pre-trained AlexNet on these images.
Inertial data are converted into Signal Images (SI), and another convolutional neural network (CNN) is trained on these images.
arXiv Detail & Related papers (2020-08-22T03:41:34Z) - Single-stage intake gesture detection using CTC loss and extended prefix
beam search [8.22379888383833]
Accurate detection of individual intake gestures is a key step towards automatic dietary monitoring.
We propose a single-stage approach which directly decodes the probabilities learned from sensor data into sparse intake detections.
arXiv Detail & Related papers (2020-08-07T06:04:25Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
Gradient-based training combined with the nonconvexity of the objective renders learning susceptible to initialization issues.
We propose fusing neighboring layers of deeper networks that are initialized with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)