CNN based Multistage Gated Average Fusion (MGAF) for Human Action
Recognition Using Depth and Inertial Sensors
- URL: http://arxiv.org/abs/2010.16073v1
- Date: Thu, 29 Oct 2020 11:49:13 GMT
- Title: CNN based Multistage Gated Average Fusion (MGAF) for Human Action
Recognition Using Depth and Inertial Sensors
- Authors: Zeeshan Ahmad and Naimul Khan
- Abstract summary: A Convolutional Neural Network (CNN) provides leverage to extract and fuse features from all layers of its architecture.
We propose a novel Multistage Gated Average Fusion (MGAF) network which extracts and fuses features from all layers of a CNN.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A Convolutional Neural Network (CNN) provides leverage to extract
and fuse features from all layers of its architecture. However, extracting and
fusing intermediate features from different layers of a CNN remains
uninvestigated for Human Action Recognition (HAR) using depth and inertial
sensors. To fully benefit from access to all of the CNN's layers, in this
paper, we propose a novel Multistage Gated Average Fusion (MGAF) network which
extracts and fuses features from all layers of the CNN using our novel and
computationally efficient Gated Average Fusion (GAF) network, a decisive
integral element of MGAF. At the input of the proposed MGAF, we transform the
depth data into sequential front view images (SFI) and the inertial sensor
data into signal images (SI). The SFI are formed from the front-view
information generated by the depth data. A CNN is employed to extract feature
maps from both input modalities. The GAF network fuses the extracted features
effectively while preserving the dimensionality of the fused features. The
proposed MGAF network is structurally extensible and can be unfolded to more
than two modalities. Experiments on three publicly available multimodal HAR
datasets demonstrate that the proposed MGAF outperforms previous
state-of-the-art fusion methods for depth-inertial HAR in terms of recognition
accuracy while being computationally much more efficient. We increase the
accuracy by an average of 1.5 percent while reducing the computational cost by
approximately 50 percent over the previous state of the art.
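The gated fusion described in the abstract combines two same-sized feature vectors through a learned gate so that the output keeps the inputs' dimensionality. The sketch below is a minimal, hedged illustration of that idea, not the paper's exact architecture: the element-wise gate parameters `w_a`, `w_b`, and `bias` are hypothetical stand-ins for weights that would be learned during training.

```python
import math

def sigmoid(x):
    """Standard logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_average_fusion(feat_a, feat_b, w_a, w_b, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    For each index i, a gate g_i in (0, 1) weights the two modalities:
        fused_i = g_i * feat_a_i + (1 - g_i) * feat_b_i
    so the fused vector has the same dimensionality as each input,
    matching the dimensionality-preserving property claimed for GAF.
    w_a, w_b, bias are hypothetical per-element gate parameters.
    """
    assert len(feat_a) == len(feat_b), "both modalities must match in size"
    fused = []
    for a, b, wa, wb, c in zip(feat_a, feat_b, w_a, w_b, bias):
        g = sigmoid(wa * a + wb * b + c)  # learned mixing weight in (0, 1)
        fused.append(g * a + (1 - g) * b)
    return fused

# With all gate parameters at zero, the gate is sigmoid(0) = 0.5 and the
# fusion reduces to a plain average of the two feature vectors.
print(gated_average_fusion([1.0, 0.0], [0.0, 1.0],
                           [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

In a multistage (MGAF-style) setting, one such gated fusion would be applied at each CNN layer whose features are being merged; the per-stage gates let the network weight each modality differently at different depths.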
Related papers
- Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z) - Learning a Graph Neural Network with Cross Modality Interaction for
Image Fusion [23.296468921842948]
Infrared and visible image fusion has gradually proved to be a vital branch of multi-modality imaging technologies.
We propose an interactive graph neural network (GNN)-based architecture between cross modality for fusion, called IGNet.
Our IGNet can generate visually appealing fused images while scoring, on average, 2.59% mAP@.5 and 7.77% mIoU higher in detection and segmentation.
arXiv Detail & Related papers (2023-08-07T02:25:06Z) - Semantic Labeling of High Resolution Images Using EfficientUNets and
Transformers [5.177947445379688]
We propose a new segmentation model that combines convolutional neural networks with deep transformers.
Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
arXiv Detail & Related papers (2022-06-20T12:03:54Z) - New SAR target recognition based on YOLO and very deep multi-canonical
correlation analysis [0.1503974529275767]
This paper proposes a robust feature extraction method for SAR image target classification by adaptively fusing effective features from different CNN layers.
Experiments on the MSTAR dataset demonstrate that the proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T18:10:26Z) - Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important in scene understanding for many applications, such as auto-driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Inertial Sensor Data To Image Encoding For Human Action Recognition [0.0]
Convolutional Neural Networks (CNNs) are successful deep learning models in the field of computer vision.
In this paper, we use 4 types of spatial domain methods for transforming inertial sensor data to activity images.
For creating a multimodal fusion framework, we make each type of activity image multimodal by convolving it with two spatial domain filters.
arXiv Detail & Related papers (2021-05-28T01:22:52Z) - Sparse Auxiliary Networks for Unified Monocular Depth Prediction and
Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z) - Efficient Human Pose Estimation by Learning Deeply Aggregated
Representations [67.24496300046255]
We propose an efficient human pose estimation network (DANet) by learning deeply aggregated representations.
Our networks could achieve comparable or even better accuracy with much smaller model complexity.
arXiv Detail & Related papers (2020-12-13T10:58:07Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Towards Improved Human Action Recognition Using Convolutional Neural
Networks and Multimodal Fusion of Depth and Inertial Sensor Data [1.52292571922932]
This paper attempts at improving the accuracy of Human Action Recognition (HAR) by fusion of depth and inertial sensor data.
We transform the depth data into Sequential Front view Images(SFI) and fine-tune the pre-trained AlexNet on these images.
Inertial data is converted into Signal Images (SI) and another convolutional neural network (CNN) is trained on these images.
arXiv Detail & Related papers (2020-08-22T03:41:34Z)
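Several of the papers above, including the main paper, encode inertial sensor data as 2D "signal images" (SI) that a CNN can process like ordinary images. The sketch below shows the simplest such encoding, one sensor channel per row; this channel-per-row layout is an assumption for illustration, and the papers' actual SI constructions may order or repeat rows differently.

```python
def inertial_to_signal_image(sequence):
    """Turn a T x C inertial sequence into a C x T 'signal image'.

    sequence: list of T timesteps, each a list of C channel readings
    (e.g. [ax, ay, az, gx, gy, gz] from an accelerometer and gyroscope).
    Each channel becomes one row of the output, which a 2D CNN can then
    treat as a single-channel image.
    """
    if not sequence:
        return []
    num_timesteps = len(sequence)
    num_channels = len(sequence[0])
    # Transpose: row c holds channel c's readings across all timesteps.
    return [[sequence[t][c] for t in range(num_timesteps)]
            for c in range(num_channels)]

# Three timesteps of a 2-channel signal become a 2 x 3 image.
print(inertial_to_signal_image([[1, 2], [3, 4], [5, 6]]))
```

The resulting image has fixed height (the channel count), so variable-length recordings only change the image width, which is convenient for resizing to a CNN's input size.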
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.