Gimme Signals: Discriminative signal encoding for multimodal activity
recognition
- URL: http://arxiv.org/abs/2003.06156v2
- Date: Thu, 9 Apr 2020 13:10:04 GMT
- Title: Gimme Signals: Discriminative signal encoding for multimodal activity
recognition
- Authors: Raphael Memmesheimer, Nick Theisen, Dietrich Paulus
- Abstract summary: We present a simple, yet effective and flexible method for action recognition supporting multiple sensor modalities.
We apply our method to 4 action recognition datasets containing skeleton sequences, inertial and motion capturing measurements as well as Wi-Fi fingerprints that range up to 120 action classes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a simple, yet effective and flexible method for action recognition
supporting multiple sensor modalities. Multivariate signal sequences are
encoded in an image and are then classified using a recently proposed
EfficientNet CNN architecture. Our focus was to find an approach that
generalizes well across different sensor modalities without specific adaptations
while still achieving good results. We apply our method to 4 action recognition
datasets containing skeleton sequences, inertial and motion capturing
measurements as well as Wi-Fi fingerprints that range up to 120 action classes.
Our method defines the current best CNN-based approach on the NTU RGB+D 120
dataset, lifts the state of the art on the ARIL Wi-Fi dataset by +6.78%,
improves the UTD-MHAD inertial baseline by +14.4%, the UTD-MHAD skeleton
baseline by 1.13% and achieves 96.11% on the Simitate motion capturing data
(80/20 split). We further demonstrate experiments on both modality fusion at the
signal level and signal reduction to prevent the representation from
overloading.
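The core idea of the paper, encoding a multivariate signal sequence as an image that a standard CNN such as EfficientNet can then classify, can be sketched as follows. This is a minimal illustration under assumed conventions (per-channel min-max normalization, channels laid out as image rows); the authors' actual encoding, including any colour mapping, may differ, and the function name is hypothetical:

```python
import numpy as np

def encode_signals_as_image(signals, height=224, width=224):
    """Encode a multivariate signal sequence (T timesteps x C channels)
    into a greyscale image by per-channel min-max normalization and
    nearest-neighbour resampling to a fixed CNN input resolution.

    A sketch of the signal-to-image idea only; the paper's exact
    encoding may differ.
    """
    signals = np.asarray(signals, dtype=np.float64)  # shape (T, C)
    # Per-channel min-max normalization to [0, 1]; guard constant channels.
    mins = signals.min(axis=0, keepdims=True)
    maxs = signals.max(axis=0, keepdims=True)
    span = np.where(maxs - mins == 0, 1.0, maxs - mins)
    norm = (signals - mins) / span
    # Lay channels out as image rows (C rows, T columns), then resample
    # row and column indices to the target spatial resolution.
    img = norm.T  # shape (C, T)
    row_idx = np.linspace(0, img.shape[0] - 1, height).astype(int)
    col_idx = np.linspace(0, img.shape[1] - 1, width).astype(int)
    resized = img[np.ix_(row_idx, col_idx)]
    return (resized * 255).astype(np.uint8)
```

The resulting fixed-size image can be fed to any off-the-shelf image classifier, which is what makes the approach modality-agnostic: skeleton, inertial, motion-capture, and Wi-Fi signals all reduce to the same representation.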
Related papers
- EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition [0.0]
We present an efficient pose-driven attention-guided multimodal action recognition (EPAM-Net) for action recognition in videos.
Specifically, X3D networks are adapted for both RGB and pose streams to capture spatio-temporal features from RGB videos and their skeleton sequences.
Our model provides a 6.2-9.9x reduction in FLOPs (floating-point operations, counted as multiply-adds) and a 9-9.6x reduction in the number of network parameters.
arXiv Detail & Related papers (2024-08-10T03:15:24Z) - DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial
Attention Detection [49.196182908826565]
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment.
Current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data such as images.
This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input.
arXiv Detail & Related papers (2023-09-07T13:43:46Z) - Improved Static Hand Gesture Classification on Deep Convolutional Neural
Networks using Novel Sterile Training Technique [2.534406146337704]
Non-contact hand pose and static gesture recognition have received considerable attention in many applications.
This article presents an efficient data collection approach and a novel technique for deep CNN training by introducing "sterile" images.
Applying the proposed data collection and training methods increases the classification rate of static hand gestures from 85% to 93%.
arXiv Detail & Related papers (2023-05-03T11:10:50Z) - PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action
Recognition [52.78234467516168]
We introduce the concept of patch mutual information (PMI) score to quantify the motion bias between adjacent frames.
We present an adaptive frame selection strategy using a shifted leaky ReLU and a cumulative distribution function.
Our method achieves a relative improvement of 2.2-13.8% in top-1 accuracy on UAV-Human, 6.8% on NEC Drone, and 9.0% on Diving48.
arXiv Detail & Related papers (2023-04-14T00:01:11Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action
Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - A Novel Approach For Analysis of Distributed Acoustic Sensing System
Based on Deep Transfer Learning [0.0]
Convolutional neural networks are highly capable tools for extracting spatial information.
Long short-term memory (LSTM) is an effective instrument for processing sequential data.
The VGG-16 architecture in our framework obtains 100% classification accuracy in 50 training runs.
arXiv Detail & Related papers (2022-06-24T19:56:01Z) - Meta-Learning Sparse Implicit Neural Representations [69.15490627853629]
Implicit neural representations are a promising new avenue of representing general signals.
Current approaches are difficult to scale to a large number of signals or to a full data set.
We show that meta-learned sparse neural representations achieve a much smaller loss than dense meta-learned models.
arXiv Detail & Related papers (2021-10-27T18:02:53Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z) - SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action
Recognition [0.0]
We propose a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space.
We encode signals into images and extract features using a deep residual CNN.
The resulting encoder transforms features into an embedding space in which closer distances encode similar actions while higher distances encode different actions.
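The inference step this blurb describes, labelling a query by its nearest neighbour in the learned embedding space, can be sketched generically. This assumes Euclidean distance and pre-computed embeddings; SL-DML's encoder and training loss are not reproduced here, and the function name is hypothetical:

```python
import numpy as np

def nearest_neighbor_classify(query_emb, support_embs, support_labels):
    """One-shot classification as nearest-neighbour search: the query
    takes the label of the closest support embedding under Euclidean
    distance. A generic sketch of metric-learning inference, not
    SL-DML's exact pipeline.
    """
    dists = np.linalg.norm(support_embs - query_emb, axis=1)
    return support_labels[int(np.argmin(dists))]
```

Because classification reduces to a distance comparison, new action classes can be added at test time with a single reference example each, which is what makes the metric-learning formulation suited to one-shot recognition.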
arXiv Detail & Related papers (2020-04-23T11:28:27Z) - Attentive CutMix: An Enhanced Data Augmentation Approach for Deep
Learning Based Image Classification [58.20132466198622]
We propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix.
In each training iteration, we choose the most descriptive regions based on the intermediate attention maps from a feature extractor.
Our proposed method is simple yet effective, easy to implement and can boost the baseline significantly.
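The patch-selection step described above can be sketched as follows. This is an illustration under assumed conventions (a square grid, an attention map supplied as input); in the paper the attention comes from intermediate feature maps of a pretrained feature extractor, and the function name is hypothetical:

```python
import numpy as np

def attentive_cutmix(target_img, source_img, attention_map,
                     num_patches=6, grid=7):
    """Sketch of the Attentive CutMix idea: split the source image into
    a grid, rank cells by an attention map, and paste the top-ranked
    source patches onto a copy of the target image.
    """
    h, w = target_img.shape[:2]
    ph, pw = h // grid, w // grid
    att = np.asarray(attention_map, dtype=np.float64).reshape(grid, grid)
    # Flat indices of the most attended grid cells, highest first.
    top_cells = np.argsort(att, axis=None)[::-1][:num_patches]
    mixed = target_img.copy()
    for idx in top_cells:
        r, c = divmod(int(idx), grid)
        # Paste the attended source patch over the same region of the target.
        mixed[r*ph:(r+1)*ph, c*pw:(c+1)*pw] = \
            source_img[r*ph:(r+1)*ph, c*pw:(c+1)*pw]
    return mixed
```

Selecting patches by attention rather than at random (as plain CutMix does) biases the augmentation toward the most discriminative regions of the source image.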
arXiv Detail & Related papers (2020-03-29T15:01:05Z) - Multimodal Affective States Recognition Based on Multiscale CNNs and
Biologically Inspired Decision Fusion Model [9.006757372508366]
Multimodal physiological signal-based affective state recognition methods have not yet been thoroughly explored.
We propose Multiscale Convolutional Neural Networks (Multiscale CNNs) and a biologically inspired decision fusion model for affective states recognition.
The results show that the fusion model significantly improves the accuracy of affective state recognition compared with single-modality signals.
arXiv Detail & Related papers (2019-11-29T01:35:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.