Metric-based multimodal meta-learning for human movement identification
via footstep recognition
- URL: http://arxiv.org/abs/2111.07979v1
- Date: Mon, 15 Nov 2021 18:46:14 GMT
- Title: Metric-based multimodal meta-learning for human movement identification
via footstep recognition
- Authors: Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai
- Abstract summary: We describe a novel metric-based learning approach that introduces a multimodal framework.
We learn general-purpose representations from limited multisensory data obtained from omnipresent sensing systems.
Our approach employs metric-based contrastive learning on multi-sensor data to mitigate the impact of data scarcity.
- Score: 3.300376360949452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe a novel metric-based learning approach that introduces a
multimodal framework and uses deep audio and geophone encoders in a Siamese
configuration to design an adaptable and lightweight supervised model. This
framework eliminates the need for expensive data labeling procedures and learns
general-purpose representations from limited multisensory data obtained from
omnipresent sensing systems. These sensing systems provide numerous
applications and various use cases in activity recognition tasks. Here, we
intend to explore the human footstep movements from indoor environments and
analyze representations from a small self-collected dataset of acoustic and
vibration-based sensors. The core idea is to learn plausible similarities
between two sensory traits and to combine representations from audio and
geophone signals. We present a generalized framework to learn embeddings from
temporal and spatial features extracted from audio and geophone signals. We
then extract the representations in a shared space to maximize the learning of
a compatibility function between acoustic and geophone features. This, in turn,
can be used effectively to carry out a classification task from the learned
model, as demonstrated by assigning high similarity to pairs containing a human
footstep movement and lower similarity to pairs containing no footstep
movement. Performance analyses show that our proposed multimodal framework
achieves a 19.99% absolute accuracy increase and avoids overfitting on the
evaluation set when the training samples are increased from 200 pairs to just
500 pairs, while satisfactorily learning the audio and geophone
representations. Our results show that a metric-based contrastive learning
approach for multi-sensor data mitigates the impact of data scarcity and
enables human movement identification with limited data.
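
The abstract describes, but does not implement, a Siamese pairing of an audio
encoder and a geophone encoder that project into a shared embedding space and
are trained with a pairwise contrastive objective (high similarity for
footstep pairs, low similarity otherwise). Below is a minimal sketch of that
setup, assuming PyTorch; the encoder architecture, layer sizes, sampling
rates, and the names ModalityEncoder, MultimodalSiamese, and contrastive_loss
are illustrative assumptions, not the authors' exact design.

```python
# Hedged sketch of a metric-based multimodal Siamese model: two modality-specific
# encoders, one shared embedding space, and a standard pairwise contrastive loss.
# All hyperparameters here are placeholders, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """1-D CNN that maps a raw sensor window to a unit-norm embedding."""

    def __init__(self, in_channels: int = 1, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # temporal pooling -> (B, 64, 1)
        )
        self.proj = nn.Linear(64, embed_dim)  # projection into the shared space

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x).squeeze(-1)      # (B, 64)
        return F.normalize(self.proj(z), dim=-1)


class MultimodalSiamese(nn.Module):
    """Audio and geophone encoders projecting into one shared embedding space."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.audio_enc = ModalityEncoder(embed_dim=embed_dim)
        self.geo_enc = ModalityEncoder(embed_dim=embed_dim)

    def forward(self, audio: torch.Tensor, geophone: torch.Tensor):
        return self.audio_enc(audio), self.geo_enc(geophone)


def contrastive_loss(za, zg, label, margin: float = 1.0):
    """Pairwise contrastive loss: label=1 for a footstep pair (pull the two
    embeddings together), label=0 otherwise (push apart up to the margin)."""
    d = F.pairwise_distance(za, zg)
    return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()


# Toy usage on random data: a batch of 8 one-second windows per sensor.
model = MultimodalSiamese()
audio = torch.randn(8, 1, 16000)          # e.g. 16 kHz microphone windows
geophone = torch.randn(8, 1, 1000)        # e.g. 1 kHz geophone windows
labels = torch.randint(0, 2, (8,)).float()
za, zg = model(audio, geophone)
loss = contrastive_loss(za, zg, labels)
loss.backward()
```

At inference time, such a model would classify a new audio/geophone pair by
thresholding the embedding distance (or similarity), matching the paper's
description of assigning high similarity to footstep pairs and low similarity
to pairs without footsteps.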
Related papers
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot
Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for a source dataset and unavailable for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets evaluating both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
- Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method.
It considers two types of noise-insensitive information, i.e., class-wise divergence and sample-wise consistency.
Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
This is done in a completely label-free manner by exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Similarity Embedding Networks for Robust Human Activity Recognition [19.162857787656247]
We design a similarity embedding neural network that maps input sensor signals onto real vectors through carefully designed convolutional and LSTM layers.
The embedding network is trained with a pairwise similarity loss, encouraging the clustering of samples from the same class in the embedded real space.
Extensive evaluation based on two public datasets has shown that the proposed similarity embedding network significantly outperforms state-of-the-art deep models on HAR classification tasks.
arXiv Detail & Related papers (2021-05-31T11:52:32Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Gesture Recognition from Skeleton Data for Intuitive Human-Machine Interaction [0.6875312133832077]
We propose an approach for segmentation and classification of dynamic gestures based on a set of handcrafted features.
The method for gesture recognition applies a sliding window, which extracts information from both the spatial and temporal dimensions.
At the end, the recognized gestures are used to interact with a collaborative robot.
arXiv Detail & Related papers (2020-08-26T11:28:50Z) - Unsupervised Learning of Audio Perception for Robotics Applications:
Learning to Project Data to T-SNE/UMAP space [2.8935588665357077]
This paper builds on key ideas to develop perception of touch sounds without access to any ground-truth data.
We show how we can leverage ideas from classical signal processing to get large amounts of data of any sound of interest with a high precision.
arXiv Detail & Related papers (2020-02-10T20:33:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.