SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition
- URL: http://arxiv.org/abs/2004.11085v4
- Date: Mon, 19 Oct 2020 13:16:59 GMT
- Title: SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition
- Authors: Raphael Memmesheimer, Nick Theisen, Dietrich Paulus
- Abstract summary: We propose a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space.
We encode signals into images and extract features using a deep residual CNN.
The resulting encoder maps features into an embedding space in which smaller distances encode similar actions and larger distances encode different actions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recognizing an activity from a single reference sample using metric
learning approaches is a promising research field. The majority of few-shot
methods focus on object recognition or face identification. We propose a metric
learning approach that reduces the action recognition problem to a nearest
neighbor search in embedding space. We encode signals into images and extract
features using a deep residual CNN. Using a triplet loss, we learn a feature
embedding. The resulting encoder transforms features into an embedding space in
which smaller distances encode similar actions and larger distances encode
different actions. Our approach is based on a signal level formulation and
remains flexible across a variety of modalities. It outperforms the baseline on
the large-scale NTU RGB+D 120 dataset for the one-shot action recognition
protocol by 5.6%. With just 60% of the training data, our approach still
outperforms the baseline by 3.7%. With 40% of the training data, it performs on
par with the second-best approach. Further, we show that our approach
generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton,
and fused data, and on the Simitate dataset for motion capture data. Our
inter-joint and inter-sensor experiments also suggest good generalization to
previously unseen setups.
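To make the pipeline in the abstract concrete, the following is a minimal sketch of signal-to-image encoding, a residual-CNN embedding trained with a triplet loss, and one-shot classification by nearest neighbor search. All names, shapes, and hyperparameters here (the ResNet-18 backbone, the 128-dimensional embedding, the 0.2 margin) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


def signal_to_image(signal: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Encode a (channels, timesteps) signal as a 3-channel image tensor."""
    # Min-max normalize, then resize the 2D "signal image" for an image CNN.
    x = (signal - signal.min()) / (signal.max() - signal.min() + 1e-8)
    img = x.unsqueeze(0).unsqueeze(0)                      # (1, 1, C, T)
    img = F.interpolate(img, size=(size, size), mode="bilinear",
                        align_corners=False)
    return img.squeeze(0).repeat(3, 1, 1)                  # (3, size, size)


class Encoder(nn.Module):
    """Deep residual CNN mapping signal images into the embedding space."""

    def __init__(self, dim: int = 128):
        super().__init__()
        backbone = resnet18(weights=None)                  # assumed backbone
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.backbone(x), dim=-1)       # unit-norm embeddings


encoder = Encoder()
triplet = nn.TripletMarginLoss(margin=0.2)                 # assumed margin

# Training step: anchor/positive share an action class, negative differs.
anchor, positive, negative = (torch.randn(40, 100) for _ in range(3))
batch = torch.stack([signal_to_image(s) for s in (anchor, positive, negative)])
a, p, n = encoder(batch).chunk(3)
loss = triplet(a, p, n)

# One-shot inference: nearest neighbor among single reference embeddings.
refs = encoder(torch.stack([signal_to_image(torch.randn(40, 100))
                            for _ in range(5)]))           # one ref per class
query = encoder(signal_to_image(torch.randn(40, 100)).unsqueeze(0))
predicted_class = torch.cdist(query, refs).argmin(dim=-1)
```

Because classification reduces to a distance comparison against reference embeddings, new action classes can be added at inference time without retraining the encoder.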
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
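The entry above describes selecting samples by combining a diversity indicator with an uncertainty estimator. Below is a generic greedy sketch of that uncertainty-plus-diversity pattern; the scoring functions, the alpha weighting, and the select_batch helper are illustrative assumptions, not DOKT's actual traceback formulation.

```python
import numpy as np


def select_batch(features: np.ndarray, uncertainty: np.ndarray,
                 k: int, alpha: float = 0.5) -> list:
    """Greedily pick k samples scoring high on both uncertainty and
    diversity (distance to the already-selected set)."""
    selected = []
    min_dist = np.full(len(features), np.inf)   # distance to nearest pick
    for _ in range(k):
        if selected:
            diversity = min_dist / (min_dist.max() + 1e-8)
        else:
            diversity = np.ones(len(features))  # no picks yet: uniform
        score = alpha * uncertainty + (1 - alpha) * diversity
        score[selected] = -np.inf               # never re-pick a sample
        i = int(score.argmax())
        selected.append(i)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(features - features[i], axis=1))
    return selected


# Toy usage: 100 unlabeled candidates with 16-d feature vectors.
rng = np.random.default_rng(0)
picks = select_batch(rng.normal(size=(100, 16)), rng.random(100), k=10)
```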
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amount of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection [15.885344033374393]
We propose ActiveAnno3D, an active learning framework to select data samples for labeling.
We perform experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection datasets.
We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling.
arXiv Detail & Related papers (2024-02-05T17:52:58Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked first place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data [100.33096338195723]
We focus on Few-shot Learning with Auxiliary Data (FLAD).
FLAD assumes access to auxiliary data during few-shot learning in hopes of improving generalization.
We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit.
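UCB1-FLAD treats each auxiliary dataset as a bandit arm. The sketch below shows plain UCB1 applied to dataset selection; the UCB1DatasetSelector class and the placeholder reward are assumptions for illustration, and the paper's actual reward signal and training loop are not reproduced here.

```python
import math
import random


class UCB1DatasetSelector:
    """UCB1 over auxiliary datasets: each dataset is a bandit arm; the
    reward (a stand-in here) would measure how much a batch from that
    dataset helped the few-shot target task."""

    def __init__(self, datasets):
        self.datasets = datasets
        self.counts = {d: 0 for d in datasets}
        self.mean_reward = {d: 0.0 for d in datasets}
        self.t = 0

    def choose(self) -> str:
        self.t += 1
        for d in self.datasets:              # play each arm once first
            if self.counts[d] == 0:
                return d
        # Standard UCB1 index: empirical mean plus exploration bonus.
        return max(self.datasets,
                   key=lambda d: self.mean_reward[d]
                   + math.sqrt(2 * math.log(self.t) / self.counts[d]))

    def update(self, d: str, reward: float) -> None:
        self.counts[d] += 1
        n = self.counts[d]
        self.mean_reward[d] += (reward - self.mean_reward[d]) / n


# Toy loop; in FLAD the reward would come from the training signal.
selector = UCB1DatasetSelector(["aux_a", "aux_b", "aux_c"])
for _ in range(100):
    arm = selector.choose()
    selector.update(arm, random.random())    # placeholder reward
```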
arXiv Detail & Related papers (2023-02-01T18:59:36Z)
- A Novel Multi-Stage Training Approach for Human Activity Recognition from Multimodal Wearable Sensor Data Using Deep Neural Network [11.946078871080836]
Deep neural networks are an effective choice for automatically recognizing human actions from data gathered by various wearable sensors.
In this paper, we have proposed a novel multi-stage training approach that increases diversity in this feature extraction process.
arXiv Detail & Related papers (2021-01-03T20:48:56Z)
- Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [66.03023110058464]
We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation into an unordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
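To illustrate the tracking-by-points idea, the sketch below converts an instance mask into an unordered 2D point cloud and embeds randomly sampled points with a PointNet-style shared MLP plus max pooling. The layer sizes, the position-only point features, and both helper names are illustrative assumptions rather than PointTrack's actual architecture.

```python
import torch
import torch.nn as nn


def mask_to_points(mask: torch.Tensor, n: int = 256) -> torch.Tensor:
    """Turn a binary instance mask (H, W) into n randomly sampled
    (x, y) points -- the unordered 2D point cloud described above."""
    ys, xs = torch.nonzero(mask, as_tuple=True)
    idx = torch.randint(0, len(xs), (n,))
    pts = torch.stack([xs[idx], ys[idx]], dim=-1).float()
    return (pts - pts.mean(0)) / (pts.std(0) + 1e-8)   # center and scale


class PointEmbedder(nn.Module):
    """Shared per-point MLP followed by max pooling (PointNet-style);
    an illustrative stand-in for an instance embedding head."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        return self.mlp(points).max(dim=-2).values     # permutation-invariant


mask = torch.zeros(64, 64)
mask[20:40, 10:30] = 1                                 # toy rectangular instance
embedding = PointEmbedder()(mask_to_points(mask))      # (64,) instance vector
```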
arXiv Detail & Related papers (2020-07-03T08:29:35Z)
- Gimme Signals: Discriminative signal encoding for multimodal activity recognition [0.0]
We present a simple, yet effective and flexible method for action recognition supporting multiple sensor modalities.
We apply our method to four action recognition datasets covering skeleton sequences, inertial and motion capture measurements, and WiFi fingerprints, with up to 120 action classes.
arXiv Detail & Related papers (2020-03-13T08:58:15Z)
- Towards Reading Beyond Faces for Sparsity-Aware 4D Affect Recognition [55.15661254072032]
We present a sparsity-aware deep network for automatic 4D facial expression recognition (FER).
We first propose a novel augmentation method to combat the data limitation problem for deep learning.
We then present a sparsity-aware deep network to compute the sparse representations of convolutional features over multi-views.
arXiv Detail & Related papers (2020-02-08T13:09:11Z)