Improving Hand Recognition in Uncontrolled and Uncooperative
Environments using Multiple Spatial Transformers and Loss Functions
- URL: http://arxiv.org/abs/2311.05383v1
- Date: Thu, 9 Nov 2023 14:08:48 GMT
- Authors: Wojciech Michal Matkowski, Xiaojie Li and Adams Wai Kin Kong
- Abstract summary: Many existing hand-based recognition methods perform well for hand images collected in controlled environments with user cooperation.
An algorithm integrating a multi-spatial transformer network (MSTN) and multiple loss functions is proposed to fully utilize information in full hand images.
Experimental results show that the proposed algorithm performs significantly better than the existing methods in these uncontrolled and uncooperative environments.
- Score: 13.47664951012019
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prevalence of smartphones and consumer cameras has led to more
evidence in the form of digital images, which are mostly taken in uncontrolled
and uncooperative environments. In these images, criminals are likely to hide
or cover their faces while their hands remain observable in some cases,
creating a challenging use case for forensic investigation. Many existing hand-based
recognition methods perform well for hand images collected in controlled
environments with user cooperation. However, their performance deteriorates
significantly in uncontrolled and uncooperative environments. A recent work
demonstrated the potential of hand recognition in these environments; however,
it considered only the palmar regions, and its recognition performance remains
far from satisfactory. To improve the recognition accuracy, an algorithm
integrating a multi-spatial transformer network (MSTN) and multiple loss
functions is proposed to fully utilize the information in full hand images. The
MSTN is first employed to localize the palms and fingers and to estimate the
alignment parameters. The aligned images are then fed into pretrained
convolutional neural networks, where features are extracted. Finally, a
training scheme with multiple loss functions is used to train the network
end-to-end. To demonstrate the effectiveness of the proposed algorithm, the
trained model is evaluated on the NTU-PI-v1 database and six benchmark databases
from different domains. Experimental results show that the proposed algorithm
performs significantly better than the existing methods in these uncontrolled
and uncooperative environments and has good generalization capabilities to
samples from different domains.
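The paper's MSTN architecture and exact loss functions are not reproduced here; as a rough illustration of the two core ideas in the pipeline above, a spatial-transformer-style affine alignment step and a weighted combination of multiple loss functions, here is a minimal NumPy sketch. The function names (`affine_warp`, `combined_loss`) and the bilinear-sampling details are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def affine_warp(image, theta):
    """Warp a grayscale image with a 2x3 affine matrix `theta`, as a spatial
    transformer would apply estimated alignment parameters. Sampling
    coordinates live on a normalized [-1, 1] grid; values are read back with
    bilinear interpolation."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    # Homogeneous target-grid coordinates, shape (3, h*w).
    grid = np.vstack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = theta @ grid                      # source coordinates, (2, h*w)
    # Map normalized coords back to pixel indices.
    sx = (src[0] + 1) * (w - 1) / 2
    sy = (src[1] + 1) * (h - 1) / 2
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx = np.clip(sx - x0, 0.0, 1.0)
    fy = np.clip(sy - y0, 0.0, 1.0)
    # Bilinear interpolation between the four neighbouring pixels.
    top = image[y0, x0] * (1 - fx) + image[y0, x0 + 1] * fx
    bot = image[y0 + 1, x0] * (1 - fx) + image[y0 + 1, x0 + 1] * fx
    return ((1 - fy) * top + fy * bot).reshape(h, w)

def combined_loss(losses, weights):
    """Weighted sum of per-branch losses (e.g. palm, finger, and full-hand
    identification terms), the generic shape of a training scheme that
    combines multiple loss functions end-to-end."""
    return sum(w * l for w, l in zip(weights, losses))

# An identity transform should leave the image unchanged.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
img = np.arange(16.0).reshape(4, 4)
assert np.allclose(affine_warp(img, identity), img)
```

In a full system, `theta` would be regressed per region by the localization network, each aligned crop would be encoded by a pretrained CNN, and `combined_loss` would aggregate the per-branch losses before backpropagation.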
Related papers
- Research on Image Recognition Technology Based on Multimodal Deep Learning [24.259653149898167]
This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks.
The performance of the suggested algorithm was evaluated using the MSR3D data set.
arXiv Detail & Related papers (2024-05-06T01:05:21Z)
- Multi-channel Time Series Decomposition Network For Generalizable Sensor-Based Activity Recognition [2.024925013349319]
This paper proposes a new method, Multi-channel Time Series Decomposition Network (MTSDNet)
It decomposes the original signal into a combination of multiple components and trigonometric functions via a trainable parameterized temporal decomposition.
Experiments show the advantages of the method in prediction accuracy and stability compared with other competing strategies.
arXiv Detail & Related papers (2024-03-28T12:54:06Z)
- LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition [11.206532393178385]
We present a novel neural network named LCPR for robust multimodal place recognition.
Our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance.
arXiv Detail & Related papers (2023-11-06T15:39:48Z)
- MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection [81.59191603867586]
Sequential deepfake detection aims to identify forged facial regions with the correct sequence for recovery.
The recovery of forged images requires knowledge of the manipulation model to implement inverse transformations.
We propose Multi-Collaboration and Multi-Supervision Network (MMNet) that handles various spatial scales and sequential permutations in forged face images.
arXiv Detail & Related papers (2023-07-06T02:32:08Z)
- Learning from Multi-Perception Features for Real-World Image Super-resolution [87.71135803794519]
We propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images.
Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information.
We also introduce a contrastive regularization term (CR) that improves the model's learning capability.
arXiv Detail & Related papers (2023-05-26T07:35:49Z)
- Agile gesture recognition for capacitive sensing devices: adapting on-the-job [55.40855017016652]
We demonstrate a hand gesture recognition system that uses signals from capacitive sensors embedded into the etee hand controller.
The controller generates real-time signals from each of the wearer's five fingers.
We use a machine learning technique to analyse the time-series signals and identify three features that can represent the five fingers within 500 ms.
arXiv Detail & Related papers (2023-05-12T17:24:02Z)
- On the Effectiveness of Image Manipulation Detection in the Age of Social Media [9.227950734832447]
Manipulation detection algorithms often rely on the manipulated regions being "sufficiently" different from the rest of the non-tampered regions in the image.
We present an in-depth analysis of deep learning-based and learning-free methods, assessing their performance on benchmark datasets.
We propose a novel deep learning-based pre-processing technique that accentuates the anomalies present in manipulated regions.
arXiv Detail & Related papers (2023-04-19T04:05:54Z)
- Unsupervised Domain Transfer with Conditional Invertible Neural Networks [83.90291882730925]
We propose a domain transfer approach based on conditional invertible neural networks (cINNs)
Our method inherently guarantees cycle consistency through its invertible architecture, and network training can efficiently be conducted with maximum likelihood.
Our method enables the generation of realistic spectral data and outperforms the state of the art on two downstream classification tasks.
arXiv Detail & Related papers (2023-03-17T18:00:27Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we show that dynamically adapting network architectures tailored to each domain task, along with weight finetuning, benefits both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.