IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer
- URL: http://arxiv.org/abs/2404.08237v1
- Date: Fri, 12 Apr 2024 04:44:11 GMT
- Title: IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer
- Authors: Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Massimo Tistarelli, Zhe Jin
- Abstract summary: We propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT).
The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese Network to capture long-range dependencies and the global context in fingerprint pairs.
The second module takes into account both local and global representations of the aligned fingerprint pair to achieve an interpretable fixed-length representation extraction and matching.
- Score: 18.481207354858533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Determining dense feature points on fingerprints used in constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese Network to capture long-range dependencies and the global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances the interpretability of the subsequent matching stage. The second module takes into account both local and global representations of the aligned fingerprint pair to achieve interpretable fixed-length representation extraction and matching. It employs the ViTs trained in the first module with an additional fully connected layer and retrains them to simultaneously produce the discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points. Extensive experimental results on diverse publicly available fingerprint databases demonstrate that the proposed framework not only exhibits superior performance on dense registration and matching but also significantly improves the interpretability of deep fixed-length representation-based fingerprint matching.
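To make the two-module design concrete, below is a minimal sketch, not the authors' implementation: a toy shared ViT encoder (the patch size, depth, embedding and output dimensions, and match threshold are all illustrative assumptions) that yields both patch-level features for interpretable dense correspondences (module one) and a fixed-length representation for matching (module two).

```python
# A minimal sketch (not the IFViT authors' code) of one shared ViT producing
# patch-level features for dense correspondence and a fixed-length descriptor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViTSiameseEncoder(nn.Module):
    """Shared (Siamese) toy ViT: patch features + fixed-length descriptor."""
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8, out_dim=192):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)  # grayscale input
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))          # learned positions
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.fc = nn.Linear(dim, out_dim)  # fully connected head -> fixed-length vector

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, dim)
        tokens = self.encoder(tokens)                                 # patch-level features
        fixed = F.normalize(self.fc(tokens.mean(dim=1)), dim=-1)      # global descriptor
        return tokens, fixed

def dense_correspondences(feat_a, feat_b, thresh=0.5):
    """Mutual nearest-neighbour patch matches between two fingerprints."""
    sim = F.normalize(feat_a, dim=-1) @ F.normalize(feat_b, dim=-1).transpose(-1, -2)
    ab, ba = sim.argmax(-1), sim.argmax(-2)   # best B-match per A patch and vice versa
    idx = torch.arange(sim.shape[-2])
    keep = (ba[0, ab[0]] == idx) & (sim[0, idx, ab[0]] > thresh)
    return idx[keep], ab[0][keep]             # interpretable patch-to-patch pairs

encoder = ViTSiameseEncoder()
a, b = torch.randn(1, 1, 224, 224), torch.randn(1, 1, 224, 224)  # stand-in images
(tok_a, vec_a), (tok_b, vec_b) = encoder(a), encoder(b)
ia, ib = dense_correspondences(tok_a, tok_b)   # module one: correspondences
score = (vec_a * vec_b).sum(-1)                # module two: fixed-length match score
```

On real data, the two modules would be trained in sequence as the abstract describes: the encoder first with dense-correspondence supervision, then retrained with the fully connected head to produce the fixed-length representation.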
Related papers
- Fixed-Length Dense Fingerprint Representation [32.21219375759034]
We propose a fixed-length dense descriptor of fingerprints and introduce FLARE, a fingerprint matching framework.
The proposed dense descriptor supports fixed-length representation while maintaining spatial correspondence.
Experiments demonstrate that FLARE achieves superior performance across rolled, plain, latent, and contactless fingerprints.
arXiv Detail & Related papers (2025-05-06T14:59:25Z) - Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP [53.18562650350898]
We introduce a general framework which can identify the roles of various components in ViTs beyond CLIP.
We also introduce a novel scoring function to rank components by their importance with respect to specific features.
Applying our framework to various ViT variants, we gain insights into the roles of different components concerning particular image features.
arXiv Detail & Related papers (2024-06-03T17:58:43Z) - Joint Identity Verification and Pose Alignment for Partial Fingerprints [33.05877729161858]
We propose a novel framework for joint identity verification and pose alignment of partial fingerprint pairs.
Our method achieves state-of-the-art performance in both partial fingerprint verification and relative pose estimation.
arXiv Detail & Related papers (2024-05-07T02:45:50Z) - Fixed-length Dense Descriptor for Efficient Fingerprint Matching [33.808749518785]
We propose a three-dimensional representation called Fixed-length Dense Descriptor (FDD) for efficient fingerprint matching.
FDD exhibits strong spatial properties, enabling it to capture the spatial relationships of the original fingerprints.
Our experiments on various fingerprint datasets reveal that FDD outperforms other fixed-length descriptors.
arXiv Detail & Related papers (2023-11-30T14:15:39Z) - Benchmarking fixed-length Fingerprint Representations across different Embedding Sizes and Sensor Types [13.715060479044167]
Deep neural networks have been proposed to extract fixed-length embeddings from fingerprints.
We study the impact of fingerprint textural information on recognition performance for two sensor types.
arXiv Detail & Related papers (2023-07-17T16:30:44Z) - Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens [76.40196364163663]
We propose a learning-based vision-language pre-training approach that, instead of CLIP-style patch and token embeddings, represents cross-modal alignment with finite discrete tokens.
We show that our method can learn more comprehensive representations and capture meaningful cross-modal correspondence.
arXiv Detail & Related papers (2023-03-27T00:58:39Z) - Self-Sufficient Framework for Continuous Sign Language Recognition [75.60327502570242]
The goal of this work is to develop a self-sufficient framework for Continuous Sign Language Recognition.
This setting poses two challenges: the need for complex multi-scale features such as hands, face, and mouth for understanding, and the absence of frame-level annotations.
We propose Divide and Focus Convolution (DFConv) which extracts both manual and non-manual features without the need for additional networks or annotations.
We also propose Dense Pseudo-Label Refinement (DPLR), which propagates non-spiky frame-level pseudo-labels by combining the ground-truth gloss sequence labels with the predicted sequence.
arXiv Detail & Related papers (2023-03-21T11:42:57Z) - Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z) - Learning an Ensemble of Deep Fingerprint Representations [40.90173373640335]
Deep neural networks (DNNs) have shown incredible promise in learning fixed-length representations from fingerprints.
However, no single representation comprehensively encapsulates all the discriminatory information available in a fingerprint.
We train multiple instances of DeepPrint on different transformations of the input image to generate an ensemble of fingerprint embeddings.
We also propose a feature fusion technique that distills these multiple representations into a single embedding (a simple fusion sketch follows this list).
arXiv Detail & Related papers (2022-09-02T15:08:33Z) - Pair-Relationship Modeling for Latent Fingerprint Recognition [25.435974669629374]
We propose a new scheme that can model the pair-relationship of two fingerprints directly as the similarity feature for recognition.
Experimental results on two databases show that the proposed method outperforms the state of the art.
arXiv Detail & Related papers (2022-07-02T11:31:31Z) - Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to the flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z) - Latent Fingerprint Registration via Matching Densely Sampled Points [100.53031290339483]
Existing latent fingerprint registration approaches are mainly based on establishing correspondences between minutiae.
We propose a non-minutia latent fingerprint registration method which estimates the spatial transformation between a pair of fingerprints (an affine-fitting sketch follows this list).
The proposed method achieves state-of-the-art registration performance, especially under challenging conditions.
arXiv Detail & Related papers (2020-05-12T15:51:59Z)
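Several entries above (e.g., Learning an Ensemble of Deep Fingerprint Representations) fuse multiple fixed-length embeddings into a single one. Here is a minimal sketch of one simple fusion strategy, a learned linear projection over concatenated embeddings; this is an illustration only, not that paper's actual distillation technique, and all dimensions are assumptions.

```python
# A minimal sketch (not the paper's exact fusion method) of collapsing an
# ensemble of fixed-length fingerprint embeddings into a single vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingFusion(nn.Module):
    def __init__(self, n_models=4, dim=192, out_dim=192):
        super().__init__()
        self.proj = nn.Linear(n_models * dim, out_dim)  # learned fusion projection

    def forward(self, embeddings):                 # (B, n_models, dim)
        fused = self.proj(embeddings.flatten(1))   # concatenate, then project
        return F.normalize(fused, dim=-1)          # unit norm for cosine matching

fusion = EmbeddingFusion()
ensemble = F.normalize(torch.randn(2, 4, 192), dim=-1)  # embeddings from 4 model instances
vec = fusion(ensemble)                                   # (2, 192) single embedding each
score = vec[0] @ vec[1]                                  # cosine-similarity match score
```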
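The dense-registration entries (e.g., Latent Fingerprint Registration via Matching Densely Sampled Points) estimate a spatial transformation from matched points. Below is a minimal sketch assuming matched point coordinates are already available, using a plain least-squares affine fit rather than any paper's actual estimator; the point sets are synthetic.

```python
# A minimal sketch of registration from matched points: fit a 2D affine
# transform by least squares and verify it on synthetic correspondences.
import math
import torch

def fit_affine(src, dst):
    """Least-squares 2x3 affine M with [x y 1] @ M.T ~= [x' y'] for src -> dst."""
    A = torch.cat([src, torch.ones(src.shape[0], 1)], dim=1)  # (N, 3)
    M = torch.linalg.lstsq(A, dst).solution                   # (3, 2)
    return M.T                                                # (2, 3)

def apply_affine(M, pts):
    """Apply a 2x3 affine to (N, 2) points."""
    return torch.cat([pts, torch.ones(pts.shape[0], 1)], dim=1) @ M.T

# Synthetic check: recover a known rotation + translation from 50 clean matches.
torch.manual_seed(0)
c, s = math.cos(0.3), math.sin(0.3)
R = torch.tensor([[c, -s], [s, c]])
t = torch.tensor([12.0, -5.0])
src = torch.rand(50, 2) * 200.0
dst = src @ R.T + t
M = fit_affine(src, dst)
residual = (apply_affine(M, src) - dst).norm(dim=1).mean()  # ~0 on clean matches
```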
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content above (including all information) and is not responsible for any consequences.