LEADER: Lightweight End-to-End Attention-Gated Dual Autoencoder for Robust Minutiae Extraction
- URL: http://arxiv.org/abs/2602.15493v1
- Date: Tue, 17 Feb 2026 11:02:28 GMT
- Title: LEADER: Lightweight End-to-End Attention-Gated Dual Autoencoder for Robust Minutiae Extraction
- Authors: Raffaele Cappelli, Matteo Ferrara
- Abstract summary: This paper introduces LEADER (Lightweight End-to-end Attention-gated Dual autoencodER), a neural network that maps raw fingerprint images to minutiae descriptors. It employs a novel "Castle-Moat-Rampart" ground-truth encoding and a dual-autoencoder structure, interconnected through an attention-gating mechanism. It attains a 34% higher F1-score on the NIST SD27 dataset compared to specialized latent minutiae extractors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Minutiae extraction, a fundamental stage in fingerprint recognition, is increasingly shifting toward deep learning. However, truly end-to-end methods that eliminate separate preprocessing and postprocessing steps remain scarce. This paper introduces LEADER (Lightweight End-to-end Attention-gated Dual autoencodER), a neural network that maps raw fingerprint images to minutiae descriptors, including location, direction, and type. The proposed architecture integrates non-maximum suppression and angular decoding to enable complete end-to-end inference using only 0.9M parameters. It employs a novel "Castle-Moat-Rampart" ground-truth encoding and a dual-autoencoder structure, interconnected through an attention-gating mechanism. Experimental evaluations demonstrate state-of-the-art accuracy on plain fingerprints and robust cross-domain generalization to latent impressions. Specifically, LEADER attains a 34% higher F1-score on the NIST SD27 dataset compared to specialized latent minutiae extractors. Sample-level analysis on this challenging benchmark reveals an average rank of 2.07 among all compared methods, with LEADER securing the first-place position in 47% of the samples, more than doubling the frequency of the second-best extractor. The internal representations learned by the model align with established fingerprint domain features, such as segmentation masks, orientation fields, frequency maps, and skeletons. Inference requires 15ms on GPU and 322ms on CPU, outperforming leading commercial software in computational efficiency. The source code and pre-trained weights are publicly released to facilitate reproducibility.
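The attention-gated interconnection between the two autoencoders can be illustrated with a minimal sketch. The function below is a generic sigmoid attention gate that modulates one feature stream by weights derived from the other; the array shapes, names, and gating formula are assumptions for illustration, not the paper's actual layer.

```python
import numpy as np

def attention_gate(primary, auxiliary):
    """Gate auxiliary features by an attention map computed from the primary stream.

    A sigmoid over the primary features yields per-element weights in (0, 1),
    which modulate the auxiliary features element-wise.
    """
    gate = 1.0 / (1.0 + np.exp(-primary))  # sigmoid attention map in (0, 1)
    return auxiliary * gate

# Toy feature maps (H x W x C); in a real network these would be conv outputs.
rng = np.random.default_rng(0)
primary = rng.standard_normal((4, 4, 8))
auxiliary = rng.standard_normal((4, 4, 8))
gated = attention_gate(primary, auxiliary)
print(gated.shape)  # (4, 4, 8)
```

Because the gate values lie strictly in (0, 1), the gated features are always attenuated copies of the auxiliary stream, never amplified.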
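The integrated non-maximum suppression and angular decoding mentioned in the abstract can be sketched as follows. This is a generic window-based NMS over a minutiae score map, combined with decoding a direction from sin/cos regression channels via `arctan2`; the map names, threshold, and window radius are assumptions for illustration and the paper's exact decoding scheme may differ.

```python
import numpy as np

def nms_peaks(score_map, threshold=0.5, radius=1):
    """Return (row, col) positions of local maxima above threshold (window NMS)."""
    h, w = score_map.shape
    peaks = []
    for r in range(h):
        for c in range(w):
            v = score_map[r, c]
            if v < threshold:
                continue
            # Keep the cell only if it dominates its local window.
            window = score_map[max(0, r - radius):r + radius + 1,
                               max(0, c - radius):c + radius + 1]
            if v >= window.max():
                peaks.append((r, c))
    return peaks

def decode_angle(sin_map, cos_map, r, c):
    """Decode a direction in radians from sin/cos regression channels."""
    return float(np.arctan2(sin_map[r, c], cos_map[r, c]))

# Toy maps: one confident minutia at (2, 3) with direction pi/4.
score = np.zeros((5, 5))
score[2, 3] = 0.9
sin_m = np.full((5, 5), np.sin(np.pi / 4))
cos_m = np.full((5, 5), np.cos(np.pi / 4))
peaks = nms_peaks(score)
angles = [decode_angle(sin_m, cos_m, r, c) for r, c in peaks]
print(peaks)  # [(2, 3)]
```

Embedding both steps as the network's final operations is what lets such a model emit minutiae lists directly from a raw image, with no separate postprocessing stage.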
Related papers
- Fast SAM2 with Text-Driven Token Pruning [52.8350457627401]
Segment Anything Model 2 (SAM2), a vision computation model, has significantly advanced prompt-driven video object segmentation. SAM2 pipelines propagate all visual tokens produced by the image encoder through downstream temporal reasoning modules, regardless of their relevance to the target object. We introduce a text-guided token pruning framework that improves inference efficiency by selectively reducing token density prior to temporal propagation.
arXiv Detail & Related papers (2025-12-24T18:59:05Z) - AHDMIL: Asymmetric Hierarchical Distillation Multi-Instance Learning for Fast and Accurate Whole-Slide Image Classification [51.525891360380285]
AHDMIL is an Asymmetric Hierarchical Distillation Multi-Instance Learning framework. It eliminates irrelevant patches through a two-step training process. It consistently outperforms previous state-of-the-art methods in both classification performance and inference speed.
arXiv Detail & Related papers (2025-08-07T07:47:16Z) - HandReader: Advanced Techniques for Efficient Fingerspelling Recognition [75.38606213726906]
This paper introduces HandReader, a group of three architectures designed to address the fingerspelling recognition task. HandReader$_RGB$ employs the novel Adaptive Shift-Temporal Module (TSAM) to process RGB features from videos of varying lengths. HandReader$_KP$ is built on the proposed Temporal Pose (TPE) module, which operates on keypoints as tensors. Each HandReader model possesses distinct advantages and achieves state-of-the-art results on the ChicagoFSWild and ChicagoFSWild+ datasets.
arXiv Detail & Related papers (2025-05-15T13:18:37Z) - Finger Pose Estimation for Under-screen Fingerprint Sensor [33.05877729161858]
We present a novel dual-modal input based network for under-screen fingerprint pose estimation. Our approach effectively integrates texture details extracted from ridge patches through the under-screen fingerprint sensor, and rough contours derived from capacitive images obtained via the touch screen. Our method is significantly superior to previous state-of-the-art (SOTA) methods and remarkably boosts the recognition ability of fingerprint recognition algorithms.
arXiv Detail & Related papers (2025-05-05T09:05:47Z) - HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis [21.25786478579275]
Handwritten document recognition is one of the most challenging tasks in computer vision. Traditionally, this problem has been approached as two separate tasks, handwritten text recognition and layout analysis. This paper introduces HAND, a novel end-to-end and segmentation-free architecture for simultaneous text recognition and layout analysis tasks.
arXiv Detail & Related papers (2024-12-25T20:36:29Z) - SAM-based instance segmentation models for the automation of structural damage detection [0.0]
We present a dataset for instance segmentation with 1,300 annotated images (640 x 640 pixels), named M1300, covering bricks, broken bricks, and cracks.
We test several leading algorithms for benchmarking, including the latest large-scale model, the prompt-based Segment Anything Model (SAM).
We propose two novel methods for automation of SAM execution. The first method involves abandoning the prompt encoder and connecting the SAM encoder to other decoders, while the second method introduces a learnable self-generating prompter.
arXiv Detail & Related papers (2024-01-27T02:00:07Z) - Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots [17.90723909170376]
We introduce Mobile-Seed, a lightweight framework for simultaneous semantic segmentation and boundary detection.
Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach.
Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline.
arXiv Detail & Related papers (2023-11-21T14:53:02Z) - Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z) - KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection [48.66703222700795]
We resort to a novel kernel strategy to identify the most informative point clouds to acquire labels.
To accommodate both one-stage (i.e., SECOND) and two-stage detectors, we incorporate the classification entropy tangent and trade off detection performance against the total number of bounding boxes selected for annotation.
Our results show that approximately 44% box-level annotation costs and 26% computational time are reduced compared to the state-of-the-art method.
arXiv Detail & Related papers (2023-07-16T04:27:03Z) - Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.