A Streamlined Attention-Based Network for Descriptor Extraction
- URL: http://arxiv.org/abs/2601.13126v1
- Date: Mon, 19 Jan 2026 15:12:42 GMT
- Title: A Streamlined Attention-Based Network for Descriptor Extraction
- Authors: Mattia D'Urso, Emanuele Santellani, Christian Sormann, Mattia Rossi, Andreas Kuhn, Friedrich Fraundorfer,
- Abstract summary: We introduce SANDesc, a Streamlined Attention-Based Network for Descriptor extraction.<n>We employ a revised U-Net-like architecture enhanced with Convolutional Block Attention Modules and residual paths.<n>Sandesc achieves substantial performance gains over the existing descriptors while operating with limited computational resources.
- Score: 12.162883531502173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SANDesc, a Streamlined Attention-Based Network for Descriptor extraction that aims to improve on existing architectures for keypoint description. Our descriptor network learns to compute descriptors that improve matching without modifying the underlying keypoint detector. We employ a revised U-Net-like architecture enhanced with Convolutional Block Attention Modules and residual paths, enabling effective local representation while maintaining computational efficiency. We refer to the building blocks of our model as Residual U-Net Blocks with Attention. The model is trained using a modified triplet loss in combination with a curriculum learning-inspired hard negative mining strategy, which improves training stability. Extensive experiments on HPatches, MegaDepth-1500, and the Image Matching Challenge 2021 show that training SANDesc on top of existing keypoint detectors leads to improved results on multiple matching tasks compared to the original keypoint descriptors. At the same time, SANDesc has a model complexity of just 2.4 million parameters. As a further contribution, we introduce a new urban dataset featuring 4K images and pre-calibrated intrinsics, designed to evaluate feature extractors. On this benchmark, SANDesc achieves substantial performance gains over the existing descriptors while operating with limited computational resources.
Related papers
- Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture.<n>KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z) - DSCformer: A Dual-Branch Network Integrating Enhanced Dynamic Snake Convolution and SegFormer for Crack Segmentation [6.898227391740093]
Current convolutional neural networks (CNNs) have demonstrated strong performance in crack segmentation tasks.
Transformers excel at capturing global context but lack precision in detailed feature extraction.
We introduce DSCformer, a novel hybrid model that integrates an enhanced Dynamic Snake Convolution (DSConv) with a Transformer architecture for crack segmentation.
arXiv Detail & Related papers (2024-11-14T11:25:32Z) - Single image super-resolution based on trainable feature matching attention network [0.0]
Convolutional Neural Networks (CNNs) have been widely employed for image Super-Resolution (SR)
We introduce Trainable Feature Matching (TFM) to amalgamate explicit feature learning into CNNs, augmenting their representation capabilities.
We also propose a streamlined variant called Same-size-divided Region-level Non-Local (SRNL) to alleviate the computational demands of non-local operations.
arXiv Detail & Related papers (2024-05-29T08:31:54Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - Binarized Spectral Compressive Imaging [59.18636040850608]
Existing deep learning models for hyperspectral image (HSI) reconstruction achieve good performance but require powerful hardwares with enormous memory and computational resources.
We propose a novel method, Binarized Spectral-Redistribution Network (BiSRNet)
BiSRNet is derived by using the proposed techniques to binarize the base model.
arXiv Detail & Related papers (2023-05-17T15:36:08Z) - ZippyPoint: Fast Interest Point Detection, Description, and Matching
through Mixed Precision Discretization [71.91942002659795]
We investigate and adapt network quantization techniques to accelerate inference and enable its use on compute limited platforms.
ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size.
These improvements come at a minor performance degradation as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization.
arXiv Detail & Related papers (2022-03-07T18:59:03Z) - Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the textitimplicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z) - Improving accuracy and speeding up Document Image Classification through
parallel systems [4.102028235659611]
We show in the RVL-CDIP dataset that we can improve previous results with a much lighter model.
We present an ensemble pipeline which is able to boost solely image input.
Lastly, we expose the training performance differences between PyTorch and Deep Learning frameworks.
arXiv Detail & Related papers (2020-06-16T13:36:07Z) - ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.