ParaFormer: Parallel Attention Transformer for Efficient Feature Matching
- URL: http://arxiv.org/abs/2303.00941v1
- Date: Thu, 2 Mar 2023 03:29:16 GMT
- Title: ParaFormer: Parallel Attention Transformer for Efficient Feature Matching
- Authors: Xiaoyong Lu, Yaping Yan, Bin Kang, Songlin Du
- Abstract summary: This paper proposes a novel parallel attention model entitled ParaFormer.
It fuses features and keypoint positions through the concept of amplitude and phase, and integrates self- and cross-attention in a parallel manner.
Experiments on various applications, including homography estimation, pose estimation, and image matching, demonstrate that ParaFormer achieves state-of-the-art performance.
The efficient ParaFormer-U variant achieves comparable performance with less than 50% of the FLOPs of existing attention-based models.
- Score: 8.552303361149612
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Heavy computation is a bottleneck that prevents deep-learning-based
feature matching algorithms from being applied in many real-time applications. However,
existing lightweight networks optimized for Euclidean data cannot address
classical feature matching tasks, since sparse keypoint-based descriptors are
expected to be matched. This paper tackles this problem and proposes two
concepts: 1) a novel parallel attention model entitled ParaFormer and 2) a
graph based U-Net architecture with attentional pooling. First, ParaFormer
fuses features and keypoint positions through the concept of amplitude and
phase, and integrates self- and cross-attention in a parallel manner which
achieves a win-win performance in terms of accuracy and efficiency. Second,
with the U-Net architecture and the proposed attentional pooling, the ParaFormer-U
variant significantly reduces computational complexity and minimizes the
performance loss caused by downsampling. Extensive experiments on various
applications, including homography estimation, pose estimation, and image
matching, demonstrate that ParaFormer achieves state-of-the-art performance
while maintaining high efficiency. The efficient ParaFormer-U variant achieves
comparable performance with less than 50% of the FLOPs of existing attention-based models.
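As a reading aid, here is a minimal sketch of how the two proposed concepts could be realized. Every name, dimension, and design choice below is an assumption made for illustration from the abstract alone; it is not the authors' implementation.

```python
# Hedged sketch (PyTorch) of the two ideas named in the abstract. All names
# and design choices are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn


def wave_encode(desc: torch.Tensor, pos_emb: torch.Tensor) -> torch.Tensor:
    # One reading of "fusing features and keypoint positions through the
    # concept of amplitude and phase": treat the descriptor as the amplitude
    # and a position embedding as the phase of a wave (this toy version
    # doubles the channel dimension by concatenating real and imaginary parts).
    return torch.cat([desc * torch.cos(pos_emb), desc * torch.sin(pos_emb)], dim=-1)


class ParallelAttentionLayer(nn.Module):
    """Self- and cross-attention run in parallel branches and are then fused,
    rather than alternating in sequential layers."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.merge = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        sa_a, _ = self.self_attn(a, a, a)   # attend within image A
        sa_b, _ = self.self_attn(b, b, b)   # attend within image B
        ca_a, _ = self.cross_attn(a, b, b)  # A queries B
        ca_b, _ = self.cross_attn(b, a, a)  # B queries A
        # Fuse the two parallel branches instead of stacking them in sequence.
        a = self.norm(a + self.merge(torch.cat([sa_a, ca_a], dim=-1)))
        b = self.norm(b + self.merge(torch.cat([sa_b, ca_b], dim=-1)))
        return a, b


class AttentionalPooling(nn.Module):
    """Downsampling for the U-Net variant: keep the k keypoints with the
    highest learned scores (a gPool-style reading of "attentional pooling")."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor, k: int) -> torch.Tensor:
        s = self.score(x).squeeze(-1)                       # (B, N) scores
        idx = s.topk(k, dim=-1).indices                     # keep top-k tokens
        kept = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        gate = torch.sigmoid(torch.gather(s, 1, idx)).unsqueeze(-1)
        return kept * gate   # gating keeps the scorer differentiable
```

A quick shape check of the assumed interfaces:

```python
a, b = torch.randn(1, 512, 256), torch.randn(1, 480, 256)
a, b = ParallelAttentionLayer()(a, b)   # (1, 512, 256), (1, 480, 256)
a = AttentionalPooling()(a, 256)        # U-Net-style downsampling to 256 keypoints
```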
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the real-time inference demands of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks.
We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes.
Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
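As background for the distinction the paper draws, a generic sketch (not this paper's architecture) of where the two aggregation processes operate:

```python
# "Feature aggregation" refines the per-image descriptors themselves, while
# "cost aggregation" refines the matching-cost volume built from them.
import torch

feat_a = torch.randn(1, 1024, 128)  # (batch, positions in A, channels)
feat_b = torch.randn(1, 1024, 128)  # (batch, positions in B, channels)

# Cost volume: pairwise similarity between every position of A and of B.
cost = torch.einsum('bnc,bmc->bnm', feat_a, feat_b)

# Feature aggregation acts on feat_a / feat_b (e.g., self-attention over
# positions); cost aggregation acts on `cost` directly (e.g., attention
# over its rows or columns).
```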
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
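Of the three paradigms, dynamic layer skipping is the simplest to sketch; the gate design below is a generic illustration, not LAUDNet's actual mechanism:

```python
# Generic dynamic layer skipping: a tiny gate decides per sample whether a
# residual block executes. Note that this masked form still computes
# block(x) for the whole batch; realizing the latency savings requires
# scheduling only the kept samples, which is exactly the theory-vs-practice
# gap LAUDNet targets.
import torch
import torch.nn as nn


class SkippableBlock(nn.Module):
    def __init__(self, block: nn.Module, channels: int):
        super().__init__()
        self.block = block
        # Lightweight gate: global pooling + linear -> "run this block?" logit.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        keep = (torch.sigmoid(self.gate(x)) > 0.5).float().view(-1, 1, 1, 1)
        return x + keep * self.block(x)  # skipped samples fall back to identity


blk = SkippableBlock(nn.Conv2d(64, 64, 3, padding=1), channels=64)
y = blk(torch.randn(4, 64, 32, 32))  # per-sample skip decisions in one batch
```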
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
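The summary does not spell out the architecture; as background, a standard bottleneck adapter (the general technique the name suggests, in the style of Houlsby et al.) shows why one frozen backbone can serve many tasks, which is what makes cross-task data reuse possible:

```python
# Generic bottleneck adapter (not adapter-ALBERT's exact design): a small
# per-task module added to a frozen shared backbone, so switching tasks only
# swaps a few parameters while the backbone weights, and hence their memory
# traffic, are reused across tasks.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: a cheap, task-specific correction to shared features.
        return x + self.up(torch.relu(self.down(x)))
```

Switching tasks then means swapping only the small adapter weights rather than the full model, which is the property a heterogeneous on-chip memory mapping can exploit.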
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs [29.995904718691204]
We formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space.
We then leverage our framework to identify state-of-the-art hierarchical VAEs (HVAEs) which have a U-Net architecture.
arXiv Detail & Related papers (2023-01-19T17:33:48Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
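A minimal sketch of the deep equilibrium idea (naive fixed-point iteration; practical DEQs use Broyden or Anderson solvers with implicit differentiation):

```python
# Minimal deep-equilibrium sketch: the "network" is one layer f, and the
# output is the fixed point z* = f(z*, x), found here by naive iteration.
import torch
import torch.nn as nn


class DEQ(nn.Module):
    def __init__(self, dim: int = 64, iters: int = 30):
        super().__init__()
        self.lin_z = nn.Linear(dim, dim)
        self.lin_x = nn.Linear(dim, dim)
        self.iters = iters

    def f(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.lin_z(z) + self.lin_x(x))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.iters):  # iterate to (approximate) equilibrium
            z = self.f(z, x)
        return z


# Because x enters the same fixed-point equation as z, the input itself can
# be optimized (e.g., a latent code), which is the synergy the paper exploits.
x = torch.randn(8, 64, requires_grad=True)
loss = DEQ()(x).pow(2).mean()
loss.backward()  # gradients flow to x through the (unrolled) equilibrium
```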
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- EQ-Net: A Unified Deep Learning Framework for Log-Likelihood Ratio Estimation and Quantization [25.484585922608193]
We introduce EQ-Net: the first holistic framework that solves both the tasks of log-likelihood ratio (LLR) estimation and quantization using a data-driven method.
We carry out extensive experimental evaluation and demonstrate that our single architecture achieves state-of-the-art results on both tasks.
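For context on the quantities involved: the log-likelihood ratio of a bit b given an observation y is log(P(b=0|y)/P(b=1|y)), which for BPSK over an AWGN channel with noise variance sigma^2 reduces to 2y/sigma^2. A textbook baseline of the kind a learned estimator and quantizer would replace (generic background, not EQ-Net's method):

```python
# Closed-form LLR for BPSK over AWGN, plus a hand-designed uniform quantizer.
import numpy as np


def bpsk_llr(y: np.ndarray, sigma2: float) -> np.ndarray:
    # llr(y) = log P(b=0|y) / P(b=1|y) = 2y / sigma^2 for unit-energy BPSK.
    return 2.0 * y / sigma2


def quantize(llr: np.ndarray, bits: int = 4, clip: float = 8.0) -> np.ndarray:
    # Uniform mid-tread quantizer: the baseline a data-driven quantizer beats.
    step = 2 * clip / (2 ** bits - 1)
    return np.clip(np.round(llr / step) * step, -clip, clip)


y = np.array([0.9, -1.4, 0.1])
print(quantize(bpsk_llr(y, sigma2=0.5)))
```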
arXiv Detail & Related papers (2020-12-23T18:11:30Z)
- EfficientPose: Scalable single-person pose estimation [3.325625311163864]
We propose a novel convolutional neural network architecture, called EfficientPose, for single-person pose estimation.
Our top-performing model achieves state-of-the-art accuracy on single-person MPII, with low-complexity ConvNets.
Due to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost.
arXiv Detail & Related papers (2020-04-25T16:50:46Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves comparable performance with about 0.2% of the parameters (100k) of large models on popular salient object detection benchmarks.
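As background for gOctConv, the original octave convolution it generalizes can be sketched as follows (generic OctConv only; the "generalized" in-stage and cross-stage flexibility is not shown):

```python
# Octave-style conv: channels split into a full-resolution ("high frequency")
# part and a half-resolution ("low frequency") part, with four conv paths
# exchanging information between the two scales.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OctaveConv(nn.Module):
    def __init__(self, ch_in: int, ch_out: int, alpha: float = 0.5):
        super().__init__()
        lo_in, lo_out = int(alpha * ch_in), int(alpha * ch_out)
        hi_in, hi_out = ch_in - lo_in, ch_out - lo_out
        self.hh = nn.Conv2d(hi_in, hi_out, 3, padding=1)  # high -> high
        self.hl = nn.Conv2d(hi_in, lo_out, 3, padding=1)  # high -> low
        self.lh = nn.Conv2d(lo_in, hi_out, 3, padding=1)  # low  -> high
        self.ll = nn.Conv2d(lo_in, lo_out, 3, padding=1)  # low  -> low

    def forward(self, hi: torch.Tensor, lo: torch.Tensor):
        # hi: (N, hi_in, H, W); lo: (N, lo_in, H/2, W/2)
        h = self.hh(hi) + F.interpolate(self.lh(lo), scale_factor=2)
        l = self.ll(lo) + self.hl(F.avg_pool2d(hi, 2))
        return h, l


oc = OctaveConv(64, 64)
h, l = oc(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32))
```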
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
- Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency [23.443265839365054]
Analysis of state-of-the-art VO/VSLAM systems exposes a gap in balancing performance (accuracy & robustness) and efficiency (latency).
This paper aims to fill the performance-efficiency gap with an enhancement applied to feature-based VSLAM.
arXiv Detail & Related papers (2020-01-03T03:50:54Z)