Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
- URL: http://arxiv.org/abs/2403.04765v2
- Date: Mon, 11 Mar 2024 23:42:14 GMT
- Title: Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
- Authors: Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, Xiaowei Zhou
- Abstract summary: Previous detector-free matcher LoFTR has shown remarkable matching capability in handling large-viewpoint change and texture-poor scenarios.
We revisit its design choices and derive multiple improvements for both efficiency and accuracy.
Our method can achieve higher accuracy compared with competitive semi-dense matchers.
- Score: 42.861344584752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel method for efficiently producing semi-dense matches across
images. Previous detector-free matcher LoFTR has shown remarkable matching
capability in handling large-viewpoint change and texture-poor scenarios but
suffers from low efficiency. We revisit its design choices and derive multiple
improvements for both efficiency and accuracy. One key observation is that
performing the transformer over the entire feature map is redundant due to
shared local information; we therefore propose an aggregated attention
mechanism with adaptive token selection for efficiency. Furthermore, we find
spatial variance exists in LoFTR's fine correlation module, which is adverse to
matching accuracy. A novel two-stage correlation layer is proposed to achieve
accurate subpixel correspondences for accuracy improvement. Our
efficiency-optimized model is $\sim 2.5\times$ faster than LoFTR and can even
surpass the state-of-the-art efficient sparse matching pipeline SuperPoint + LightGlue.
Moreover, extensive experiments show that our method can achieve higher
accuracy compared with competitive semi-dense matchers, with considerable
efficiency benefits. This opens up exciting prospects for large-scale or
latency-sensitive applications such as image retrieval and 3D reconstruction.
Project page: https://zju3dv.github.io/efficientloftr.
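The idea of running attention over aggregated tokens rather than over the entire feature map can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the aggregation operator or the token selection rule, so the sketch assumes plain average pooling over non-overlapping `s x s` windows, standard softmax attention, and nearest-neighbour upsampling with a residual add. All function names (`pool_tokens`, `attention`, `aggregated_attention`) are hypothetical.

```python
import math


def pool_tokens(grid, s):
    """Average-pool an H x W grid of feature vectors over s x s windows.

    Assumption: average pooling stands in for the paper's aggregation step.
    """
    h, w, d = len(grid), len(grid[0]), len(grid[0][0])
    pooled = []
    for i in range(0, h, s):
        row = []
        for j in range(0, w, s):
            window = [grid[a][b]
                      for a in range(i, min(i + s, h))
                      for b in range(j, min(j + s, w))]
            row.append([sum(v[k] for v in window) / len(window)
                        for k in range(d)])
        pooled.append(row)
    return pooled


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]


def attention(queries, keys, values):
    """Standard scaled dot-product attention over lists of vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(wt * v[k] for wt, v in zip(weights, values))
                    for k in range(len(values[0]))])
    return out


def aggregated_attention(grid, s):
    """Self-attention on pooled tokens, broadcast back to full resolution.

    Returns the updated grid and the number of tokens attention ran over.
    """
    pooled = pool_tokens(grid, s)
    pw = len(pooled[0])
    flat = [v for row in pooled for v in row]
    message = attention(flat, flat, flat)
    # Nearest-neighbour upsampling: each pixel receives the message of the
    # window it belongs to, added as a residual to its original feature.
    out = [[[x + m for x, m in zip(grid[i][j],
                                   message[(i // s) * pw + (j // s)])]
            for j in range(len(grid[0]))]
           for i in range(len(grid))]
    return out, len(flat)
```

On a 4 x 4 feature map with `s = 2`, attention in this sketch runs over 4 aggregated tokens instead of 16, so the number of query-key pairs drops from 256 to 16. The adaptive part of the token selection described in the abstract is not modelled here.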
Related papers
- POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search [34.50794776762681]
This paper introduces the Parallel Optimal Position Search (POPoS), a high-precision encoding-decoding framework.
POPoS employs three key innovations: Pseudo-range multilateration is utilized to correct heatmap errors, enhancing the precision of landmark localization.
A single-step parallel algorithm is introduced, significantly enhancing computational efficiency and reducing processing time.
arXiv Detail & Related papers (2024-10-12T16:28:40Z)
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel method of Learning Resampling (termed LeRF) which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z)
- Efficient Context Integration through Factorized Pyramidal Learning for Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner.
We decompose the spatial pyramid into two stages which enables a simple and efficient feature fusion within the module to solve the notorious checkerboard effect.
Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement [80.94378602238432]
We propose an efficient structure named Correspondence Efficient Transformer (ECO-TR) by finding correspondences in a coarse-to-fine manner.
To achieve this, multiple transformer blocks are stage-wisely connected to gradually refine the predicted coordinates.
Experiments on various sparse and dense matching tasks demonstrate the superiority of our method in both efficiency and effectiveness over existing state-of-the-art methods.
arXiv Detail & Related papers (2022-09-25T13:05:33Z)
- Efficient Linear Attention for Fast and Accurate Keypoint Matching [0.9699586426043882]
Recently Transformers have provided state-of-the-art performance in sparse matching, crucial to realize high-performance 3D vision applications.
Yet, these Transformers lack efficiency due to the quadratic computational complexity of their attention mechanism.
We propose a new attentional aggregation that achieves high accuracy by aggregating both the global and local information from sparse keypoints.
arXiv Detail & Related papers (2022-04-16T06:17:36Z)
- HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT).
HANT replaces inefficient operations with more efficient alternatives using a neural-architecture-search-like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with 0.4% drop in the top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z)
- FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose FasterPose, a design paradigm for a cost-effective network with LR representation for efficient pose estimation.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant network for pose estimation, our method reduces FLOPs by 58% while improving accuracy by 1.3%.
arXiv Detail & Related papers (2021-07-07T13:39:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.