LoFTR: Detector-Free Local Feature Matching with Transformers
- URL: http://arxiv.org/abs/2104.00680v1
- Date: Thu, 1 Apr 2021 17:59:42 GMT
- Title: LoFTR: Detector-Free Local Feature Matching with Transformers
- Authors: Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, Xiaowei Zhou
- Abstract summary: Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level.
In contrast to dense methods that use a cost volume to search correspondences, we use self- and cross-attention layers in a Transformer to obtain feature descriptors that are conditioned on both images.
The experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin.
- Score: 40.754990768677295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel method for local image feature matching. Instead of
performing image feature detection, description, and matching sequentially, we
propose to first establish pixel-wise dense matches at a coarse level and later
refine the good matches at a fine level. In contrast to dense methods that use
a cost volume to search correspondences, we use self- and cross-attention layers
in a Transformer to obtain feature descriptors that are conditioned on both
images. The global receptive field provided by the Transformer enables our method
to produce dense matches in low-texture areas, where feature detectors usually
struggle to produce repeatable interest points. The experiments on indoor and
outdoor datasets show that LoFTR outperforms state-of-the-art methods by a
large margin. LoFTR also ranks first on two public benchmarks of visual
localization among the published methods.
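The coarse matching step described in the abstract can be illustrated with a small sketch. LoFTR scores every pair of coarse-level positions and applies a dual-softmax over the similarity matrix to obtain matching confidences, then keeps mutual nearest neighbours. The function below is a simplified NumPy sketch of that idea, not the authors' implementation; the attention layers that produce the conditioned descriptors are omitted, and the function name and thresholds are made up for this example.

```python
import numpy as np

def dual_softmax_matches(feat_a, feat_b, temperature=0.1, threshold=0.2):
    """Simplified sketch of dual-softmax coarse matching.

    feat_a: (N, D) coarse descriptors from image A
    feat_b: (M, D) coarse descriptors from image B
    """
    # similarity between every coarse position in A and B
    sim = feat_a @ feat_b.T / temperature

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # softmax over both dimensions; their product is the matching confidence
    conf = softmax(sim, axis=1) * softmax(sim, axis=0)

    # keep mutual nearest neighbours above a confidence threshold
    rows = conf.argmax(axis=1)
    cols = conf.argmax(axis=0)
    matches = [(i, int(rows[i])) for i in range(len(rows))
               if cols[rows[i]] == i and conf[i, rows[i]] > threshold]
    return matches, conf
```

With identical orthonormal descriptors on both sides, the sketch returns the identity matching, which is the expected behaviour of any mutual-nearest-neighbour scheme.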
Related papers
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP identifies co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- Improving Transformer-based Image Matching by Cascaded Capturing Spatially Informative Keypoints [44.90917854990362]
We propose a transformer-based cascade matching model, the Cascade feature Matching TRansformer (CasMTR).
We use a simple yet effective Non-Maximum Suppression (NMS) post-process to filter keypoints through the confidence map.
CasMTR achieves state-of-the-art performance in indoor and outdoor pose estimation as well as visual localization.
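The NMS post-processing mentioned above can be sketched as spatial non-maximum suppression on a 2-D confidence map: a keypoint survives only if it is the local maximum within a small window. This is an illustrative sketch of the general technique, not CasMTR's code; the function name and parameters are hypothetical.

```python
import numpy as np

def nms_filter(conf, radius=1, score_thresh=0.5):
    """Keep only confidence-map entries that are local maxima.

    conf: (H, W) confidence map; returns a list of (y, x, score).
    """
    h, w = conf.shape
    keep = []
    for y in range(h):
        for x in range(w):
            s = conf[y, x]
            if s < score_thresh:
                continue
            # suppression window clipped to the map borders
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # keep only local maxima within the window
            if s >= conf[y0:y1, x0:x1].max():
                keep.append((y, x, float(s)))
    return keep
```

For example, two nearby peaks of 0.9 and 0.8 within the same window collapse to the single stronger keypoint.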
arXiv Detail & Related papers (2023-03-06T04:32:34Z)
- SuperGF: Unifying Local and Global Features for Visual Localization [13.869227429939423]
SuperGF is a transformer-based aggregation model that operates directly on image-matching-specific local features.
We provide implementations of SuperGF using various types of local features, including dense and sparse learning-based or hand-crafted descriptors.
arXiv Detail & Related papers (2022-12-23T13:48:07Z)
- TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization [17.270110456445806]
TruFor is a forensic framework that can be applied to a large variety of image manipulation methods.
We rely on the extraction of both high-level and low-level traces through a transformer-based fusion architecture.
Our method is able to reliably detect and localize both cheapfakes and deepfakes manipulations outperforming state-of-the-art works.
arXiv Detail & Related papers (2022-12-21T11:49:43Z)
- Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL consists of a two-stream architecture by extracting two types of global features from RGB and noise views respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- Guide Local Feature Matching by Overlap Estimation [9.387323456222823]
We introduce a novel Overlap Estimation method conditioned on image pairs with TRansformer, named OETR.
OETR performs overlap estimation in a two-step process of feature correlation and then overlap regression.
Experiments show that OETR can boost state-of-the-art local feature matching performance substantially.
arXiv Detail & Related papers (2022-02-18T07:11:36Z)
- TBNet: Two-Stream Boundary-aware Network for Generic Image Manipulation Localization [49.521622399483846]
We propose a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) for generic image manipulation localization.
The proposed TBNet can significantly outperform state-of-the-art generic image manipulation localization methods in terms of both MCC and F1.
arXiv Detail & Related papers (2021-08-10T08:22:05Z)
- GLiT: Neural Architecture Search for Global and Local Image Transformer [114.8051035856023]
We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition.
Our method can find more discriminative and efficient transformer variants than the ResNet family and the baseline ViT for image classification.
arXiv Detail & Related papers (2021-07-07T00:48:09Z)
- Instance-level Image Retrieval using Reranking Transformers [18.304597755595697]
Instance-level image retrieval is the task of searching in a large database for images that match an object in a query image.
We propose Reranking Transformers (RRTs) as a general model to incorporate both local and global features to rerank the matching images.
RRTs are lightweight and can be easily parallelized so that reranking a set of top matching results can be performed in a single forward-pass.
arXiv Detail & Related papers (2021-03-22T23:58:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.