LoFTR: Detector-Free Local Feature Matching with Transformers
- URL: http://arxiv.org/abs/2104.00680v1
- Date: Thu, 1 Apr 2021 17:59:42 GMT
- Title: LoFTR: Detector-Free Local Feature Matching with Transformers
- Authors: Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, Xiaowei Zhou
- Abstract summary: Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level.
In contrast to dense methods that use a cost volume to search correspondences, we use self- and cross-attention layers in a Transformer to obtain feature descriptors that are conditioned on both images.
The experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin.
- Score: 40.754990768677295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel method for local image feature matching. Instead of
performing image feature detection, description, and matching sequentially, we
propose to first establish pixel-wise dense matches at a coarse level and later
refine the good matches at a fine level. In contrast to dense methods that use
a cost volume to search correspondences, we use self- and cross-attention layers
in a Transformer to obtain feature descriptors that are conditioned on both
images. The global receptive field provided by the Transformer enables our method
to produce dense matches in low-texture areas, where feature detectors usually
struggle to produce repeatable interest points. The experiments on indoor and
outdoor datasets show that LoFTR outperforms state-of-the-art methods by a
large margin. LoFTR also ranks first on two public benchmarks of visual
localization among the published methods.
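The coarse matching step described in the abstract can be illustrated with a small sketch. LoFTR scores every pair of coarse-level positions and applies a dual-softmax over the similarity matrix to obtain matching confidences, then keeps mutual nearest neighbours. The function below is a simplified NumPy sketch of that idea, not the authors' implementation; the attention layers that produce the conditioned descriptors are omitted, and the function name and thresholds are made up for this example.

```python
import numpy as np

def dual_softmax_matches(feat_a, feat_b, temperature=0.1, threshold=0.2):
    """Simplified sketch of dual-softmax coarse matching.

    feat_a: (N, D) coarse descriptors from image A
    feat_b: (M, D) coarse descriptors from image B
    """
    # similarity between every coarse position in A and B
    sim = feat_a @ feat_b.T / temperature

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # softmax over both dimensions; their product is the matching confidence
    conf = softmax(sim, axis=1) * softmax(sim, axis=0)

    # keep mutual nearest neighbours above a confidence threshold
    rows = conf.argmax(axis=1)
    cols = conf.argmax(axis=0)
    matches = [(i, int(rows[i])) for i in range(len(rows))
               if cols[rows[i]] == i and conf[i, rows[i]] > threshold]
    return matches, conf
```

With identical orthonormal descriptors on both sides, the sketch returns the identity matching, which is the expected behaviour of any mutual-nearest-neighbour scheme.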
Related papers
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP identifies co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- Improving Transformer-based Image Matching by Cascaded Capturing Spatially Informative Keypoints [44.90917854990362]
We propose a transformer-based cascade matching model, the Cascade feature Matching TRansformer (CasMTR).
We use a simple yet effective Non-Maximum Suppression (NMS) post-process to filter keypoints through the confidence map.
CasMTR achieves state-of-the-art performance in indoor and outdoor pose estimation as well as visual localization.
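The NMS post-processing mentioned above can be sketched as spatial non-maximum suppression on a 2-D confidence map: a keypoint survives only if it is the local maximum within a small window. This is an illustrative sketch of the general technique, not CasMTR's code; the function name and parameters are hypothetical.

```python
import numpy as np

def nms_filter(conf, radius=1, score_thresh=0.5):
    """Keep only confidence-map entries that are local maxima.

    conf: (H, W) confidence map; returns a list of (y, x, score).
    """
    h, w = conf.shape
    keep = []
    for y in range(h):
        for x in range(w):
            s = conf[y, x]
            if s < score_thresh:
                continue
            # suppression window clipped to the map borders
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # keep only local maxima within the window
            if s >= conf[y0:y1, x0:x1].max():
                keep.append((y, x, float(s)))
    return keep
```

For example, two nearby peaks of 0.9 and 0.8 within the same window collapse to the single stronger keypoint.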
arXiv Detail & Related papers (2023-03-06T04:32:34Z)
- SuperGF: Unifying Local and Global Features for Visual Localization [13.869227429939423]
SuperGF is a transformer-based aggregation model that operates directly on image-matching-specific local features.
We provide implementations of SuperGF using various types of local features, including dense and sparse learning-based or hand-crafted descriptors.
arXiv Detail & Related papers (2022-12-23T13:48:07Z)
- TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization [17.270110456445806]
TruFor is a forensic framework that can be applied to a large variety of image manipulation methods.
We rely on the extraction of both high-level and low-level traces through a transformer-based fusion architecture.
Our method is able to reliably detect and localize both cheapfakes and deepfakes manipulations outperforming state-of-the-art works.
arXiv Detail & Related papers (2022-12-21T11:49:43Z)
- Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL consists of a two-stream architecture by extracting two types of global features from RGB and noise views respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- Guide Local Feature Matching by Overlap Estimation [9.387323456222823]
We introduce a novel Overlap Estimation method conditioned on image pairs with TRansformer, named OETR.
OETR performs overlap estimation in a two-step process of feature correlation and then overlap regression.
Experiments show that OETR can boost state-of-the-art local feature matching performance substantially.
arXiv Detail & Related papers (2022-02-18T07:11:36Z)
- TBNet: Two-Stream Boundary-aware Network for Generic Image Manipulation Localization [49.521622399483846]
We propose a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) for generic image manipulation localization.
The proposed TBNet can significantly outperform state-of-the-art generic image manipulation localization methods in terms of both MCC and F1.
arXiv Detail & Related papers (2021-08-10T08:22:05Z)
- GLiT: Neural Architecture Search for Global and Local Image Transformer [114.8051035856023]
We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition.
Our method can find more discriminative and efficient transformer variants than the ResNet family and the baseline ViT for image classification.
arXiv Detail & Related papers (2021-07-07T00:48:09Z)
- Instance-level Image Retrieval using Reranking Transformers [18.304597755595697]
Instance-level image retrieval is the task of searching in a large database for images that match an object in a query image.
We propose Reranking Transformers (RRTs) as a general model to incorporate both local and global features to rerank the matching images.
RRTs are lightweight and can be easily parallelized so that reranking a set of top matching results can be performed in a single forward-pass.
arXiv Detail & Related papers (2021-03-22T23:58:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.