LGFCTR: Local and Global Feature Convolutional Transformer for Image
Matching
- URL: http://arxiv.org/abs/2311.17571v1
- Date: Wed, 29 Nov 2023 12:06:19 GMT
- Title: LGFCTR: Local and Global Feature Convolutional Transformer for Image
Matching
- Authors: Wenhao Zhong and Jie Jiang
- Abstract summary: A novel convolutional transformer is proposed to capture both local contexts and global structures.
A universal FPN-like framework captures global structures in both the self-encoder and the cross-decoder via transformers.
A novel regression-based sub-pixel refinement module exploits the whole fine-grained window features for fine-level positional deviation regression.
- Score: 8.503217766507584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image matching, i.e., finding robust and accurate correspondences across images, is a challenging task under extreme conditions. Capturing local and global features simultaneously is an important way to mitigate this issue, but recent transformer-based decoders are still hampered by the fact that CNN-based encoders extract only local features while transformers themselves lack locality. Inspired by the locality and implicit positional encoding of convolutions, a novel convolutional transformer is proposed to capture both local contexts and global structures more fully for detector-free matching. Firstly, a universal FPN-like framework captures global structures in the self-encoder as well as the cross-decoder by transformers, and compensates for local contexts as well as implicit positional encoding by convolutions. Secondly, a novel convolutional transformer module explores multi-scale long-range dependencies through a novel multi-scale attention and further aggregates local information inside those dependencies to enhance locality. Finally, a novel regression-based sub-pixel refinement module exploits the whole fine-grained window features for fine-level positional deviation regression. The proposed method achieves superior performance on a wide range of benchmarks. The code will be available on https://github.com/zwh0527/LGFCTR.
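As a rough illustration of the regression-based sub-pixel refinement described in the abstract, the sketch below regresses a 2D positional deviation from the flattened features of a fine-level window around each coarse match. The module name, layer sizes, and the tanh-bounded output are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class SubPixelRefinement(nn.Module):
    """Hypothetical sketch of a regression-based sub-pixel refinement head.

    Given a fine-level window of features centred on each coarse match, an
    MLP regresses a 2D positional deviation bounded by the window radius.
    All names and sizes here are illustrative assumptions.
    """

    def __init__(self, feat_dim: int = 128, window: int = 5):
        super().__init__()
        self.radius = window // 2
        self.regressor = nn.Sequential(
            nn.Linear(feat_dim * window * window, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 2),
            nn.Tanh(),  # normalised deviation in [-1, 1]
        )

    def forward(self, window_feats: torch.Tensor) -> torch.Tensor:
        # window_feats: (num_matches, feat_dim, window, window)
        flat = window_feats.flatten(start_dim=1)
        # scale the normalised deviation to pixels of the fine feature map
        return self.regressor(flat) * self.radius


# usage: refine 100 candidate matches with 5x5 windows of 128-d features
refine = SubPixelRefinement(feat_dim=128, window=5)
offsets = refine(torch.randn(100, 128, 5, 5))  # -> (100, 2) sub-pixel offsets
```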
Related papers
- Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
This paper studies the multimedia problem of temporal sentence grounding.
It aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z)
- EDTER: Edge Detection with Transformer [71.83960813880843]
We propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges.
EDTER exploits the full image context information and detailed local cues simultaneously.
Experiments on BSDS500, NYUDv2, and Multicue demonstrate the superiority of EDTER in comparison with state-of-the-arts.
arXiv Detail & Related papers (2022-03-16T11:55:55Z)
- LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which aims to enhance the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z)
- Global and Local Alignment Networks for Unpaired Image-to-Image Translation [170.08142745705575]
The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style.
Because existing methods pay insufficient attention to content changes, semantic information from the source images degrades during translation.
We introduce a novel approach, Global and Local Alignment Networks (GLA-Net).
Our method effectively generates sharper and more realistic images than existing approaches.
arXiv Detail & Related papers (2021-11-19T18:01:54Z)
- Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net [19.21709807149165]
Existing salient object detection (SOD) methods mainly rely on U-shaped convolution neural networks (CNNs) with skip connections.
We propose a transformer-based Asymmetric Bilateral U-Net (ABiU-Net) to learn both global and local representations for SOD.
ABiU-Net performs favorably against previous state-of-the-art SOD methods.
arXiv Detail & Related papers (2021-08-17T19:45:28Z)
- Unifying Global-Local Representations in Salient Object Detection with Transformer [55.23033277636774]
We introduce a new attention-based encoder, vision transformer, into salient object detection.
With the global view in very shallow layers, the transformer encoder preserves more local representations.
Our method significantly outperforms other FCN-based and transformer-based methods in five benchmarks.
arXiv Detail & Related papers (2021-08-05T17:51:32Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- LocalViT: Bringing Locality to Vision Transformers [132.42018183859483]
Locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects.
We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network.
This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks.
arXiv Detail & Related papers (2021-04-12T17:59:22Z)
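To make the depth-wise convolution idea concrete, here is a minimal, hypothetical sketch of a transformer feed-forward network with a 3x3 depth-wise convolution inserted between the two point-wise projections, in the spirit of LocalViT; the layer widths and GELU activation are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ConvFFN(nn.Module):
    """Minimal sketch: a transformer feed-forward network with a 3x3
    depth-wise convolution between the two point-wise projections, in the
    spirit of LocalViT. Layer widths and activation are assumptions."""

    def __init__(self, dim: int = 256, hidden: int = 1024):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3,
                                padding=1, groups=hidden)  # depth-wise
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (batch, h * w, dim) token sequence from a ViT block
        b, n, _ = x.shape
        x = self.act(self.fc1(x))
        # tokens -> 2D map so the depth-wise conv sees spatial neighbours
        x = x.transpose(1, 2).reshape(b, -1, h, w)
        x = self.act(self.dwconv(x))
        # back to a token sequence for the second point-wise projection
        x = x.flatten(2).transpose(1, 2)
        return self.fc2(x)


# usage: 14x14 grid of 256-d tokens, batch of 2
ffn = ConvFFN(dim=256, hidden=1024)
out = ffn(torch.randn(2, 14 * 14, 256), h=14, w=14)  # -> (2, 196, 256)
```

Reshaping the tokens back into a 2D map lets the depth-wise convolution aggregate each token's spatial neighbours, which is how locality (and an implicit positional bias) enters an otherwise position-agnostic feed-forward path.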