AbHE: All Attention-based Homography Estimation
- URL: http://arxiv.org/abs/2212.03029v2
- Date: Wed, 7 Dec 2022 02:04:41 GMT
- Title: AbHE: All Attention-based Homography Estimation
- Authors: Mingxiao Huo, Zhihao Zhang, Xianqiang Yang
- Abstract summary: We propose a strong-baseline model based on the Swin Transformer, which combines a convolutional neural network for local features with a transformer module for global features.
In the homography regression stage, we adopt an attention layer over the channels of the correlation volume, which can drop out weakly correlated feature points.
Experiments show that our method outperforms the state-of-the-art method on 8-degree-of-freedom (DOF) homography estimation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Homography estimation is a basic computer vision task, which aims to obtain
the transformation from multi-view images for image alignment. Unsupervised
homography estimation trains a convolutional neural network for feature
extraction and transformation matrix regression. While the state-of-the-art
homography methods are based on convolutional neural networks, few works focus on
transformers, which show superiority in high-level vision tasks. In this paper,
we propose a strong-baseline model based on the Swin Transformer, which
combines a convolutional neural network for local features with a transformer module
for global features. Moreover, a cross non-local layer is introduced to coarsely
search for matched features within the feature maps. In the homography
regression stage, we adopt an attention layer over the channels of the correlation
volume, which can drop out weakly correlated feature points. Experiments
show that our method outperforms the state-of-the-art method on 8-degree-of-freedom
(DOF) homography estimation.
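The paper text here does not include code; as a rough illustration of the regression-stage idea (channel attention over the correlation volume, then regression of the 8-DOF homography as 4-point corner offsets), the minimal PyTorch sketch below may help. All module names, channel counts, and layer sizes are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn

class CorrelationChannelAttention(nn.Module):
    """Squeeze-and-excitation-style attention over correlation-volume channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global context per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel weight in (0, 1)
        )

    def forward(self, corr):                               # corr: (B, C, H, W) correlation volume
        return corr * self.gate(corr)                      # down-weight weakly correlated channels

class HomographyRegressor(nn.Module):
    """Maps the re-weighted correlation volume to 4-point corner offsets (8 DOF)."""
    def __init__(self, channels: int):
        super().__init__()
        self.attn = CorrelationChannelAttention(channels)
        self.head = nn.Sequential(
            nn.Conv2d(channels, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 8),                              # (dx, dy) for each of the 4 corners
        )

    def forward(self, corr):
        return self.head(self.attn(corr))                  # (B, 8) corner displacements

offsets = HomographyRegressor(channels=81)(torch.randn(2, 81, 32, 32))
print(offsets.shape)  # torch.Size([2, 8])
```
The 4-point (corner-offset) parameterization is a common choice for deep homography regression; the paper's exact parameterization may differ.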
Related papers
- Progressive Retinal Image Registration via Global and Local Deformable Transformations [49.032894312826244]
We propose a hybrid registration framework called HybridRetina.
We use a keypoint detector and a deformation network called GAMorph to estimate the global transformation and the local deformable transformation, respectively.
Experiments on two widely-used datasets, FIRE and FLoRI21, show that our proposed HybridRetina significantly outperforms some state-of-the-art methods.
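As a rough illustration of combining a global transformation with a local deformable one, the PyTorch sketch below composes an affine warp with a displacement field; GAMorph's actual interface is not given in this listing, so the names and shapes are assumptions.
```python
import torch
import torch.nn.functional as F

def warp_global_then_local(image, affine_theta, flow):
    """image: (B, C, H, W); affine_theta: (B, 2, 3) global affine;
    flow: (B, 2, H, W) local displacement field in normalized coordinates."""
    grid = F.affine_grid(affine_theta, image.shape, align_corners=False)  # global part
    grid = grid + flow.permute(0, 2, 3, 1)                                # add local offsets
    return F.grid_sample(image, grid, align_corners=False)

img = torch.rand(1, 3, 128, 128)
theta = torch.tensor([[[1.0, 0.0, 0.02], [0.0, 1.0, -0.03]]])            # near-identity affine
flow = 0.01 * torch.randn(1, 2, 128, 128)                                 # small local deformation
warped = warp_global_then_local(img, theta, flow)
```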
arXiv Detail & Related papers (2024-09-02T08:43:50Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits a homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
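For reference, fitting a homography from point correspondences can be illustrated with the classical direct linear transform (DLT) solve below (NumPy only); this is a generic baseline for geometric verification, not the learnable fitting used by DHE.
```python
import numpy as np

def dlt_homography(src, dst):
    """src, dst: (N, 2) arrays of matched points, N >= 4. Returns 3x3 H with dst ~ H @ src."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the homogeneous system A h = 0.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=np.float64)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)                  # null-space vector = homography entries
    return H / H[2, 2]

src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = np.array([[0.1, 0.0], [1.2, 0.1], [1.0, 1.1], [-0.1, 0.9]])
H = dlt_homography(src, dst)
```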
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
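A minimal spatial-transformer-style sketch of a learned affine warp, in the spirit of the AAT module described above, is given below; the layer sizes and initialization are illustrative assumptions, not the AC-Former implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedAffineWarp(nn.Module):
    """A small localization network predicts an affine transform used to warp the input."""
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(3, 16, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 6),
        )
        # Initialize to the identity transform so training starts from "no warp".
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                    # predicted affine parameters
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

warped = LearnedAffineWarp()(torch.rand(2, 3, 64, 64))
```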
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Towards Hierarchical Regional Transformer-based Multiple Instance Learning [2.16656895298847]
We propose a Transformer-based multiple instance learning approach that replaces the traditional learned attention mechanism with a regional, Vision Transformer inspired self-attention mechanism.
We present a method that fuses regional patch information to derive slide-level predictions and show how this regional aggregation can be stacked to hierarchically process features on different distance levels.
Our approach is able to significantly improve performance over the baseline on two histopathology datasets and points towards promising directions for further research.
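The sketch below illustrates the regional aggregation idea in PyTorch: self-attention within regions of patch embeddings, then fusion into a slide-level prediction. The dimensions and pooling choices are illustrative assumptions rather than the paper's exact architecture.
```python
import torch
import torch.nn as nn

class RegionalSelfAttentionMIL(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 1)

    def forward(self, regions):                            # regions: (R, P, dim) = R regions x P patches
        mixed, _ = self.attn(regions, regions, regions)    # self-attention within each region
        region_tokens = mixed.mean(dim=1)                  # (R, dim): one token per region
        slide_embedding = region_tokens.mean(dim=0)        # fuse regions into a slide descriptor
        return self.classifier(slide_embedding)            # slide-level logit

logit = RegionalSelfAttentionMIL()(torch.randn(12, 64, 256))
```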
arXiv Detail & Related papers (2023-08-24T08:19:15Z)
- Unsupervised Domain Transfer with Conditional Invertible Neural Networks [83.90291882730925]
We propose a domain transfer approach based on conditional invertible neural networks (cINNs).
Our method inherently guarantees cycle consistency through its invertible architecture, and network training can efficiently be conducted with maximum likelihood.
Our method enables the generation of realistic spectral data and outperforms the state of the art on two downstream classification tasks.
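The sketch below illustrates why invertible architectures give cycle consistency by construction: a generic affine coupling block can be inverted exactly, so mapping a sample forward and back recovers the input. This is not the cINN of the paper, and the conditioning input is omitted for brevity.
```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        return torch.cat([x1, x2 * torch.exp(log_s) + t], dim=-1)    # y1 = x1, y2 = s*x2 + t

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        log_s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=-1)

block = AffineCoupling(dim=8)
x = torch.randn(4, 8)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-4)          # exact cycle consistency
```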
arXiv Detail & Related papers (2023-03-17T18:00:27Z)
- Learning Local Implicit Fourier Representation for Image Warping [11.526109213908091]
We propose a local texture estimator for image warping (LTEW) followed by an implicit neural representation to deform images into continuous shapes.
Our LTEW-based neural function outperforms existing warping methods for asymmetric-scale SR and homography transform.
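A minimal sketch of a coordinate-based (implicit) representation with Fourier features follows: RGB values are predicted at arbitrary continuous coordinates, which is what permits warping into continuous shapes. This is a generic formulation, not the LTEW estimator itself; the frequencies and layer sizes are assumptions.
```python
import torch
import torch.nn as nn

class FourierImplicitImage(nn.Module):
    def __init__(self, num_freqs: int = 16, hidden: int = 128):
        super().__init__()
        self.freqs = 2.0 ** torch.arange(num_freqs).float()           # fixed frequency bands
        self.mlp = nn.Sequential(
            nn.Linear(4 * num_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),                                      # RGB output
        )

    def forward(self, coords):                                         # coords: (N, 2) in [-1, 1]
        proj = coords[..., None] * self.freqs                          # (N, 2, F)
        feats = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)  # (N, 2, 2F) Fourier features
        return self.mlp(feats.flatten(start_dim=-2))                   # query at continuous positions

rgb = FourierImplicitImage()(torch.rand(1024, 2) * 2 - 1)              # (1024, 3)
```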
arXiv Detail & Related papers (2022-07-05T06:30:17Z)
- Weakly-supervised fire segmentation by visualizing intermediate CNN layers [82.75113406937194]
Fire localization in images and videos is an important step for an autonomous system to combat fire incidents.
We consider weakly supervised segmentation of fire in images, in which only image labels are used to train the network.
We show that in the case of fire segmentation, which is a binary segmentation problem, the mean value of features in a mid-layer of a classification CNN can perform better than the conventional Class Activation Mapping (CAM) method.
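The sketch below contrasts the two weak-localization cues mentioned above: a Class Activation Map versus the simple mean of mid-layer feature channels, both thresholded into a binary mask. The toy backbone, layer choice, and thresholds are illustrative assumptions.
```python
import torch
import torch.nn as nn

backbone = nn.Sequential(                          # toy classification CNN (features only)
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),    # "mid layer" features
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
)
fc = nn.Linear(64, 2)                              # fire / no-fire classifier after global pooling

image = torch.rand(1, 3, 224, 224)
mid = backbone[:4](image)                          # (1, 32, H, W) mid-layer features
last = backbone(image)                             # (1, 64, H, W) final conv features

# CAM: weight the final feature channels by the classifier weights of the "fire" class.
cam = torch.einsum("c,bchw->bhw", fc.weight[1], last)
# Mid-layer alternative: simply average the channels of an intermediate layer.
mid_mean = mid.mean(dim=1)

mask_cam = (torch.sigmoid(cam) > 0.5).float()
mask_mid = (mid_mean > mid_mean.mean()).float()    # threshold at the map's mean value
```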
arXiv Detail & Related papers (2021-11-16T11:56:28Z)
- LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography.
Existing deep homography methods neglect the explicit formulation of correspondences between the inputs, which leads to degraded accuracy in cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z)
- Graph Neural Networks for Unsupervised Domain Adaptation of Histopathological Image Analytics [22.04114134677181]
We present a novel method for the unsupervised domain adaptation for histological image analysis.
It is based on a backbone for embedding images into a feature space, and a graph neural layer for propagating the supervision signals of images with labels.
In experiments, our method achieves state-of-the-art performance on four public datasets.
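A minimal sketch of the propagation idea follows: backbone embeddings of images from both domains are connected in a similarity graph, and a generic graph-convolution step mixes information from labeled neighbors into unlabeled ones. This is not the paper's exact layer; the graph construction and sizes are assumptions.
```python
import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):                       # x: (N, dim) embeddings, adj: (N, N) 0/1 graph
        adj = adj + torch.eye(adj.size(0))           # add self-loops
        deg = adj.sum(dim=1, keepdim=True)
        return torch.relu(self.lin((adj / deg) @ x)) # mean aggregation over neighbors

emb = torch.randn(10, 64)                            # backbone embeddings of 10 images
sim = emb @ emb.T
adj = (sim > sim.mean()).float()                     # crude similarity graph from features
out = SimpleGraphLayer(64)(emb, adj)
```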
arXiv Detail & Related papers (2020-08-21T04:53:44Z)
- Vanishing Point Detection with Direct and Transposed Fast Hough Transform inside the neural network [0.0]
In this paper, we suggest a new neural network architecture for vanishing point detection in images.
The key element is the use of the direct and transposed Fast Hough Transforms separated by convolutional layer blocks with standard activation functions.
arXiv Detail & Related papers (2020-02-04T09:10:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.