LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution
Homography Estimation
- URL: http://arxiv.org/abs/2106.04067v1
- Date: Tue, 8 Jun 2021 02:51:45 GMT
- Title: LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution
Homography Estimation
- Authors: Ruizhi Shao, Gaochang Wu, Yuemei Zhou, Ying Fu, Lu Fang, Yebin Liu
- Abstract summary: Cross-resolution image alignment is a key problem in multiscale gigapixel photography.
Existing deep homography methods neglect the explicit formulation of correspondences between the inputs, which leads to degraded accuracy in cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
- Score: 52.63874513999119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-resolution image alignment is a key problem in multiscale gigapixel
photography, which requires estimating a homography matrix from images with a
large resolution gap. Existing deep homography methods concatenate the input
images or features, neglecting the explicit formulation of correspondences
between them, which leads to degraded accuracy in cross-resolution challenges.
In this paper, we consider the cross-resolution homography estimation as a
multimodal problem, and propose a local transformer network embedded within a
multiscale structure to explicitly learn correspondences between the multimodal
inputs, namely, input images with different resolutions. The proposed local
transformer adopts a local attention map specifically for each position in the
feature. By combining the local transformer with the multiscale structure, the
network is able to capture long-short range correspondences efficiently and
accurately. Experiments on both the MS-COCO dataset and the real-captured
cross-resolution dataset show that the proposed network outperforms existing
state-of-the-art feature-based and deep-learning-based homography estimation
methods, and is able to accurately align images under $10\times$ resolution
gap.
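The per-position local attention described in the abstract can be illustrated with a small NumPy toy. This is a hypothetical sketch of windowed attention between two feature maps, not the authors' implementation; the function name, window radius, and feature sizes are all made up for illustration:

```python
import numpy as np

def local_attention(query, key, value, radius=2):
    """For each spatial position, attend only to a (2*radius+1)^2 local
    window in the key/value feature map (sketch of a per-position local
    attention map)."""
    H, W, C = query.shape
    out = np.zeros_like(value)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            k = key[y0:y1, x0:x1].reshape(-1, C)    # keys in the local window
            v = value[y0:y1, x0:x1].reshape(-1, C)  # values in the local window
            scores = k @ query[y, x] / np.sqrt(C)   # scaled dot-product similarity
            w = np.exp(scores - scores.max())
            w /= w.sum()                            # softmax over the window
            out[y, x] = w @ v                       # weighted sum of window values
    return out

rng = np.random.default_rng(0)
feat_hr = rng.standard_normal((8, 8, 16))  # feature from the high-res input
feat_lr = rng.standard_normal((8, 8, 16))  # feature from the (upsampled) low-res input
aligned = local_attention(feat_hr, feat_lr, feat_lr, radius=2)
print(aligned.shape)  # (8, 8, 16)
```

In the multiscale structure of the paper, a small radius at coarse scales covers long-range correspondences while the same radius at fine scales refines short-range ones.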
Related papers
- Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring [25.36888929483233]
We propose a multi-scale network based on single-input and multiple-outputs(SIMO) for motion deblurring.
We combine the characteristics of real-world trajectories with a learnable wavelet transform module to focus on the directional continuity and frequency features of the step-by-step transitions between blurred images to sharp images.
arXiv Detail & Related papers (2023-12-29T02:59:40Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
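The graph-reasoning idea above can be sketched in a few lines of NumPy: build a soft adjacency from pairwise feature similarity, then let each node aggregate the others. This is a hypothetical illustration of long-range coherence over feature nodes, not the paper's exact formulation:

```python
import numpy as np

def graph_reasoning(nodes, temperature=1.0):
    """One step of message passing: a row-softmax adjacency built from
    feature similarity mixes every node with all others (sketch only)."""
    sim = nodes @ nodes.T / temperature          # pairwise similarity
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    adj = sim / sim.sum(axis=1, keepdims=True)   # row-softmax adjacency
    return adj @ nodes                           # aggregate neighbor features

# nodes: multi-scale features from the same view position, one row per scale
rng = np.random.default_rng(0)
nodes = rng.standard_normal((4, 32))
refined = graph_reasoning(nodes)
print(refined.shape)  # (4, 32)
```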
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- Continuous Cross-resolution Remote Sensing Image Change Detection [28.466756872079472]
Real-world applications raise the need for cross-resolution change detection (CD), i.e., change detection based on bitemporal images with different spatial resolutions.
We propose scale-invariant learning to enforce the model to consistently predict HR results given synthesized samples with varying resolution differences.
Our method significantly outperforms several vanilla CD methods and two cross-resolution CD methods on three datasets.
arXiv Detail & Related papers (2023-05-24T04:57:24Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations throughout the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Pyramid Grafting Network for One-Stage High Resolution Saliency Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detail information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
arXiv Detail & Related papers (2022-04-11T12:22:21Z)
- Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution [9.022005574190182]
We design a transformer-based network for fusing low-resolution hyperspectral images (LR-HSIs) and high-resolution multispectral images.
Since the LR-HSIs hold the main spectral structure, the network focuses on spatial detail estimation.
Various experiments and quality indexes show our approach's superiority compared with other state-of-the-art methods.
arXiv Detail & Related papers (2021-09-05T14:00:34Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided depth super-resolution (DSR).
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
- Multimodal Deep Unfolding for Guided Image Super-Resolution [23.48305854574444]
Deep learning methods rely on training data to learn an end-to-end mapping from a low-resolution input to a high-resolution output.
We propose a multimodal deep learning design that incorporates sparse priors and allows the effective integration of information from another image modality into the network architecture.
Our solution relies on a novel deep unfolding operator, performing steps similar to an iterative algorithm for convolutional sparse coding with side information.
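The iterative algorithm that deep unfolding networks mimic can be illustrated with plain ISTA for sparse coding. This is a hedged toy in NumPy, not the paper's operator: unfolding would replace the fixed dictionary and threshold with learned per-iteration parameters, and the side-information term from the guide modality is omitted here:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm (shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_sparse_code(D, y, lam=0.1, n_iter=500):
    """ISTA for min_z 0.5*||D z - y||^2 + lam*||z||_1 -- the kind of
    iteration that deep unfolding turns into network layers."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)               # gradient of the data term
        z = soft_threshold(z - grad / L, lam / L)  # gradient step + shrinkage
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))              # overcomplete dictionary
z_true = np.zeros(50)
z_true[[3, 17, 42]] = [1.5, -2.0, 1.0]         # sparse ground-truth code
y = D @ z_true
z_hat = ista_sparse_code(D, y, lam=0.05)
print(np.count_nonzero(np.abs(z_hat) > 0.1))   # number of significant coefficients
```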
arXiv Detail & Related papers (2020-01-21T14:41:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.