PanFormer: a Transformer Based Model for Pan-sharpening
- URL: http://arxiv.org/abs/2203.02916v1
- Date: Sun, 6 Mar 2022 09:22:20 GMT
- Title: PanFormer: a Transformer Based Model for Pan-sharpening
- Authors: Huanyu Zhou, Qingjie Liu, Yunhong Wang
- Abstract summary: Pan-sharpening aims at producing a high-resolution (HR) multi-spectral (MS) image from a low-resolution (LR) multi-spectral (MS) image and its corresponding panchromatic (PAN) image acquired by the same satellite.
Inspired by recent trends in the deep learning community, we propose a novel Transformer-based model for pan-sharpening.
- Score: 49.45405879193866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pan-sharpening aims at producing a high-resolution (HR) multi-spectral (MS)
image from a low-resolution (LR) multi-spectral (MS) image and its
corresponding panchromatic (PAN) image acquired by the same satellite. Inspired
by recent trends in the deep learning community, we propose a novel
Transformer-based model for pan-sharpening. We explore the potential of the
Transformer in image feature extraction and fusion. Following the successful
development of vision transformers, we design a two-stream network with
self-attention to extract modality-specific features from the PAN and MS
modalities, and apply a cross-attention module to merge the spectral and
spatial features. The pan-sharpened image is produced from the enhanced fused
features. Extensive experiments on GaoFen-2 and WorldView-3 images demonstrate
that our Transformer-based model achieves impressive results and outperforms
many existing CNN-based methods, which shows the great potential of introducing
the Transformer to the pan-sharpening task. Code is available at
https://github.com/zhysora/PanFormer.
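
To make the pipeline in the abstract concrete, here is a minimal PyTorch sketch of the described design: two self-attention streams extract modality-specific features from the PAN and MS inputs, and a cross-attention module merges them. All class names, dimensions, and the patch-embedding choice are illustrative assumptions rather than the authors' implementation; the official code is in the repository linked above.

```python
# Minimal sketch of a two-stream self-attention encoder with
# cross-attention fusion for pan-sharpening. Hypothetical design;
# the official PanFormer code is at https://github.com/zhysora/PanFormer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttnStream(nn.Module):
    """Per-modality feature extractor: patch embedding + self-attention."""
    def __init__(self, in_ch, dim=64, heads=4, depth=2, patch=4):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        for blk in self.blocks:
            tokens = blk(tokens)
        return tokens

class CrossAttnFusion(nn.Module):
    """Merge spectral (MS) and spatial (PAN) tokens with cross-attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ms_tokens, pan_tokens):
        # MS tokens query PAN tokens to pull in high-frequency spatial detail.
        fused, _ = self.attn(ms_tokens, pan_tokens, pan_tokens)
        return self.norm(ms_tokens + fused)

if __name__ == "__main__":
    pan = torch.randn(1, 1, 64, 64)  # panchromatic image: 1 band, HR
    ms = torch.randn(1, 4, 16, 16)   # multi-spectral image: 4 bands, LR
    # Upsample MS to PAN resolution so the two token grids align.
    ms_up = F.interpolate(ms, size=pan.shape[-2:], mode="bicubic",
                          align_corners=False)
    pan_feat = SelfAttnStream(in_ch=1)(pan)
    ms_feat = SelfAttnStream(in_ch=4)(ms_up)
    fused = CrossAttnFusion()(ms_feat, pan_feat)
    print(fused.shape)  # torch.Size([1, 256, 64])
```

In a full model, a decoder (e.g., token reshaping plus upsampling convolutions) would map the fused tokens back to an HR MS image, which the abstract summarizes as producing the image "from the enhanced fused features".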
Related papers
- SwinStyleformer is a favorable choice for image inversion [2.8115030277940947]
This paper proposes SwinStyleformer, the first inversion network with a pure Transformer structure.
Experiments found that an inversion network with a vanilla Transformer backbone could not successfully invert the image.
arXiv Detail & Related papers (2024-06-19T02:08:45Z) - Multimodal Token Fusion for Vision Transformers [54.81107795090239]
We propose a multimodal token fusion method (TokenFusion) for transformer-based vision tasks.
To effectively fuse multiple modalities, TokenFusion dynamically detects uninformative tokens and substitutes these tokens with projected and aggregated inter-modal features.
The design of TokenFusion allows the transformer to learn correlations among multimodal features, while the single-modal transformer architecture remains largely intact.
arXiv Detail & Related papers (2022-04-19T07:47:50Z) - HyperTransformer: A Textural and Spectral Feature Fusion Transformer for
- HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening [60.89777029184023]
Pansharpening aims to fuse a registered high-resolution panchromatic image (PAN) with a low-resolution hyperspectral image (LR-HSI) to generate an enhanced HSI with high spectral and spatial resolution.
Existing pansharpening approaches neglect to use an attention mechanism to transfer HR texture features from the PAN image to the LR-HSI features, resulting in spatial and spectral distortions.
We present a novel attention mechanism for pansharpening called HyperTransformer, in which features of LR-HSI and PAN are formulated as queries and keys in a transformer, respectively.
arXiv Detail & Related papers (2022-03-04T18:59:08Z) - PPT Fusion: Pyramid Patch Transformerfor a Case Study in Image Fusion [37.993611194758195]
- PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion [37.993611194758195]
We propose a Patch Pyramid Transformer (PPT) to address the issue of extracting semantic information from an image.
The experimental results demonstrate its superior performance against the state-of-the-art fusion approaches.
arXiv Detail & Related papers (2021-07-29T13:57:45Z) - Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z) - ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose ViTAE, a novel Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
In each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
arXiv Detail & Related papers (2021-06-07T05:31:06Z) - Visual Saliency Transformer [127.33678448761599]
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.