PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion
- URL: http://arxiv.org/abs/2107.13967v1
- Date: Thu, 29 Jul 2021 13:57:45 GMT
- Title: PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion
- Authors: Yu Fu, TianYang Xu, XiaoJun Wu, Josef Kittler
- Abstract summary: We propose a Patch Pyramid Transformer (PPT) to address the loss of spatial resolution incurred when extracting semantic information from an image.
The experimental results demonstrate its superior performance against the state-of-the-art fusion approaches.
- Score: 37.993611194758195
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The Transformer architecture has achieved rapid development in recent years, outperforming CNN architectures in many computer vision tasks, as exemplified by the Vision Transformer (ViT) for image classification. However, existing visual transformer models aim to extract semantic information for high-level tasks such as classification and detection, distorting the spatial resolution of the input image and thus sacrificing the capacity to reconstruct the input or generate high-resolution images. In this paper, therefore, we propose a Patch Pyramid Transformer (PPT) to effectively address the above issues. Specifically, we first design a Patch Transformer to transform the image into a sequence of patches, where transformer encoding is performed for each patch to extract local representations. In addition, we construct a Pyramid Transformer to effectively extract the non-local information from the entire image. After obtaining a set of multi-scale, multi-dimensional, and multi-angle features of the original image, we design an image reconstruction network to ensure that the features can be reconstructed into the original input. To validate the effectiveness, we apply the proposed Patch Pyramid Transformer to the image fusion task. The experimental results demonstrate its superior performance against the state-of-the-art fusion approaches, achieving the best results on several evaluation indicators. The underlying capacity of the PPT network is reflected by its universal power in feature extraction and image reconstruction, which can be directly applied to different image fusion tasks without redesigning or retraining the network.
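To make the described pipeline concrete, here is a minimal PyTorch sketch of the two components and the reconstruction head: per-patch attention for local representations, attention across patch summaries for non-local context, and a small convolutional decoder. All module names, patch sizes, depths, and dims are illustrative assumptions; the sketch collapses the multi-scale pyramid to a single scale and is not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchTransformer(nn.Module):
    """Attention among the pixels inside each patch (local representations)."""
    def __init__(self, channels=1, patch=8, dim=64, heads=4, depth=2):
        super().__init__()
        self.patch, self.dim = patch, dim
        self.proj = nn.Linear(channels, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):                             # x: (B, C, H, W)
        b, c, _, _ = x.shape
        p = self.patch
        cols = F.unfold(x, p, stride=p)               # (B, C*p*p, L) patch columns
        l = cols.shape[-1]
        tokens = cols.view(b, c, p * p, l).permute(0, 3, 2, 1).reshape(b * l, p * p, c)
        feats = self.encoder(self.proj(tokens))       # one token sequence per patch
        return feats.view(b, l, p * p, self.dim)

class PyramidTransformer(nn.Module):
    """Attention across patch summaries (non-local context over the image)."""
    def __init__(self, dim=64, heads=4, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, local_feats):                   # (B, L, p*p, dim)
        summaries = local_feats.mean(dim=2)           # one token per patch
        return self.encoder(summaries)                # (B, L, dim)

class PPTSketch(nn.Module):
    """Local + non-local features, then a convolutional reconstruction head."""
    def __init__(self, channels=1, patch=8, dim=64):
        super().__init__()
        self.patch, self.dim = patch, dim
        self.local = PatchTransformer(channels, patch, dim)
        self.non_local = PyramidTransformer(dim)
        self.recon = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, channels, 3, padding=1))

    def forward(self, x):
        b, _, h, w = x.shape
        p = self.patch
        local = self.local(x)                         # (B, L, p*p, dim)
        context = self.non_local(local)               # (B, L, dim)
        fused = local + context.unsqueeze(2)          # broadcast context over pixels
        # Fold the per-patch features back into a (B, dim, H, W) feature map.
        cols = fused.permute(0, 3, 2, 1).reshape(b, self.dim * p * p, -1)
        fmap = F.fold(cols, (h, w), p, stride=p)
        return self.recon(fmap)                       # reconstructed image

img = torch.randn(2, 1, 64, 64)
print(PPTSketch()(img).shape)                         # torch.Size([2, 1, 64, 64])
```

For a fusion task, presumably each source image would pass through the same feature extractor and the resulting features would be combined before reconstruction; the fusion rule itself is not part of this sketch.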
Related papers
- SwinStyleformer is a favorable choice for image inversion [2.8115030277940947]
This paper proposes the first pure Transformer structure inversion network called SwinStyleformer.
Experiments found that an inversion network with a standard Transformer backbone could not successfully invert the image, motivating the Swin Transformer based design.
arXiv Detail & Related papers (2024-06-19T02:08:45Z)
- A Contrastive Learning Scheme with Transformer Innate Patches [4.588028371034407]
We present Contrastive Transformer, a contrastive learning scheme using the Transformer innate patches.
The scheme performs supervised patch-level contrastive learning, selecting the patches based on the ground truth mask.
The scheme applies to all vision-transformer architectures, is easy to implement, and introduces minimal additional memory footprint.
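As an illustration of the patch-level scheme described above, here is a minimal sketch: patches are labelled by majority vote over the ground-truth mask, and a SupCon-style loss pulls together embeddings of same-class patches. The labelling rule, loss form, and temperature are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def patch_labels(mask, patch=16):
    """Label each patch by majority vote over its ground-truth mask pixels."""
    b, _, _ = mask.shape                                   # mask: (B, H, W) class map
    grid = mask.unfold(1, patch, patch).unfold(2, patch, patch)
    grid = grid.reshape(b, -1, patch * patch)              # (B, L, p*p)
    return grid.mode(dim=-1).values                        # (B, L) patch classes

def supcon_loss(emb, labels, temperature=0.1):
    """Supervised contrastive loss over a flat batch of patch embeddings."""
    emb = F.normalize(emb, dim=-1)                         # (N, dim)
    sim = emb @ emb.t() / temperature                      # (N, N) similarities
    eye = torch.eye(len(emb), dtype=torch.bool, device=emb.device)
    pos = (labels[:, None] == labels[None, :]) & ~eye      # same-class pairs
    logp = sim.masked_fill(eye, float('-inf')).log_softmax(dim=1)
    return -logp[pos].sum() / pos.sum().clamp(min=1)       # average over positives

mask = torch.randint(0, 3, (2, 64, 64))                    # toy ground-truth masks
labels = patch_labels(mask).reshape(-1)                    # (2*16,) patch classes
emb = torch.randn(len(labels), 64)                         # stand-in patch embeddings
print(supcon_loss(emb, labels))
```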
arXiv Detail & Related papers (2023-03-26T20:19:28Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, starting with tokens of small patch sizes and gradually merging them to reach the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
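A minimal sketch of this coarse-to-fine idea, under assumed stage counts and dims: attention first runs over a coarse token grid, which is then progressively expanded (here by pixel shuffle) toward full resolution. A real super-resolution model would add an upsampling head, which this restoration-style sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnStage(nn.Module):
    """One transformer stage applied over a feature map's token grid."""
    def __init__(self, dim, heads=4, depth=1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, fmap):                        # fmap: (B, dim, H, W)
        b, d, h, w = fmap.shape
        tokens = self.encoder(fmap.flatten(2).transpose(1, 2))
        return tokens.transpose(1, 2).view(b, d, h, w)

class HierarchicalPatchSketch(nn.Module):
    def __init__(self, channels=3, dim=32, stages=3):
        super().__init__()
        self.stem = nn.Conv2d(channels, dim, 3, padding=1)
        self.stages = nn.ModuleList(AttnStage(dim) for _ in range(stages))
        # Each expansion doubles the token grid, i.e. halves the patch size.
        self.expand = nn.ModuleList(
            nn.Conv2d(dim, dim * 4, 1) for _ in range(stages - 1))
        self.head = nn.Conv2d(dim, channels, 3, padding=1)
        self.coarse = 2 ** (stages - 1)             # initial patch size

    def forward(self, x):                           # x: (B, C, H, W)
        fmap = F.avg_pool2d(self.stem(x), self.coarse)   # coarse token grid
        for i, stage in enumerate(self.stages):
            fmap = stage(fmap)
            if i < len(self.expand):                # merge toward full resolution
                fmap = F.pixel_shuffle(self.expand[i](fmap), 2)
        return x + self.head(fmap)                  # residual output

img = torch.randn(1, 3, 32, 32)
print(HierarchicalPatchSketch()(img).shape)         # torch.Size([1, 3, 32, 32])
```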
arXiv Detail & Related papers (2022-03-19T05:09:34Z)
- PanFormer: a Transformer Based Model for Pan-sharpening [49.45405879193866]
Pan-sharpening aims at producing a high-resolution (HR) multi-spectral (MS) image from a low-resolution (LR) multi-spectral (MS) image and its corresponding panchromatic (PAN) image acquired by the same satellite.
Inspired by recent trends in the deep learning community, we propose a novel Transformer-based model for pan-sharpening.
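To make the task's inputs and outputs concrete, here is a minimal sketch in which upsampled multi-spectral tokens query panchromatic tokens through generic cross-attention. This fusion is a stand-in, not PanFormer's actual architecture, and all sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PanSharpenSketch(nn.Module):
    def __init__(self, bands=4, dim=32, heads=4):
        super().__init__()
        self.ms_embed = nn.Conv2d(bands, dim, 3, padding=1)
        self.pan_embed = nn.Conv2d(1, dim, 3, padding=1)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Conv2d(dim, bands, 3, padding=1)

    def forward(self, lr_ms, pan):              # (B, bands, h, w), (B, 1, H, W)
        b, (H, W) = pan.shape[0], pan.shape[-2:]
        ms = F.interpolate(lr_ms, (H, W), mode='bilinear', align_corners=False)
        q = self.ms_embed(ms).flatten(2).transpose(1, 2)     # MS tokens query...
        kv = self.pan_embed(pan).flatten(2).transpose(1, 2)  # ...PAN tokens
        fused, _ = self.cross(q, kv, kv)
        fused = fused.transpose(1, 2).view(b, -1, H, W)
        return ms + self.head(fused)            # residual over upsampled MS

lr_ms = torch.randn(1, 4, 8, 8)                 # low-res multi-spectral
pan = torch.randn(1, 1, 32, 32)                 # high-res panchromatic
print(PanSharpenSketch()(lr_ms, pan).shape)     # torch.Size([1, 4, 32, 32])
```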
arXiv Detail & Related papers (2022-03-06T09:22:20Z)
- Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions [1.1032962642000486]
This work builds on the Vision Transformer combined with a pyramid architecture, using a split-transform-merge strategy to propose a group encoder, and names the resulting network architecture Aggregated Pyramid Vision Transformer (APVT).
We perform image classification tasks on the CIFAR-10 dataset and object detection tasks on the COCO 2017 dataset.
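A minimal sketch of a split-transform-merge group encoder as summarised above: tokens are split into channel groups, each group is encoded independently, and the results are merged by a linear layer. The group count and dims are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GroupEncoder(nn.Module):
    def __init__(self, dim=64, groups=4, heads=2):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        gdim = dim // groups
        self.branches = nn.ModuleList(
            nn.TransformerEncoderLayer(gdim, heads, gdim * 2, batch_first=True)
            for _ in range(groups))
        self.merge = nn.Linear(dim, dim)

    def forward(self, tokens):                      # tokens: (B, N, dim)
        chunks = tokens.chunk(self.groups, dim=-1)  # split into channel groups
        out = [b(c) for b, c in zip(self.branches, chunks)]  # transform each
        return self.merge(torch.cat(out, dim=-1))   # merge the group outputs

tokens = torch.randn(2, 49, 64)                     # e.g. a 7x7 grid of patch tokens
print(GroupEncoder()(tokens).shape)                 # torch.Size([2, 49, 64])
```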
arXiv Detail & Related papers (2022-03-02T09:14:28Z)
- Towards End-to-End Image Compression and Analysis with Transformers [99.50111380056043]
We propose an end-to-end image compression and analysis model with Transformers, targeting the cloud-based image classification application.
We aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
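A minimal sketch of the compress-then-analyse idea: one encoder produces a compact latent, a decoder reconstructs the image, and a transformer classifier reads the same latent directly, without decoding. The entropy model of a real learned codec is omitted, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class CompressAndClassify(nn.Module):
    def __init__(self, channels=3, dim=64, classes=10, heads=4):
        super().__init__()
        self.encoder = nn.Conv2d(channels, dim, 8, stride=8)   # 8x-downsampled latent
        self.decoder = nn.ConvTranspose2d(dim, channels, 8, stride=8)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)
        self.analyser = nn.TransformerEncoder(layer, 2)
        self.cls = nn.Linear(dim, classes)

    def forward(self, x):
        latent = self.encoder(x)                          # "compressed" features
        recon = self.decoder(latent)                      # compression branch
        tokens = latent.flatten(2).transpose(1, 2)        # (B, N, dim)
        logits = self.cls(self.analyser(tokens).mean(1))  # analysis branch
        return recon, logits

x = torch.randn(2, 3, 64, 64)
recon, logits = CompressAndClassify()(x)
print(recon.shape, logits.shape)    # torch.Size([2, 3, 64, 64]) torch.Size([2, 10])
```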
arXiv Detail & Related papers (2021-12-17T03:28:14Z)
- Uformer: A General U-Shaped Transformer for Image Restoration [47.60420806106756]
We build a hierarchical encoder-decoder network using the Transformer block for image restoration.
Experiments on several image restoration tasks demonstrate the superiority of Uformer.
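A minimal sketch of a U-shaped encoder-decoder built from transformer blocks with skip connections, as described above; depth, dims, and the use of global rather than windowed attention are simplifying assumptions.

```python
import torch
import torch.nn as nn

def block(dim, heads=4):
    layer = nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)
    return nn.TransformerEncoder(layer, 1)

class UShapedTransformer(nn.Module):
    def __init__(self, channels=3, dim=32):
        super().__init__()
        self.stem = nn.Conv2d(channels, dim, 3, padding=1)
        self.enc1, self.enc2 = block(dim), block(dim * 2)
        self.down = nn.Conv2d(dim, dim * 2, 2, stride=2)
        self.up = nn.ConvTranspose2d(dim * 2, dim, 2, stride=2)
        self.dec1 = block(dim)
        self.head = nn.Conv2d(dim, channels, 3, padding=1)

    def run(self, blk, fmap):                     # attention over a feature map
        b, d, h, w = fmap.shape
        t = blk(fmap.flatten(2).transpose(1, 2))
        return t.transpose(1, 2).view(b, d, h, w)

    def forward(self, x):
        e1 = self.run(self.enc1, self.stem(x))    # full resolution
        e2 = self.run(self.enc2, self.down(e1))   # half resolution
        d1 = self.run(self.dec1, self.up(e2) + e1)  # skip connection
        return x + self.head(d1)                  # residual restoration output

x = torch.randn(1, 3, 32, 32)
print(UShapedTransformer()(x).shape)              # torch.Size([1, 3, 32, 32])
```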
arXiv Detail & Related papers (2021-06-06T12:33:22Z)