PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution
- URL: http://arxiv.org/abs/2303.13807v1
- Date: Fri, 24 Mar 2023 05:04:52 GMT
- Title: PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution
- Authors: Hansheng Guo, Juncheng Li, Guangwei Gao, Zhi Li, Tieyong Zeng
- Abstract summary: We propose a novel Transformer-based parallax fusion module called Parallax Fusion Transformer (PFT).
PFT employs a Cross-view Fusion Transformer (CVFT) to utilize cross-view information and an Intra-view Refinement Transformer (IVRT) for intra-view feature refinement.
Experiments and ablation studies show that PFT-SSR achieves competitive results and outperforms most SOTA methods.
- Score: 22.251884516076096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stereo image super-resolution aims to boost the performance of image
super-resolution by exploiting the supplementary information provided by
binocular systems. Although previous methods have achieved promising results,
they do not fully utilize cross-view and intra-view information. To
further unleash the potential of binocular images, in this letter, we propose a
novel Transformer-based parallax fusion module called Parallax Fusion
Transformer (PFT). PFT employs a Cross-view Fusion Transformer (CVFT) to
utilize cross-view information and an Intra-view Refinement Transformer (IVRT)
for intra-view feature refinement. Meanwhile, we adopt the Swin Transformer
as the backbone for feature extraction and SR reconstruction to form a pure
Transformer architecture called PFT-SSR. Extensive experiments and ablation
studies show that PFT-SSR achieves competitive results and outperforms most
SOTA methods. Source code is available at https://github.com/MIVRC/PFT-PyTorch.
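A minimal PyTorch sketch of the two-stage fusion idea from the abstract: cross-attention lets one view query the other (the CVFT role), and a self-attention block then refines each view independently (the IVRT role). The module names, dimensions, and the use of full attention over flattened tokens are illustrative assumptions, not the paper's implementation; see the linked repository for the actual code.
```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Cross-attention: one view queries the other (sketch of the CVFT role)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tgt, src):
        # tgt/src: (B, N, C) token sequences from the two views
        fused, _ = self.attn(self.norm_q(tgt), self.norm_kv(src), self.norm_kv(src))
        return tgt + fused  # residual keeps the intra-view content

class IntraViewRefine(nn.Module):
    """Plain self-attention block standing in for the IVRT refinement step."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                 nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm(x)
        x = x + self.attn(h, h, h)[0]
        return x + self.mlp(x)

# Usage: fuse left/right stereo features, then refine each view.
B, N, C = 2, 30 * 90, 64            # tokens from flattened feature maps
left, right = torch.randn(B, N, C), torch.randn(B, N, C)
cvft, ivrt = CrossViewFusion(C), IntraViewRefine(C)
left_out = ivrt(cvft(left, right))  # the right view is processed symmetrically
```
Note that real stereo fusion modules typically exploit epipolar geometry (attention restricted along matching rows) rather than the full attention used here for brevity.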
Related papers
- Self-Supervised Pre-Training for Table Structure Recognition Transformer [25.04573593082671]
We propose a self-supervised pre-training (SSP) method for table structure recognition transformers.
We discover that the performance gap between the linear projection transformer and the hybrid CNN-transformer can be mitigated by SSP of the visual encoder in the TSR model.
arXiv Detail & Related papers (2024-02-23T19:34:06Z) - ITSRN++: Stronger and Better Implicit Transformer Network for Continuous
Screen Content Image Super-Resolution [32.441761727608856]
The proposed method achieves state-of-the-art performance for SCI SR (outperforming SwinIR by 0.74 dB for x3 SR) and also works well for natural image SR.
We construct a large scale SCI2K dataset to facilitate the research on SCI SR.
arXiv Detail & Related papers (2022-10-17T07:47:34Z) - Cross-receptive Focused Inference Network for Lightweight Image
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, Transformers that incorporate contextual information to extract features dynamically have been neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN), a cascade of CT Blocks that mix CNN and Transformer components.
arXiv Detail & Related papers (2022-07-06T16:32:29Z) - Towards Lightweight Transformer via Group-wise Transformation for
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed as LW-Transformer.
We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z) - PanFormer: a Transformer Based Model for Pan-sharpening [49.45405879193866]
- PanFormer: a Transformer Based Model for Pan-sharpening [49.45405879193866]
Pan-sharpening aims to produce a high-resolution (HR) multi-spectral (MS) image from a low-resolution (LR) MS image and its corresponding panchromatic (PAN) image acquired by the same satellite.
Inspired by recent trends in the deep learning community, we propose a novel Transformer-based model for pan-sharpening.
arXiv Detail & Related papers (2022-03-06T09:22:20Z) - Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy
for Image Recognition without Convolutions [1.1032962642000486]
This work builds on the Vision Transformer, combined with a pyramid architecture, using a split-transform-merge strategy to build a group encoder; the resulting architecture is named the Aggregated Pyramid Vision Transformer (APVT).
We perform image classification tasks on the CIFAR-10 dataset and object detection tasks on the COCO 2017 dataset.
arXiv Detail & Related papers (2022-03-02T09:14:28Z) - Towards End-to-End Image Compression and Analysis with Transformers [99.50111380056043]
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications.
We aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
arXiv Detail & Related papers (2021-12-17T03:28:14Z) - Vision Transformer with Progressive Sampling [73.60630716500154]
We propose an iterative and progressive sampling strategy to locate discriminative regions.
When trained from scratch on ImageNet, PS-ViT performs 3.8% higher than the vanilla ViT in terms of top-1 accuracy.
arXiv Detail & Related papers (2021-08-03T18:04:31Z) - PPT Fusion: Pyramid Patch Transformerfor a Case Study in Image Fusion [37.993611194758195]
- PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion [37.993611194758195]
We propose a Patch Pyramid Transformer (PPT) to address the issue of extracting semantic information from an image.
The experimental results demonstrate its superior performance against the state-of-the-art fusion approaches.
arXiv Detail & Related papers (2021-07-29T13:57:45Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Transformer for Image Quality Assessment [14.975436239088312]
We propose an architecture that places a shallow Transformer encoder on top of a feature map extracted by a convolutional neural network (CNN).
Adaptive positional embedding is employed in the Transformer encoder to handle images with arbitrary resolutions.
We have found that the proposed TRIQ architecture achieves outstanding performance.
arXiv Detail & Related papers (2020-12-30T18:43:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.