CATformer: Contrastive Adversarial Transformer for Image Super-Resolution
- URL: http://arxiv.org/abs/2508.17708v1
- Date: Mon, 25 Aug 2025 06:30:18 GMT
- Title: CATformer: Contrastive Adversarial Transformer for Image Super-Resolution
- Authors: Qinyi Tian, Spence Cox, Laura E. Dalton
- Abstract summary: Super-resolution remains a promising technique to enhance the quality of low-resolution images. This study introduces CATformer, a novel neural network integrating diffusion-inspired feature refinement with adversarial learning. CATformer outperforms recent transformer-based and diffusion-inspired methods in both efficiency and visual image quality.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Super-resolution remains a promising technique to enhance the quality of low-resolution images. This study introduces CATformer (Contrastive Adversarial Transformer), a novel neural network integrating diffusion-inspired feature refinement with adversarial and contrastive learning. CATformer employs a dual-branch architecture combining a primary diffusion-inspired transformer, which progressively refines latent representations, with an auxiliary transformer branch designed to enhance robustness to noise through learned latent contrasts. These complementary representations are fused and decoded using deep Residual-in-Residual Dense Blocks for enhanced reconstruction quality. Extensive experiments on benchmark datasets demonstrate that CATformer outperforms recent transformer-based and diffusion-inspired methods in both efficiency and visual image quality. This work bridges the performance gap among transformer-, diffusion-, and GAN-based methods, laying a foundation for practical applications of diffusion-inspired transformers in super-resolution.
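The abstract describes a dual-branch design whose features are fused and then decoded by Residual-in-Residual Dense Blocks (RRDBs). The paper does not publish its implementation here, so the following is only a minimal NumPy sketch of that general pattern, with dense layers standing in for convolutions and all names (`fuse`, `residual_dense_block`, `rrdb`, channel count `C`) chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8  # toy channel count

def dense_layer_weights(n_layers, c):
    # layer i sees the concatenation of the input and all previous outputs,
    # so it maps (i + 1) * c channels down to c channels
    return [rng.standard_normal(((i + 1) * c, c)) * 0.1 for i in range(n_layers)]

def residual_dense_block(x, weights, beta=0.2):
    feats = [x]
    for W in weights:
        inp = np.concatenate(feats, axis=-1)       # dense connectivity
        feats.append(np.maximum(0.0, inp @ W))     # linear + ReLU stand-in for conv
    return x + beta * feats[-1]                    # scaled local residual

def rrdb(x, blocks, beta=0.2):
    y = x
    for w in blocks:
        y = residual_dense_block(y, w, beta)
    return x + beta * y                            # outer "residual-in-residual" skip

def fuse(a, b, W_fuse):
    # fuse the two branch representations: concatenate, project back to C channels
    return np.concatenate([a, b], axis=-1) @ W_fuse

x_main = rng.standard_normal((16, C))   # tokens from the diffusion-inspired branch
x_aux  = rng.standard_normal((16, C))   # tokens from the contrastive auxiliary branch
W_fuse = rng.standard_normal((2 * C, C)) * 0.1
blocks = [dense_layer_weights(3, C) for _ in range(2)]

out = rrdb(fuse(x_main, x_aux, W_fuse), blocks)
print(out.shape)  # (16, 8)
```

In the actual model the fusion and RRDB stages operate on image feature maps with convolutions rather than token matrices; the sketch only shows the connectivity pattern (dense concatenation inside each block, scaled residuals at two nesting levels) that RRDB-based decoders share.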
Related papers
- Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers [55.15722080205737]
Edit2Perceive is a unified diffusion framework that adapts editing models for depth, normal, and matting. Our single-step deterministic inference yields up to faster runtime while training on relatively small datasets.
arXiv Detail & Related papers (2025-11-24T01:13:51Z) - Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection [14.363929799618283]
We propose CHain Of thOught Symbol dEtection (CHOOSE), a CoT-enhanced shallow Transformer framework for wireless symbol detection. By introducing autoregressive latent reasoning steps within the hidden space, CHOOSE significantly improves the reasoning capacity of shallow models. Experimental results demonstrate that our approach outperforms conventional shallow Transformers and achieves performance comparable to that of deep Transformers.
arXiv Detail & Related papers (2025-06-26T08:41:45Z) - TDiR: Transformer based Diffusion for Image Restoration Tasks [19.992144590243836]
Images captured in challenging environments often experience various forms of degradation, including noise, color cast, blur, and light scattering. These effects significantly reduce image quality, hindering their applicability in downstream tasks such as object detection, mapping, and classification. Our transformer-based diffusion model was developed to address image restoration tasks, aiming to improve the quality of degraded images.
arXiv Detail & Related papers (2025-06-25T10:28:13Z) - NAMI: Efficient Image Generation via Bridged Progressive Rectified Flow Transformers [10.84639914909133]
Flow-based Transformer models have achieved state-of-the-art image generation performance, but often suffer from high inference latency and computational cost. We propose Bridged Progressive Rectified Flow Transformers (NAMI), which decompose the generation process across temporal, spatial, and architectural dimensions.
arXiv Detail & Related papers (2025-03-12T10:38:58Z) - Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR).
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z) - DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer [43.807338032286346]
We introduce an innovative non-homogeneous Dehazing method via Deformable Convolutional Transformer-like architecture (DehazeDCT)
We first design a transformer-like network based on deformable convolution v4, which offers long-range dependency and adaptive spatial aggregation capabilities.
Furthermore, we leverage a lightweight Retinex-inspired transformer to achieve color correction and structure refinement.
arXiv Detail & Related papers (2024-05-24T10:59:18Z) - SRTransGAN: Image Super-Resolution using Transformer based Generative Adversarial Network [16.243363392717434]
We propose a transformer-based encoder-decoder network as a generator to generate 2x images and 4x images.
The proposed SRTransGAN outperforms the existing methods by 4.38% on average in PSNR and SSIM scores.
arXiv Detail & Related papers (2023-12-04T16:22:39Z) - DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation [5.5582646801199225]
This study proposes a novel deep medical image segmentation framework, called DA-TransUNet.
It aims to integrate the Transformer and dual attention block (DA-Block) into the traditional U-shaped architecture.
Unlike earlier transformer-based U-net models, DA-TransUNet utilizes Transformers and DA-Block to integrate not only global and local features, but also image-specific positional and channel features.
arXiv Detail & Related papers (2023-10-19T08:25:03Z) - Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics.
By comparing transformer features between the recovered image and the target, the pretrained transformer provides high-resolution, blur-sensitive semantic information.
One approach regards the features as vectors and computes the discrepancy between representations extracted from the recovered image and the target in Euclidean space.
arXiv Detail & Related papers (2023-03-24T14:14:25Z) - Structural Prior Guided Generative Adversarial Transformers for Low-Light Image Enhancement [51.22694467126883]
We propose an effective Structural Prior guided Generative Adversarial Transformer (SPGAT) to solve low-light image enhancement.
The generator is based on a U-shaped Transformer, which is used to explore non-local information to better restore clear images.
To generate more realistic images, we develop a new structural prior guided adversarial learning method by building the skip connections between the generator and discriminators.
arXiv Detail & Related papers (2022-07-16T04:05:40Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.