Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment
- URL: http://arxiv.org/abs/2503.19295v1
- Date: Tue, 25 Mar 2025 02:50:47 GMT
- Title: Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment
- Authors: Guanglu Dong, Xiangyu Liao, Mingyang Li, Guihuan Guo, Chao Ren
- Abstract summary: Generative Adversarial Networks (GANs) have been widely applied to image super-resolution (SR). Most existing GAN-based SR methods perform coarse-grained discrimination directly on images and ignore the semantic information of images. We propose a semantic feature discrimination method, SFD, for perceptual SR. Specifically, we first design a feature discriminator (Feat-D) to discriminate the pixel-wise middle semantic features from CLIP. We also propose a novel opinion-unaware no-reference image quality assessment (OU NR-IQA) method, SFD-IQA, which improves OU NR-IQA performance without any additional targeted training.
- Score: 6.362375130130533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Adversarial Networks (GANs) have been widely applied to image super-resolution (SR) to enhance the perceptual quality. However, most existing GAN-based SR methods typically perform coarse-grained discrimination directly on images and ignore the semantic information of images, making it challenging for the super resolution networks (SRN) to learn fine-grained and semantic-related texture details. To alleviate this issue, we propose a semantic feature discrimination method, SFD, for perceptual SR. Specifically, we first design a feature discriminator (Feat-D), to discriminate the pixel-wise middle semantic features from CLIP, aligning the feature distributions of SR images with that of high-quality images. Additionally, we propose a text-guided discrimination method (TG-D) by introducing learnable prompt pairs (LPP) in an adversarial manner to perform discrimination on the more abstract output feature of CLIP, further enhancing the discriminative ability of our method. With both Feat-D and TG-D, our SFD can effectively distinguish between the semantic feature distributions of low-quality and high-quality images, encouraging SRN to generate more realistic and semantic-relevant textures. Furthermore, based on the trained Feat-D and LPP, we propose a novel opinion-unaware no-reference image quality assessment (OU NR-IQA) method, SFD-IQA, greatly improving OU NR-IQA performance without any additional targeted training. Extensive experiments on classical SISR, real-world SISR, and OU NR-IQA tasks demonstrate the effectiveness of our proposed methods.
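The abstract describes two discrimination signals: Feat-D, a per-pixel discriminator over intermediate CLIP features, and TG-D, which scores CLIP's global image embedding against a learnable positive/negative prompt pair (also reused for SFD-IQA). The sketch below illustrates the general shape of both ideas in plain NumPy; the shapes, the linear stand-in for the discriminator, the least-squares GAN loss, and the softmax prompt scoring are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for pixel-wise middle CLIP features of high-quality (HQ)
# and super-resolved (SR) images: (batch, channels, height, width).
feat_hq = rng.normal(size=(2, 64, 8, 8))
feat_sr = rng.normal(size=(2, 64, 8, 8))

# Feat-D stand-in: a linear map over channels (a 1x1 "conv") that
# produces a realness score for every spatial location.
w = rng.normal(scale=0.1, size=(64,))

def feat_d(feat):
    # Per-pixel score: dot product over the channel axis.
    return np.einsum("bchw,c->bhw", feat, w)

# Assumed least-squares GAN objectives: the discriminator pushes HQ
# pixels toward 1 and SR pixels toward 0; the SR network would be
# trained to push its own pixels toward 1.
d_loss = np.mean((feat_d(feat_hq) - 1.0) ** 2) + np.mean(feat_d(feat_sr) ** 2)
g_loss = np.mean((feat_d(feat_sr) - 1.0) ** 2)

# TG-D / SFD-IQA stand-in: compare a global CLIP image embedding with a
# learnable "high quality" / "low quality" prompt pair via softmax over
# cosine similarities; the result doubles as a no-reference quality score.
img_emb = rng.normal(size=(512,))
prompt_pos = rng.normal(size=(512,))  # learnable "high quality" embedding
prompt_neg = rng.normal(size=(512,))  # learnable "low quality" embedding

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sims = np.array([cos(img_emb, prompt_pos), cos(img_emb, prompt_neg)])
quality = np.exp(sims[0]) / np.exp(sims).sum()  # in (0, 1); higher = better

print(d_loss, g_loss, quality)
```

In a real training loop the discriminator weights and the prompt embeddings would be optimized adversarially; here they are frozen random values so the wiring is the only point being made.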
Related papers
- HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior [62.04939047885834]
We present HoliSDiP, a framework that leverages semantic segmentation to provide both precise textual and spatial guidance for Real-ISR. Our method employs semantic labels as concise text prompts while introducing dense semantic guidance through segmentation masks and our proposed spatial-CLIP Map.
arXiv Detail & Related papers (2024-11-27T15:22:44Z) - DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [54.139923409101044]
Blind image quality assessment (IQA) in the wild presents significant challenges.
Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem.
Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose DP-IQA, a novel diffusion-prior-based IQA method.
arXiv Detail & Related papers (2024-05-30T12:32:35Z) - SeD: Semantic-Aware Discriminator for Image Super-Resolution [20.646975821512395]
Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks.
A discriminator is used to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner.
We propose the simple and effective Semantic-aware Discriminator (SeD).
SeD encourages the SR network to learn the fine-grained distributions by introducing the semantics of images as a condition.
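One generic way to condition a discriminator on image semantics, as the SeD summary describes, is to fuse semantic features with image features before scoring. The NumPy sketch below shows the simplest such scheme, channel-wise concatenation followed by a per-pixel linear score; this is an illustrative assumption, not SeD's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Image features and semantic features (e.g. from a pretrained encoder)
# for one patch: (channels, height, width).
img_feat = rng.normal(size=(32, 8, 8))
sem_feat = rng.normal(size=(16, 8, 8))

# Score weights over the concatenated 32 + 16 = 48 channels.
w = rng.normal(scale=0.1, size=(48,))

def cond_disc(img_f, sem_f):
    # Concatenate image and semantic features along the channel axis,
    # then score every pixel with a linear map: semantics act as a
    # condition that changes how each pixel's realness is judged.
    cond = np.concatenate([img_f, sem_f], axis=0)  # (48, 8, 8)
    return np.einsum("chw,c->hw", cond, w)         # per-pixel realness

scores = cond_disc(img_feat, sem_feat)
print(scores.shape)
```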
arXiv Detail & Related papers (2024-02-29T17:38:54Z) - CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution [158.2282163651066]
This paper proposes a continuous implicit attention-in-attention network, called CiaoSR.
We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features.
We embed a scale-aware attention in this implicit attention network to exploit additional non-local information.
arXiv Detail & Related papers (2022-12-08T15:57:46Z) - Semantic Encoder Guided Generative Adversarial Face Ultra-Resolution Network [15.102899995465041]
We propose a novel face super-resolution method, namely Semantic guided Generative Adversarial Face Ultra-Resolution Network (SEGA-FURN).
The proposed network is composed of a novel semantic encoder that captures the embedded semantics to guide adversarial learning, and a novel generator that uses a hierarchical architecture named Residual in Internal Block (RIDB).
Experiments on large face datasets show that the proposed method achieves superior super-resolution results and significantly outperforms other state-of-the-art methods in both qualitative and quantitative comparisons.
arXiv Detail & Related papers (2022-11-18T23:16:57Z) - Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than other works and suppresses aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z) - Deep Posterior Distribution-based Embedding for Hyperspectral Image Super-resolution [75.24345439401166]
This paper focuses on how to embed the high-dimensional spatial-spectral information of hyperspectral (HS) images efficiently and effectively.
We formulate HS embedding as an approximation of the posterior distribution of a set of carefully-defined HS embedding events.
Then, we incorporate the proposed feature embedding scheme into a source-consistent super-resolution framework that is physically-interpretable.
Experiments over three common benchmark datasets demonstrate that PDE-Net achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-30T06:59:01Z) - Textural-Structural Joint Learning for No-Reference Super-Resolution Image Quality Assessment [59.91741119995321]
We develop a dual stream network to jointly explore the textural and structural information for quality prediction, dubbed TSNet.
By mimicking the human vision system (HVS) that pays more attention to the significant areas of the image, we develop the spatial attention mechanism to make the visual-sensitive areas more distinguishable.
Experimental results show that the proposed TSNet predicts visual quality more accurately than state-of-the-art IQA methods and demonstrates better consistency with human perception.
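The spatial-attention idea in the TSNet summary (reweighting a feature map so visually sensitive areas stand out) can be sketched in a few lines; the channel-mean pooling and sigmoid gating below are a generic illustration under assumed shapes, not TSNet's actual mechanism.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Feature map for one image: (channels, height, width).
feat = rng.normal(size=(16, 8, 8))

# Generic spatial attention: pool across channels to get one logit per
# pixel, squash to (0, 1), then reweight every channel with that mask so
# "significant" spatial locations dominate the quality prediction.
attn_logits = feat.mean(axis=0)        # (8, 8)
attn = sigmoid(attn_logits)            # per-pixel weights in (0, 1)
weighted = feat * attn[None, :, :]     # attended feature map

print(weighted.shape)
```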
arXiv Detail & Related papers (2022-05-27T09:20:06Z) - Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling [139.25215100378284]
We propose a hierarchical conditional flow (HCFlow) as a unified framework for image SR and image rescaling.
HCFlow learns a mapping between HR and LR image pairs by modelling the distribution of the LR image and the remaining high-frequency components simultaneously.
To further enhance the performance, other losses such as perceptual loss and GAN loss are combined with the commonly used negative log-likelihood loss in training.
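Combining a negative log-likelihood objective with perceptual and GAN terms, as the HCFlow summary mentions, typically amounts to a weighted sum. The toy values and trade-off weights below are placeholders for illustration, not HCFlow's actual settings.

```python
# Toy combination of the three training losses mentioned above; all
# numbers here are made-up placeholders, not HCFlow's configuration.
nll_loss = 2.5         # negative log-likelihood term
perceptual_loss = 0.8  # e.g. a VGG-feature distance
gan_loss = 0.3         # adversarial term

lambda_perc, lambda_gan = 0.1, 0.05  # assumed trade-off weights
total = nll_loss + lambda_perc * perceptual_loss + lambda_gan * gan_loss
print(total)
```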
arXiv Detail & Related papers (2021-08-11T16:11:01Z) - Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space.
In this study, we desire to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z) - Blind Quality Assessment for Image Superresolution Using Deep Two-Stream Convolutional Networks [41.558981828761574]
We propose a no-reference/blind deep neural network-based SR image quality assessor (DeepSRQ).
To learn more discriminative feature representations of various distorted SR images, the proposed DeepSRQ is a two-stream convolutional network.
Experimental results on three publicly available SR image quality databases demonstrate the effectiveness and generalization ability of our proposed DeepSRQ.
arXiv Detail & Related papers (2020-04-13T19:14:28Z) - Feature Super-Resolution Based Facial Expression Recognition for Multi-scale Low-Resolution Faces [7.634398926381845]
Super-resolution methods are often used to enhance low-resolution images, but their performance on the facial expression recognition (FER) task is limited on images of very low resolution.
In this work, inspired by feature super-resolution methods for object detection, we propose a novel generative adversarial network-based super-resolution method for robust facial expression recognition.
arXiv Detail & Related papers (2020-04-05T15:38:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.