DGIQA: Depth-guided Feature Attention and Refinement for Generalizable Image Quality Assessment
- URL: http://arxiv.org/abs/2505.24002v1
- Date: Thu, 29 May 2025 20:52:56 GMT
- Title: DGIQA: Depth-guided Feature Attention and Refinement for Generalizable Image Quality Assessment
- Authors: Vaishnav Ramesh, Junliang Liu, Haining Wang, Md Jahidul Islam
- Abstract summary: A long-held challenge in no-reference image quality assessment is the lack of objective generalization to unseen natural distortions. We integrate a novel Depth-Guided cross-attention and refinement mechanism, which distills scene depth and spatial features into a structure-aware representation. We implement TCB and Depth-CAR as multimodal attention-based projection functions to select the most informative features. Experimental results demonstrate that our proposed DGIQA model achieves state-of-the-art (SOTA) performance on both synthetic and authentic benchmark datasets.
- Score: 9.851063768646847
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A long-held challenge in no-reference image quality assessment (NR-IQA) learning from human subjective perception is the lack of objective generalization to unseen natural distortions. To address this, we integrate a novel Depth-Guided cross-attention and refinement (Depth-CAR) mechanism, which distills scene depth and spatial features into a structure-aware representation for improved NR-IQA. This brings in the knowledge of object saliency and relative contrast of the scene for more discriminative feature learning. Additionally, we introduce the idea of TCB (Transformer-CNN Bridge) to fuse high-level global contextual dependencies from a transformer backbone with local spatial features captured by a set of hierarchical CNN (convolutional neural network) layers. We implement TCB and Depth-CAR as multimodal attention-based projection functions to select the most informative features, which also improve training time and inference efficiency. Experimental results demonstrate that our proposed DGIQA model achieves state-of-the-art (SOTA) performance on both synthetic and authentic benchmark datasets. More importantly, DGIQA outperforms SOTA models on cross-dataset evaluations as well as in assessing natural image distortions such as low-light effects, hazy conditions, and lens flares.
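The abstract describes Depth-CAR only at a high level. Below is a minimal PyTorch sketch of what a depth-guided cross-attention and refinement block could look like; the module name, dimensions, and residual layout are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of depth-guided cross-attention and refinement:
# depth tokens query RGB tokens so attention is driven by scene structure,
# then a small MLP refines the fused representation.
import torch
import torch.nn as nn

class DepthGuidedCrossAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.refine = nn.Sequential(              # lightweight refinement MLP
            nn.Linear(d_model, 2 * d_model), nn.GELU(),
            nn.Linear(2 * d_model, d_model))

    def forward(self, rgb_tokens, depth_tokens):
        # Depth features act as queries selecting structure-relevant
        # RGB features (object saliency, relative contrast).
        fused, _ = self.attn(depth_tokens, rgb_tokens, rgb_tokens)
        fused = self.norm1(fused + depth_tokens)        # residual on queries
        return self.norm2(fused + self.refine(fused))   # refinement step

rgb = torch.randn(2, 196, 256)    # flattened backbone features (assumed shape)
depth = torch.randn(2, 196, 256)  # projected depth features (assumed shape)
print(DepthGuidedCrossAttention()(rgb, depth).shape)  # torch.Size([2, 196, 256])
```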
Related papers
- Scene Perceived Image Perceptual Score (SPIPS): combining global and local perception for image quality assessment [0.0]
We propose a novel IQA approach that bridges the gap between deep learning methods and human perception.
Our model disentangles deep features into high-level semantic information and low-level perceptual details, treating each stream separately.
This hybrid design enables the model to assess both global context and intricate image details, better reflecting the human visual process.
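As a rough illustration of the disentangling idea (not the SPIPS implementation), the sketch below splits one feature map into a global semantic stream and a crude local-detail stream before fusing them into a score; the module names and the std-based detail cue are assumptions.

```python
# Illustrative two-stream sketch: semantic stream from global pooling,
# perceptual-detail stream from local feature variation, fused to a score.
import torch
import torch.nn as nn

class TwoStreamIQA(nn.Module):
    def __init__(self, c=512):
        super().__init__()
        self.semantic_head = nn.Linear(c, 128)   # global context stream
        self.detail_head = nn.Linear(c, 128)     # low-level detail stream
        self.regressor = nn.Linear(256, 1)       # fused quality score

    def forward(self, feat_map):                 # (B, C, H, W)
        g = feat_map.mean(dim=(2, 3))            # global average pool
        d = feat_map.flatten(2).std(dim=2)       # crude local-variation cue
        z = torch.cat([self.semantic_head(g), self.detail_head(d)], dim=-1)
        return self.regressor(torch.relu(z)).squeeze(-1)

score = TwoStreamIQA()(torch.randn(4, 512, 14, 14))
print(score.shape)  # torch.Size([4])
```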
arXiv Detail & Related papers (2025-04-24T04:06:07Z)
- PIGUIQA: A Physical Imaging Guided Perceptual Framework for Underwater Image Quality Assessment [59.9103803198087]
We propose a Physical Imaging Guided perceptual framework for Underwater Image Quality Assessment (UIQA).
By leveraging underwater radiative transfer theory, we integrate physics-based imaging estimations to establish quantitative metrics for underwater distortions.
The proposed model accurately predicts image quality scores and achieves state-of-the-art performance.
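For intuition only, here is a toy sketch of physics-guided fusion; the dark-channel transmission proxy below is a stand-in assumption, not the paper's radiative-transfer estimator.

```python
# Toy sketch of physics-guided fusion: a crude dark-channel transmission
# proxy stands in for radiative-transfer-based distortion metrics and is
# concatenated with learned features before quality regression.
import torch
import torch.nn as nn
import torch.nn.functional as F

def dark_channel(img, patch=15):
    # img: (B, 3, H, W) in [0, 1]; min over channels, then local min-pool.
    dc = img.min(dim=1, keepdim=True).values
    return -F.max_pool2d(-dc, patch, stride=1, padding=patch // 2)

class PhysicsGuidedIQA(nn.Module):
    def __init__(self, c=256):
        super().__init__()
        self.backbone = nn.Conv2d(3, c, 3, padding=1)   # placeholder encoder
        self.regressor = nn.Linear(c + 1, 1)

    def forward(self, img):
        feat = self.backbone(img).mean(dim=(2, 3))      # learned features
        physics = dark_channel(img).mean(dim=(1, 2, 3)) # physics-based cue
        z = torch.cat([feat, physics.unsqueeze(-1)], dim=-1)
        return self.regressor(z).squeeze(-1)

print(PhysicsGuidedIQA()(torch.rand(2, 3, 64, 64)).shape)  # torch.Size([2])
```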
arXiv Detail & Related papers (2024-12-20T03:31:45Z)
- DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [54.139923409101044]
Blind image quality assessment (IQA) in the wild presents significant challenges.
Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem.
Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose DP-IQA, a novel IQA method based on diffusion priors.
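A conceptual sketch of using a diffusion prior for IQA might look like the following, where `DummyDenoiser` is a hypothetical stand-in for a pre-trained T2I U-Net and the noise level is an assumed hyperparameter.

```python
# Conceptual sketch: features from a pretrained denoiser applied to a
# lightly-noised input serve as a "diffusion prior" for quality regression.
import torch
import torch.nn as nn

class DummyDenoiser(nn.Module):          # hypothetical stand-in for a T2I U-Net
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 64, 3, padding=1)
    def forward(self, x):
        return self.enc(x)               # "intermediate denoiser features"

class DiffusionPriorIQA(nn.Module):
    def __init__(self, sigma=0.1):       # assumed noise level
        super().__init__()
        self.denoiser = DummyDenoiser()
        self.sigma = sigma
        self.head = nn.Linear(64, 1)

    def forward(self, img):
        noisy = img + self.sigma * torch.randn_like(img)  # light forward noising
        feats = self.denoiser(noisy).mean(dim=(2, 3))
        return self.head(feats).squeeze(-1)

print(DiffusionPriorIQA()(torch.rand(2, 3, 32, 32)).shape)  # torch.Size([2])
```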
arXiv Detail & Related papers (2024-05-30T12:32:35Z)
- Large Multi-modality Model Assisted AI-Generated Image Quality Assessment [53.182136445844904]
We introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model.
It uses semantically informed guidance to sense semantic information, extracting semantic vectors through carefully designed text prompts.
It achieves state-of-the-art performance and demonstrates superior generalization capabilities in assessing the quality of AI-generated images.
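The paper relies on a large multimodal model; as a lighter, publicly available stand-in, the sketch below uses CLIP (via Hugging Face `transformers`) to turn quality-oriented text prompts into a semantic quality cue. The prompts themselves are assumptions.

```python
# Illustrative stand-in: carefully designed text prompts yield semantic
# vectors whose similarity to the image acts as a quality cue.
import numpy as np
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a high quality, sharp, natural photo",          # assumed prompts
           "a low quality, distorted, artifact-ridden image"]
image = Image.fromarray(np.uint8(np.random.rand(224, 224, 3) * 255))  # dummy image

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image.softmax(dim=-1)   # (1, 2)
quality_cue = logits[0, 0].item()  # probability mass on the "good" prompt
print(f"prompt-based quality cue: {quality_cue:.3f}")
```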
arXiv Detail & Related papers (2024-04-27T02:40:36Z)
- Fiducial Focus Augmentation for Facial Landmark Detection [4.433764381081446]
We propose a novel image augmentation technique to enhance the model's understanding of facial structures.
We employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss.
Our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
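Full DCCA whitens both views with their covariance matrices; the sketch below is a deliberately simplified per-dimension correlation surrogate, included only to convey the spirit of the objective.

```python
# Simplified correlation-maximizing objective in the spirit of DCCA
# (not the full canonical-correlation computation).
import torch

def correlation_loss(z1, z2, eps=1e-8):
    # z1, z2: (B, D) embeddings from the two Siamese branches.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + eps)   # standardize per dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + eps)
    corr = (z1 * z2).mean(0)                     # per-dimension correlation, (D,)
    return -corr.mean()                          # maximize correlation

loss = correlation_loss(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```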
arXiv Detail & Related papers (2024-02-23T01:34:00Z)
- Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment [78.21609845377644]
We leverage the diffusion model, a class of state-of-the-art (SOTA) generative models capable of modeling intricate relationships.
We devise a new diffusion restoration network that leverages the produced enhanced image and noise-containing images.
Two visual evaluation branches are designed to comprehensively analyze the resulting high-level feature information.
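One plausible reading of the two-branch difference analysis is sketched below; the shared encoder and the restored-minus-distorted difference cue are assumptions, not the paper's architecture.

```python
# Rough sketch: quality is regressed from features of the distorted image
# and of its difference against a restored/enhanced version.
import torch
import torch.nn as nn

class DifferenceAwareIQA(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.enc = nn.Conv2d(3, c, 3, padding=1)   # shared encoder (assumed)
        self.head = nn.Linear(2 * c, 1)

    def forward(self, distorted, restored):
        f_d = self.enc(distorted).mean(dim=(2, 3))           # distorted branch
        f_diff = self.enc(restored - distorted).mean(dim=(2, 3))  # difference branch
        return self.head(torch.cat([f_d, f_diff], -1)).squeeze(-1)

x, x_hat = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
print(DifferenceAwareIQA()(x, x_hat).shape)  # torch.Size([2])
```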
arXiv Detail & Related papers (2024-02-22T09:39:46Z)
- Transformer-based No-Reference Image Quality Assessment via Supervised Contrastive Learning [36.695247860715874]
We propose SaTQA, a novel Supervised Contrastive Learning (SCL) and Transformer-based NR-IQA model.
We first train a model on a large-scale synthetic dataset with SCL to extract degradation features of images with various distortion types and levels.
To further extract distortion information from images, we propose a backbone network incorporating the Multi-Stream Block (MSB) by combining the CNN inductive bias and Transformer long-term dependence modeling capability.
Experimental results on seven standard IQA datasets show that SaTQA outperforms the state-of-the-art methods for both synthetic and authentic datasets.
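The SCL pre-training stage presumably optimizes a supervised contrastive objective; a standard SupCon-style loss over distortion labels, shown below as an assumed illustration, captures the idea.

```python
# Supervised contrastive (SupCon-style) loss: samples sharing a distortion
# label are pulled together, all others in the batch pushed apart.
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.07):
    # features: (B, D) embeddings; labels: (B,) distortion type/level ids.
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                     # pairwise similarities
    eye = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(eye, float('-inf'))         # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # Mean log-probability over each anchor's positives (0 if none exist).
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

feats, labels = torch.randn(16, 128), torch.randint(0, 4, (16,))
print(supcon_loss(feats, labels).item())
```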
arXiv Detail & Related papers (2023-12-12T06:01:41Z)
- Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment [62.074473976962835]
We show that with proper injection of local distortion features, a larger pretrained and fixed foundation model performs better in IQA tasks.
Specifically, since the vision transformer (ViT) lacks local distortion structure and inductive bias, we use another pretrained convolutional neural network (CNN).
We propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject the local distortion features into ViT.
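A minimal sketch of the extractor/injector pattern, assuming cross-attention as the injection mechanism (the paper's exact operator may differ):

```python
# Hypothetical sketch: CNN feature maps are flattened into tokens and
# injected into frozen ViT tokens via cross-attention with a residual.
import torch
import torch.nn as nn

class LocalDistortionInjector(nn.Module):
    def __init__(self, dim=768, cnn_dim=256, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(cnn_dim, dim)   # map CNN features to ViT width
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vit_tokens, cnn_feat):  # (B, N, dim), (B, C, H, W)
        local = self.proj(cnn_feat.flatten(2).transpose(1, 2))  # (B, HW, dim)
        injected, _ = self.attn(vit_tokens, local, local)
        return self.norm(vit_tokens + injected)                 # residual injection

tokens = torch.randn(2, 197, 768)   # frozen ViT tokens (assumed shape)
cnn = torch.randn(2, 256, 14, 14)   # pretrained CNN features (assumed shape)
print(LocalDistortionInjector()(tokens, cnn).shape)  # torch.Size([2, 197, 768])
```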
arXiv Detail & Related papers (2023-08-23T08:41:21Z)
- Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes a CNN backbone and a transformer encoder to extract features.
The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
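A design compatible with both modes could look like the sketch below, where FR mode concatenates reference features and NR mode zeroes them out; this switching scheme is an assumption for illustration, not the paper's exact code.

```python
# Minimal FR/NR-compatible sketch: CNN backbone features are tokenized
# and fed to a transformer encoder; the reference branch is optional.
import torch
import torch.nn as nn

class UnifiedIQA(nn.Module):
    def __init__(self, c=64, dim=128):
        super().__init__()
        self.cnn = nn.Conv2d(3, c, 3, padding=1)   # placeholder CNN backbone
        self.proj = nn.Linear(2 * c, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(dim, 1)

    def forward(self, distorted, reference=None):
        f_d = self.cnn(distorted)
        # NR mode: no reference available, so its features are zeroed.
        f_r = self.cnn(reference) if reference is not None else torch.zeros_like(f_d)
        tokens = torch.cat([f_d, f_r], 1).flatten(2).transpose(1, 2)  # (B, HW, 2c)
        z = self.encoder(self.proj(tokens))
        return self.head(z.mean(1)).squeeze(-1)

x = torch.rand(2, 3, 32, 32)
model = UnifiedIQA()
print(model(x).shape, model(x, x).shape)  # NR and FR modes
```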
arXiv Detail & Related papers (2021-12-01T13:23:00Z)
- Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary problem of discriminating distortion types and degrees.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
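A contrastive pairwise objective in the NT-Xent style, shown here as an assumed illustration rather than CONTRIQUE's exact formulation:

```python
# NT-Xent-style contrastive loss: two views of the same image are
# positives, every other sample in the batch acts as a negative.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    # z1, z2: (B, D) embeddings of two views of the same B images.
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2B, D)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float('-inf'))             # exclude self-pairs
    b = len(z1)
    # Each sample's positive is its counterpart in the other view.
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())
```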
arXiv Detail & Related papers (2021-10-25T21:01:00Z)