Related papers: Global-Local Progressive Integration Network for Blind Image Quality Assessment

Global-Local Progressive Integration Network for Blind Image Quality Assessment

URL: http://arxiv.org/abs/2408.03885v1
Date: Wed, 7 Aug 2024 16:34:32 GMT
Title: Global-Local Progressive Integration Network for Blind Image Quality Assessment
Authors: Xiaoqi Wang, Yun Zhang,
Abstract summary: Vision transformers (ViTs) excel in computer vision for modeling long-term dependencies, yet face two key challenges for image quality assessment (IQA) We propose a Global-Local progressive INTegration network for IQA, called GlintIQA, to address these issues through three key components.
Score: 6.095342999639137
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision transformers (ViTs) excel in computer vision for modeling long-term dependencies, yet face two key challenges for image quality assessment (IQA): discarding fine details during patch embedding, and requiring extensive training data due to lack of inductive biases. In this study, we propose a Global-Local progressive INTegration network for IQA, called GlintIQA, to address these issues through three key components: 1) Hybrid feature extraction combines ViT-based global feature extractor (VGFE) and convolutional neural networks (CNNs)-based local feature extractor (CLFE) to capture global coarse-grained features and local fine-grained features, respectively. The incorporation of CNNs mitigates the patch-level information loss and inductive bias constraints inherent to ViT architectures. 2) Progressive feature integration leverages diverse kernel sizes in embedding to spatially align coarse- and fine-grained features, and progressively aggregate these features by interactively stacking channel-wise attention and spatial enhancement modules to build effective quality-aware representations. 3) Content similarity-based labeling approach is proposed that automatically assigns quality labels to images with diverse content based on subjective quality scores. This addresses the scarcity of labeled training data in synthetic datasets and bolsters model generalization. The experimental results demonstrate the efficacy of our approach, yielding 5.04% average SROCC gains on cross-authentic dataset evaluations. Moreover, our model and its counterpart pre-trained on the proposed dataset respectively exhibited 5.40% and 13.23% improvements on across-synthetic datasets evaluation. The codes and proposed dataset will be released at https://github.com/XiaoqiWang/GlintIQA.

Related papers

M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment [65.3860007085689]
M3-AGIQA is a comprehensive framework for AGI quality assessment. It includes a structured multi-round evaluation mechanism, where intermediate image descriptions are generated. Experiments conducted on multiple benchmark datasets demonstrate that M3-AGIQA achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-02-21T03:05:45Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN) PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding. Q-Ground combines large multi-modality models with detailed visual quality analysis. Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
DACB-Net: Dual Attention Guided Compact Bilinear Convolution Neural Network for Skin Disease Classification [3.3891365992104605]
This paper introduces the three-branch Dual Attention-Guided Compact Bilinear CNN (DACB-Net) A global branch compensates for lost discriminative features, generating Attention Heat Maps (AHM) for relevant cropped regions. The framework integrates data augmentation, transfer learning, and fine-tuning to tackle data imbalance to improve classification performance, and reduce computational costs.
arXiv Detail & Related papers (2024-07-03T18:32:54Z)
CMTNet: Convolutional Meets Transformer Network for Hyperspectral Images Classification [3.821081081400729]
Current convolutional neural networks (CNNs) focus on local features in hyperspectral data. Transformer framework excels at extracting global features from hyperspectral imagery. This research introduces the Convolutional Meet Transformer Network (CMTNet)
arXiv Detail & Related papers (2024-06-20T07:56:51Z)
DeepHeteroIoT: Deep Local and Global Learning over Heterogeneous IoT Sensor Data [9.531834233076934]
We propose a novel deep learning model that incorporates both Convolutional Neural Network and Bi-directional Gated Recurrent Unit to learn local and global features respectively. In particular, the model achieves an average absolute improvement of 3.37% in Accuracy and 2.85% in F1-Score across datasets.
arXiv Detail & Related papers (2024-03-29T06:24:07Z)
Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment [7.291687946822539]
A major drawback of state-of-the-art NR-IQA techniques is their reliance on a large number of human annotations. We enable the learning of low-level quality features to distortion types by introducing a novel quality-aware contrastive loss. We design zero-shot quality predictions from both pathways in a completely blind setting.
arXiv Detail & Related papers (2023-12-08T05:24:21Z)
Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a textbfForgery-aware textbfAdaptive textbfVision textbfTransformer (FA-ViT) under the adaptive learning paradigm. FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
DealMVC: Dual Contrastive Calibration for Multi-view Clustering [78.54355167448614]
We propose a novel Dual contrastive calibration network for Multi-View Clustering (DealMVC) We first design a fusion mechanism to obtain a global cross-view feature. Then, a global contrastive calibration loss is proposed by aligning the view feature similarity graph and the high-confidence pseudo-label graph. During the training procedure, the interacted cross-view feature is jointly optimized at both local and global levels.
arXiv Detail & Related papers (2023-08-17T14:14:28Z)
Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution [13.894645293832044]
Transformer-based models have shown competitive performance in remote sensing image super-resolution (RSISR) We propose a novel transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR. Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features cross-stages.
arXiv Detail & Related papers (2023-07-06T13:19:06Z)
DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions. We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images. Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)
Hybrid Routing Transformer for Zero-Shot Learning [83.64532548391]
This paper presents a novel transformer encoder-decoder model, called hybrid routing transformer (HRT) We embed an active attention, which is constructed by both the bottom-up and the top-down dynamic routing pathways to generate the attribute-aligned visual feature. While in HRT decoder, we use static routing to calculate the correlation among the attribute-aligned visual features, the corresponding attribute semantics, and the class attribute vectors to generate the final class label predictions.
arXiv Detail & Related papers (2022-03-29T07:55:08Z)
CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning. The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
Compound Frechet Inception Distance for Quality Assessment of GAN Created Images [7.628527132779575]
One notable application of GANs is developing fake human faces, also known as "deep fakes" Measuring the quality of the generated images is inherently subjective but attempts to objectify quality using standardized metrics have been made. We propose to improve the robustness of the evaluation process by integrating lower-level features to cover a wider array of visual defects.
arXiv Detail & Related papers (2021-06-16T06:53:27Z)
IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations. IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlated with labels. We introduce a simple training, Self-Challenging Representation (RSC), that significantly improves the generalization of CNN to the out-of-domain data. RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate remaining features that correlates with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features. We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.