Visual Mechanisms Inspired Efficient Transformers for Image and Video
Quality Assessment
- URL: http://arxiv.org/abs/2203.14557v1
- Date: Mon, 28 Mar 2022 07:55:11 GMT
- Title: Visual Mechanisms Inspired Efficient Transformers for Image and Video
Quality Assessment
- Authors: Junyong You
- Abstract summary: Perceptual mechanisms in the human visual system play a crucial role in the generation of quality perception.
This paper proposes a general framework for no-reference visual quality assessment using efficient windowed transformer architectures.
- Score: 5.584060970507507
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Visual (image, video) quality assessments can be modelled by visual features
in different domains, e.g., spatial, frequency, and temporal domains.
Perceptual mechanisms in the human visual system (HVS) play a crucial role in
the generation of quality perception. This paper proposes a general framework
for no-reference visual quality assessment using efficient windowed transformer
architectures. A lightweight module for multi-stage channel attention is
integrated into the Swin (shifted window) Transformer. This module can
represent the appropriate perceptual mechanisms in image quality assessment
(IQA) to build an accurate IQA model. Meanwhile, representative features for
image quality perception in the spatial and frequency domains can also be
derived from the IQA model, which are then fed into another windowed
transformer architecture for video quality assessment (VQA). The VQA model
efficiently reuses attention information across local windows to tackle the
issue of the expensive time and memory complexity of the original transformer.
Experimental results on both large-scale IQA and VQA databases demonstrate that
the proposed quality assessment models outperform other state-of-the-art models
by large margins. The complete source code will be published on GitHub.
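The abstract does not include the module's design, so the following is a minimal PyTorch sketch of a squeeze-and-excitation style channel attention block of the kind that could follow each Swin Transformer stage; the class name, reduction ratio, and stage dimensions are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over Swin tokens.

    A hypothetical stand-in for the paper's multi-stage channel attention
    module; the reduction ratio and placement are assumptions.
    """
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.GELU(),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, channels), as produced by a Swin stage
        squeeze = x.mean(dim=1)        # global pooling over tokens
        scale = self.fc(squeeze)       # per-channel gates in [0, 1]
        return x * scale.unsqueeze(1)  # re-weight each token's channels

# One module per Swin stage (dims follow Swin-T: 96, 192, 384, 768).
stage_dims = [96, 192, 384, 768]
attn = nn.ModuleList([ChannelAttention(d) for d in stage_dims])
tokens = torch.randn(2, 56 * 56, 96)   # stage-1 tokens for a 224x224 input
print(attn[0](tokens).shape)           # torch.Size([2, 3136, 96])
```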
Related papers
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
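As a rough illustration of the auxiliary-feature idea, the sketch below fuses pooled features from a base BVQA model with pooled features from a frozen BIQA model through a small regression head; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class AuxiliaryFusionHead(nn.Module):
    """Regresses a quality score from base BVQA features concatenated
    with auxiliary quality-aware features; dimensions are illustrative."""
    def __init__(self, base_dim: int, aux_dim: int):
        super().__init__()
        self.regressor = nn.Sequential(
            nn.Linear(base_dim + aux_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),   # scalar quality prediction
        )

    def forward(self, base_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([base_feat, aux_feat], dim=-1)
        return self.regressor(fused)

base_feat = torch.randn(4, 768)  # pooled features of the trainable BVQA model
aux_feat = torch.randn(4, 512)   # pooled features of a frozen BIQA model
head = AuxiliaryFusionHead(768, 512)
print(head(base_feat, aux_feat).shape)   # torch.Size([4, 1])
```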
arXiv Detail & Related papers (2024-05-14T16:32:11Z)
- Large Multi-modality Model Assisted AI-Generated Image Quality Assessment [53.182136445844904]
We introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model.
It uses semantically informed guidance, extracting semantic vectors through carefully designed text prompts.
It achieves state-of-the-art performance and demonstrates superior generalization in assessing the quality of AI-generated images.
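The summary does not name the underlying multi-modality model, so the sketch below uses CLIP text features as one plausible way to turn designed text prompts into semantic vectors; the prompts and the CLIP choice are assumptions, not MA-AGIQA's actual pipeline.

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

# Hypothetical prompt pair; MA-AGIQA's actual prompts and multi-modality
# model are not given in this summary.
prompts = [
    "a photo with realistic, coherent content",
    "a photo with distorted or implausible content",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

with torch.no_grad():
    tokens = tokenizer(prompts, padding=True, return_tensors="pt")
    text_vectors = model.get_text_features(**tokens)  # (2, 512) semantic vectors

# The vectors could then be fused with quality features of the assessed
# image, e.g. via similarity against CLIP image features.
print(text_vectors.shape)
```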
arXiv Detail & Related papers (2024-04-27T02:40:36Z)
- Vision Transformer with Quadrangle Attention [76.35955924137986]
We propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation.
Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles.
We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which requires only minor code modifications and adds negligible extra computational cost.
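A simplified sketch of the quadrangle idea: a linear head predicts the eight free parameters of a per-window homography, initialized to the identity so training starts from the default windows; the real QFormer module additionally handles sampling and attention inside each quadrangle.

```python
import torch
import torch.nn as nn

class QuadranglePredictor(nn.Module):
    """Predicts one projective transform per window and maps the default
    window corners to a learned quadrangle (simplified)."""
    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 8)  # 8 free parameters of a homography
        # start from the identity transform so default windows are preserved
        nn.init.zeros_(self.head.weight)
        with torch.no_grad():
            self.head.bias.copy_(torch.tensor([1., 0., 0., 0., 1., 0., 0., 0.]))

    def forward(self, window_feat: torch.Tensor) -> torch.Tensor:
        # window_feat: (num_windows, dim) pooled features per default window
        n = window_feat.shape[0]
        params = self.head(window_feat)                       # (n, 8)
        ones = torch.ones(n, 1, device=window_feat.device)
        h = torch.cat([params, ones], dim=1).view(n, 3, 3)    # (n, 3, 3)
        # default window corners in normalized homogeneous coordinates
        corners = torch.tensor([[-1., -1., 1.], [1., -1., 1.],
                                [1., 1., 1.], [-1., 1., 1.]],
                               device=window_feat.device)
        warped = torch.einsum('nij,kj->nki', h, corners)      # (n, 4, 3)
        return warped[..., :2] / warped[..., 2:]              # (n, 4, 2) corners

pred = QuadranglePredictor(dim=96)
print(pred(torch.randn(16, 96)).shape)   # torch.Size([16, 4, 2])
```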
arXiv Detail & Related papers (2023-03-27T11:13:50Z)
- Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS) to get a novel type of sample, named fragments.
With fragments and the Fragment Attention Network (FANet), the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
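A simplified sketch of grid mini-cube sampling under assumed defaults (a 7x7 grid, 32-pixel patches, 8 sampled frames): one randomly placed patch per grid cell is kept at the same location across the sampled frames and spliced into a small fragment map.

```python
import torch

def sample_fragments(video: torch.Tensor, grid: int = 7, patch: int = 32,
                     frames: int = 8) -> torch.Tensor:
    """Spatial-temporal grid mini-cube sampling (St-GMS), simplified.

    Grid, patch, and frame counts are illustrative defaults only.
    """
    c, t, h, w = video.shape
    t_idx = torch.linspace(0, t - 1, frames).long()   # uniform temporal grid
    cell_h, cell_w = h // grid, w // grid
    out = torch.empty(c, frames, grid * patch, grid * patch)
    for gy in range(grid):
        for gx in range(grid):
            # one patch position per cell, shared by all frames (a mini-cube)
            y = gy * cell_h + torch.randint(0, cell_h - patch + 1, (1,)).item()
            x = gx * cell_w + torch.randint(0, cell_w - patch + 1, (1,)).item()
            cube = video[:, t_idx, y:y + patch, x:x + patch]
            out[:, :, gy * patch:(gy + 1) * patch,
                gx * patch:(gx + 1) * patch] = cube
    return out

video = torch.rand(3, 64, 1080, 1920)        # C, T, H, W
print(sample_fragments(video).shape)         # torch.Size([3, 8, 224, 224])
```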
arXiv Detail & Related papers (2022-10-11T11:38:07Z)
- DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment [56.42140467085586]
Some temporal variations cause temporal distortions and lead to extra quality degradation.
The human visual system also pays different attention to frames with different contents.
We propose a novel and effective transformer-based VQA method to tackle these two issues.
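As a generic illustration of the two issues, the sketch below runs a temporal transformer over per-frame features and pools them with content-dependent frame weights; it is not DisCoVQA's actual architecture.

```python
import torch
import torch.nn as nn

class TemporalQualityHead(nn.Module):
    """Temporal transformer over per-frame features with learned
    frame-attention pooling; a sketch, not DisCoVQA's design."""
    def __init__(self, dim: int = 256, layers: int = 2, heads: int = 4):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                         batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.frame_attn = nn.Linear(dim, 1)   # content-dependent frame weights
        self.score = nn.Linear(dim, 1)

    def forward(self, frame_feat: torch.Tensor) -> torch.Tensor:
        # frame_feat: (batch, num_frames, dim) from any frame-level backbone
        h = self.encoder(frame_feat)                   # temporal context
        w = torch.softmax(self.frame_attn(h), dim=1)   # (batch, frames, 1)
        pooled = (w * h).sum(dim=1)                    # weighted pooling
        return self.score(pooled).squeeze(-1)          # quality per video

head = TemporalQualityHead()
print(head(torch.randn(2, 16, 256)).shape)   # torch.Size([2])
```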
arXiv Detail & Related papers (2022-06-20T15:31:27Z)
- VTAMIQ: Transformers for Attention Modulated Image Quality Assessment [0.0]
We propose a novel full-reference IQA method, Vision Transformer for Attention Modulated Image Quality (VTAMIQ).
Our method achieves competitive or state-of-the-art performance on the existing IQA datasets.
With large-scale pre-training for both classification and IQA tasks, VTAMIQ generalizes well to unseen sets of images and distortions.
arXiv Detail & Related papers (2021-10-04T18:35:29Z)
- No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency [38.88541492121366]
The goal of No-Reference Image Quality Assessment (NR-IQA) is to estimate the perceptual image quality in accordance with subjective evaluations.
We propose a novel model to address the NR-IQA task by leveraging a hybrid approach that benefits from Convolutional Neural Networks (CNNs) and self-attention mechanism in Transformers.
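A minimal sketch of such a hybrid design, assuming a ResNet-18 backbone whose feature map is flattened into transformer tokens; the relative-ranking and self-consistency parts of the paper are omitted.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class HybridNRIQA(nn.Module):
    """CNN feature maps re-used as transformer tokens, a common hybrid
    design; backbone choice and dimensions are illustrative."""
    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # (B,512,H',W')
        self.proj = nn.Conv2d(512, dim, kernel_size=1)
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                         batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(dim, 1)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        f = self.proj(self.cnn(img))                 # (B, dim, H', W')
        tokens = f.flatten(2).transpose(1, 2)        # (B, H'*W', dim)
        h = self.encoder(tokens)
        return self.head(h.mean(dim=1)).squeeze(-1)  # mean-pooled quality score

model = HybridNRIQA()
print(model(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1])
```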
arXiv Detail & Related papers (2021-08-16T02:07:08Z)
- MUSIQ: Multi-scale Image Quality Transformer [22.908901641767688]
Current state-of-the-art IQA methods are based on convolutional neural networks (CNNs).
We design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios.
With a multi-scale image representation, our proposed method can capture image quality at different granularities.
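A rough sketch of the multi-scale representation: the native-resolution image and fixed-size rescalings are each cut into patch tokens and concatenated into one sequence; MUSIQ's hash-based spatial embeddings and scale embeddings are omitted here.

```python
import torch
import torch.nn.functional as F

def multiscale_tokens(img: torch.Tensor, patch: int = 32,
                      scales=(384, 224)) -> torch.Tensor:
    """Builds one token sequence from the native-resolution image plus
    fixed-size rescalings; patch and scale values are illustrative."""
    views = [img] + [F.interpolate(img, size=(s, s), mode='bilinear',
                                   align_corners=False) for s in scales]
    tokens = []
    for v in views:
        _, c, h, w = v.shape
        # pad so height and width divide the patch size, then unfold
        v = F.pad(v, (0, -w % patch, 0, -h % patch))
        t = F.unfold(v, kernel_size=patch, stride=patch)  # (B, C*p*p, N)
        tokens.append(t.transpose(1, 2))                  # (B, N, C*p*p)
    return torch.cat(tokens, dim=1)   # one sequence across all scales

img = torch.rand(1, 3, 500, 741)      # arbitrary native resolution
print(multiscale_tokens(img).shape)   # (1, total_patches, 3*32*32)
```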
arXiv Detail & Related papers (2021-08-12T23:36:22Z)
- Perceptual Image Quality Assessment with Transformers [4.005576542371173]
We propose an image quality transformer (IQT) that successfully applies a transformer architecture to a perceptual full-reference image quality assessment task.
We extract perceptual feature representations from each of the input images using a convolutional neural network backbone.
The proposed IQT was ranked first among 13 participants in the NTIRE 2021 perceptual image quality assessment challenge.
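A minimal sketch of the full-reference setup described above, assuming a ResNet-18 backbone: CNN features of both images are extracted, and their difference is concatenated as error-aware transformer input; the backbone and fusion choices are assumptions, not IQT's exact recipe.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical backbone standing in for IQT's CNN feature extractor.
backbone = nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])

def fr_tokens(ref: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
    f_ref, f_dist = backbone(ref), backbone(dist)   # (B, 512, H', W')
    diff = f_ref - f_dist                           # error-aware features
    tokens = torch.cat([f_dist, diff], dim=1)       # (B, 1024, H', W')
    return tokens.flatten(2).transpose(1, 2)        # (B, H'*W', 1024)

ref, dist = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
print(fr_tokens(ref, dist).shape)   # torch.Size([1, 49, 1024])
```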
arXiv Detail & Related papers (2021-04-30T02:45:29Z)