VTAMIQ: Transformers for Attention Modulated Image Quality Assessment
- URL: http://arxiv.org/abs/2110.01655v1
- Date: Mon, 4 Oct 2021 18:35:29 GMT
- Title: VTAMIQ: Transformers for Attention Modulated Image Quality Assessment
- Authors: Andrei Chubarau, James Clark
- Abstract summary: We propose a novel full-reference IQA method, Vision Transformer for Attention Modulated Image Quality (VTAMIQ).
Our method achieves competitive or state-of-the-art performance on the existing IQA datasets.
With large-scale pre-training for both classification and IQA tasks, VTAMIQ generalizes well to unseen sets of images and distortions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Following the major successes of self-attention and Transformers for image
analysis, we investigate the use of such attention mechanisms in the context of
Image Quality Assessment (IQA) and propose a novel full-reference IQA method,
Vision Transformer for Attention Modulated Image Quality (VTAMIQ). Our method
achieves competitive or state-of-the-art performance on the existing IQA
datasets and significantly outperforms previous metrics in cross-database
evaluations. Most patch-wise IQA methods treat each patch independently; this
partially discards global information and limits the ability to model
long-distance interactions. We avoid this problem altogether by employing a
transformer to encode a sequence of patches as a single global representation,
which by design considers interdependencies between patches. We rely on various
attention mechanisms -- first with self-attention within the Transformer, and
second with channel attention within our difference modulation network --
specifically to reveal and enhance the more salient features throughout our
architecture. With large-scale pre-training for both classification and IQA
tasks, VTAMIQ generalizes well to unseen sets of images and distortions,
further demonstrating the strength of transformer-based networks for vision
modelling.
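The abstract describes a full-reference pipeline: a shared transformer backbone encodes each image's patch sequence into a single global representation, the reference and distorted representations are differenced, channel attention modulates that difference to emphasize salient features, and a regression head predicts the quality score. Below is a minimal PyTorch sketch of that flow, assuming an SE-style gate for the channel attention and a generic ViT-like backbone; the module names, dimensions, and reduction ratio are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style gate standing in for the channel attention inside the
    difference modulation network (the reduction ratio is an assumption)."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)  # re-weight feature channels


class FullReferenceIQA(nn.Module):
    """Encode reference and distorted inputs with a shared transformer,
    modulate their feature difference, and regress one quality score."""

    def __init__(self, backbone: nn.Module, dim: int = 768):
        super().__init__()
        self.backbone = backbone              # any module mapping images to (B, dim)
        self.modulate = ChannelAttention(dim)
        self.head = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.GELU(),
            nn.Linear(dim // 2, 1),
        )

    def forward(self, ref: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        f_ref = self.backbone(ref)            # (B, dim) global representation
        f_dist = self.backbone(dist)          # (B, dim)
        diff = self.modulate(f_ref - f_dist)  # emphasize salient channels
        return self.head(diff).squeeze(-1)    # (B,) predicted quality scores
```

For a quick shape check, any module that maps an image batch to a (B, 768) embedding can serve as `backbone`, e.g. `nn.Sequential(nn.Flatten(), nn.LazyLinear(768))`.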
Related papers
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the mismatch between generic image-text pre-training and the IQA task using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Blind Image Quality Assessment via Transformer Predicted Error Map and Perceptual Quality Token [19.67014524146261]
No-reference image quality assessment (NR-IQA) has gained increasing attention recently.
We propose a Transformer-based NR-IQA model using a predicted objective error map and perceptual quality token.
Our proposed method outperforms the current state-of-the-art in both authentic and synthetic image databases.
arXiv Detail & Related papers (2023-05-16T11:17:54Z)
- Vision Transformer with Quadrangle Attention [76.35955924137986]
We propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation.
Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles.
We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which requires only minor code modifications and adds negligible extra computational cost.
arXiv Detail & Related papers (2023-03-27T11:13:50Z)
- Visual Mechanisms Inspired Efficient Transformers for Image and Video Quality Assessment [5.584060970507507]
Perceptual mechanisms in the human visual system play a crucial role in the generation of quality perception.
This paper proposes a general framework for no-reference visual quality assessment using efficient windowed transformer architectures.
arXiv Detail & Related papers (2022-03-28T07:55:11Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
- No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency [38.88541492121366]
The goal of No-Reference Image Quality Assessment (NR-IQA) is to estimate the perceptual image quality in accordance with subjective evaluations.
We propose a novel model to address the NR-IQA task by leveraging a hybrid approach that benefits from Convolutional Neural Networks (CNNs) and the self-attention mechanism in Transformers.
arXiv Detail & Related papers (2021-08-16T02:07:08Z)
- MUSIQ: Multi-scale Image Quality Transformer [22.908901641767688]
Current state-of-the-art IQA methods are based on convolutional neural networks (CNNs).
We design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios.
With a multi-scale image representation, our proposed method can capture image quality at different granularities.
arXiv Detail & Related papers (2021-08-12T23:36:22Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images (a simplified sketch of XCA follows this list).
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
Less attention vIsion Transformer builds upon the fact that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
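The XCiT entry above describes cross-covariance attention (XCA), in which the attention map is computed between feature channels rather than between tokens, making the cost linear in sequence length. Below is a simplified, single-head PyTorch sketch written from that description; the learnable temperature, L2 normalization over tokens, and output projection are assumptions drawn from common practice, not the XCiT reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossCovarianceAttention(nn.Module):
    """Single-head sketch of 'transposed' self-attention: the attention map
    is C x C (channels attend to channels), so its size does not depend on
    the number of tokens N and the overall cost grows linearly with N."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.temperature = nn.Parameter(torch.ones(1))  # learnable scaling
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) patch tokens
        q, k, v = self.qkv(x).chunk(3, dim=-1)                # each (B, N, C)
        q, k, v = (t.transpose(-2, -1) for t in (q, k, v))    # (B, C, N)
        q = F.normalize(q, dim=-1)                            # unit norm over tokens
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature   # (B, C, C) channel map
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(-2, -1)                    # back to (B, N, C)
        return self.proj(out)
```

With `x = torch.randn(2, 196, 128)` and `CrossCovarianceAttention(128)`, the output keeps the (2, 196, 128) shape while the attention map stays 128 x 128 regardless of how many tokens are processed.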
This list is automatically generated from the titles and abstracts of the papers on this site.