Local Distortion Aware Efficient Transformer Adaptation for Image
Quality Assessment
- URL: http://arxiv.org/abs/2308.12001v1
- Date: Wed, 23 Aug 2023 08:41:21 GMT
- Title: Local Distortion Aware Efficient Transformer Adaptation for Image
Quality Assessment
- Authors: Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, Haoning Wu, Qiong
Yan, Weisi Lin
- Abstract summary: We show that with proper injection of local distortion features, a larger pretrained and fixed foundation model performs better in IQA tasks.
Specifically, to compensate for the vision transformer's (ViT) lack of local distortion structure and inductive bias, we use another pretrained convolutional neural network (CNN).
We propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject the local distortion features into ViT.
- Score: 62.074473976962835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image Quality Assessment (IQA) constitutes a fundamental task within the
field of computer vision, yet it remains an unresolved challenge, owing to the
intricate distortion conditions, diverse image contents, and limited
availability of data. Recently, the community has witnessed the emergence of
numerous large-scale pretrained foundation models, which greatly benefit from
dramatically increased data and parameter capacities. However, it remains an
open problem whether the scaling law observed in high-level tasks also applies to
the IQA task, which is closely tied to low-level clues. In this paper, we
demonstrate that with proper injection of local distortion features, a larger
pretrained and fixed foundation model performs better in IQA tasks.
Specifically, since the vision transformer (ViT) lacks local distortion
structure and inductive bias, alongside the large-scale pretrained ViT we use
another pretrained convolutional neural network (CNN), which is well known for
capturing local structure, to extract multi-scale image features. Further,
we propose a local distortion extractor to obtain local distortion features
from the pretrained CNN and a local distortion injector to inject the local
distortion features into ViT. By only training the extractor and injector, our
method can benefit from the rich knowledge in the powerful foundation models
and achieve state-of-the-art performance on popular IQA datasets, indicating
that IQA is not only a low-level problem but also benefits from stronger
high-level features drawn from large-scale pretrained models.
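As a concrete illustration of the approach described above, here is a minimal PyTorch sketch of the frozen-backbone adaptation idea: a fixed ViT supplies global tokens, a fixed CNN supplies multi-scale local features, and only a small extractor/injector pair (plus a score head) is trained. It assumes recent timm backbones; all module names, widths, and the fusion scheme are illustrative, not the authors' implementation.

```python
# Sketch only: frozen ViT + frozen CNN, trainable extractor/injector adapters.
import torch
import torch.nn as nn
import timm  # assumed available; model names are illustrative choices


class LocalDistortionAdapter(nn.Module):
    def __init__(self, cnn_dims=(256, 512, 1024, 2048), vit_dim=768):
        super().__init__()
        self.vit = timm.create_model("vit_base_patch16_224", pretrained=True)
        self.cnn = timm.create_model("resnet50", pretrained=True,
                                     features_only=True)
        for p in self.vit.parameters():  # both foundation models stay fixed
            p.requires_grad_(False)
        for p in self.cnn.parameters():
            p.requires_grad_(False)
        # "extractor": project each CNN scale into the ViT token width
        self.extractor = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(14), nn.Conv2d(d, vit_dim, 1))
            for d in cnn_dims
        )
        # "injector": fuse the pooled local features into the ViT patch tokens
        self.injector = nn.Linear(vit_dim * len(cnn_dims) + vit_dim, vit_dim)
        self.head = nn.Linear(vit_dim, 1)  # scalar quality score

    def forward(self, x):  # x: (B, 3, 224, 224)
        feats = self.cnn(x)[-4:]  # four multi-scale local-structure maps
        local = torch.cat(
            [e(f).flatten(2).transpose(1, 2)  # each -> (B, 196, vit_dim)
             for e, f in zip(self.extractor, feats)],
            dim=-1,
        )
        # recent timm: forward_features returns (B, 1 + 196, vit_dim)
        patch = self.vit.forward_features(x)[:, 1:, :]  # drop CLS token
        fused = self.injector(torch.cat([patch, local], dim=-1))
        return self.head(fused.mean(dim=1)).squeeze(-1)
```

Because only `extractor`, `injector`, and `head` receive gradients, the adaptation stays lightweight while both pretrained backbones remain fixed, matching the abstract's "only training the extractor and injector" setup.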
Related papers
- Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency [51.36674160287799]
We design a multi-branch deep neural network (DNN) to assess the quality of UHD images from three perspectives.
Aesthetic features are extracted from low-resolution images downsampled from the UHD ones.
Technical distortions are measured using a fragment image composed of mini-patches cropped from UHD images.
The salient content of UHD images is detected and cropped to extract quality-aware features from the salient regions.
arXiv Detail & Related papers (2024-09-01T15:26:11Z)
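A hedged sketch of the "fragment image" step described in the entry above: one mini-patch is sampled from each cell of a grid over the UHD image, and the patches are tiled into a small mosaic that keeps native-resolution texture for distortion measurement. Grid and patch sizes are assumptions, not the paper's settings.

```python
# Sketch: build a mosaic of mini-patches from a UHD image.
import torch


def fragment_image(img: torch.Tensor, grid: int = 7,
                   patch: int = 32) -> torch.Tensor:
    """img: (C, H, W) UHD tensor -> (C, grid*patch, grid*patch) mosaic.

    Assumes each grid cell is larger than `patch`, which holds for UHD inputs.
    """
    c, h, w = img.shape
    ch, cw = h // grid, w // grid  # cell size
    rows = []
    for gy in range(grid):
        row = []
        for gx in range(grid):
            # random offset inside the cell keeps full-resolution texture
            y = gy * ch + torch.randint(0, max(ch - patch, 1), (1,)).item()
            x = gx * cw + torch.randint(0, max(cw - patch, 1), (1,)).item()
            row.append(img[:, y:y + patch, x:x + patch])
        rows.append(torch.cat(row, dim=2))
    return torch.cat(rows, dim=1)
```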
- DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [54.139923409101044]
Blind image quality assessment (IQA) in the wild presents significant challenges.
Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem.
Motivated by the robust image perception capabilities of pretrained text-to-image (T2I) diffusion models, we propose DP-IQA, a novel IQA method based on diffusion priors.
arXiv Detail & Related papers (2024-05-30T12:32:35Z)
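The DP-IQA entry above does not detail how the diffusion prior is used; one plausible, heavily hedged reading is to tap intermediate activations of a pretrained T2I denoising U-Net as quality features. The `unet` module and its call signature below are hypothetical stand-ins, not the paper's pipeline.

```python
# Sketch: pull an intermediate U-Net activation via a forward hook.
import torch
import torch.nn as nn


def extract_diffusion_features(unet: nn.Module, img: torch.Tensor,
                               layer_name: str) -> torch.Tensor:
    feats = {}
    layer = dict(unet.named_modules())[layer_name]
    handle = layer.register_forward_hook(
        lambda mod, inp, out: feats.__setitem__("f", out.detach())
    )
    with torch.no_grad():
        t = torch.zeros(img.shape[0], dtype=torch.long)  # low-noise timestep
        unet(img, t)  # single denoising forward pass; no sampling loop
    handle.remove()
    return feats["f"]  # would feed a lightweight quality regressor
```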
- Transformer-based No-Reference Image Quality Assessment via Supervised Contrastive Learning [36.695247860715874]
We propose SaTQA, a novel Supervised Contrastive Learning (SCL) and Transformer-based NR-IQA model.
We first train a model on a large-scale synthetic dataset by SCL to extract degradation features of images with various distortion types and levels.
To further extract distortion information from images, we propose a backbone network incorporating a Multi-Stream Block (MSB) that combines the CNN's inductive bias with the Transformer's long-range dependence modeling capability.
Experimental results on seven standard IQA datasets show that SaTQA outperforms state-of-the-art methods on both synthetic and authentic datasets.
arXiv Detail & Related papers (2023-12-12T06:01:41Z)
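An illustrative Multi-Stream Block in the spirit of SaTQA's MSB described above: a convolutional stream contributes the CNN inductive bias and a self-attention stream contributes long-range dependence, fused residually. Widths and the exact fusion are assumptions.

```python
# Sketch: parallel conv + self-attention streams over one feature map.
import torch
import torch.nn as nn


class MultiStreamBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.conv = nn.Sequential(  # local stream (depthwise + pointwise)
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim), nn.GELU(),
            nn.Conv2d(dim, dim, 1),
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, dim, H, W)
        b, d, h, w = x.shape
        local = self.conv(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, dim)
        glob, _ = self.attn(tokens, tokens, tokens)       # global stream
        glob = glob.transpose(1, 2).reshape(b, d, h, w)
        return x + local + glob  # fuse both streams residually
```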
- TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment [53.72721476803585]
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks.
We propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions.
A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower-level features.
arXiv Detail & Related papers (2023-08-06T09:08:37Z)
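A simplified sketch of TOPIQ's top-down idea from the entry above: high-level semantic features form the queries of a cross-attention over lower-level features, so the attention weights concentrate on semantically important local distortion regions. This is a generic cross-attention, not the paper's exact mechanism.

```python
# Sketch: semantics-as-queries cross-scale attention.
import torch
import torch.nn as nn


class CrossScaleAttention(nn.Module):
    def __init__(self, high_dim: int, low_dim: int, dim: int = 256):
        super().__init__()
        self.q = nn.Linear(high_dim, dim)  # queries from high-level semantics
        self.k = nn.Linear(low_dim, dim)   # keys/values from low-level detail
        self.v = nn.Linear(low_dim, dim)

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        # high: (B, Nh, high_dim) semantic tokens; low: (B, Nl, low_dim)
        q, k, v = self.q(high), self.k(low), self.v(low)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (B, Nh, dim): low-level detail, semantically routed
```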
- Vision Transformer Equipped with Neural Resizer on Facial Expression Recognition Task [1.3048920509133808]
We propose a novel training framework, Neural Resizer, that supports Transformers by compensating for lost information while downscaling in a data-driven manner.
Experiments show that our Neural Resizer with the F-PDLS loss function generally improves performance across Transformer variants.
arXiv Detail & Related papers (2022-04-05T13:04:04Z)
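A hedged sketch of a data-driven resizer in the spirit of the Neural Resizer above: bilinear downscaling plus a small learned residual that decides what information to preserve for the downstream Transformer. Layer sizes are illustrative.

```python
# Sketch: learnable downscaler = bilinear base + learned compensation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralResizer(nn.Module):
    def __init__(self, out_size: int = 224, width: int = 16):
        super().__init__()
        self.out_size = out_size
        self.refine = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, 3, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = F.interpolate(x, size=(self.out_size,) * 2, mode="bilinear",
                             align_corners=False)
        return base + self.refine(base)  # learned residual on the bilinear base
```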
- Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes a CNN backbone and a transformer encoder to extract features.
The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
arXiv Detail & Related papers (2021-12-01T13:23:00Z)
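The CNN-backbone-plus-transformer-encoder recipe above can be sketched as follows: CNN feature maps are flattened into tokens, a standard encoder processes them, and a regression head predicts the score. Shown in NR form only; the FR mode and all dimensions are assumptions beyond the summary.

```python
# Sketch: CNN features -> transformer encoder -> quality score (NR mode).
import torch
import torch.nn as nn
import torchvision.models as tvm


class CNNTransformerIQA(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 2):
        super().__init__()
        resnet = tvm.resnet18(weights=tvm.ResNet18_Weights.DEFAULT)
        # drop avgpool and fc; output is a (B, 512, h, w) feature map
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.backbone(x).flatten(2).transpose(1, 2)  # (B, h*w, 512)
        return self.head(self.encoder(tokens).mean(dim=1)).squeeze(-1)
```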
- No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency [38.88541492121366]
The goal of No-Reference Image Quality Assessment (NR-IQA) is to estimate the perceptual image quality in accordance with subjective evaluations.
We propose a novel model to address the NR-IQA task by leveraging a hybrid approach that benefits from Convolutional Neural Networks (CNNs) and the self-attention mechanism in Transformers.
arXiv Detail & Related papers (2021-08-16T02:07:08Z)
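The summary above omits the ranking term, but the "relative ranking" idea from the title can be illustrated with a simple margin loss over a batch's extreme-quality images. This is a generic hinge formulation, not necessarily the paper's loss.

```python
# Sketch: keep predicted scores of the best/worst batch images ordered.
import torch


def relative_ranking_loss(pred: torch.Tensor, mos: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    hi, lo = mos.argmax(), mos.argmin()  # extreme-quality images in the batch
    # hinge: predicted gap between the extremes should exceed the margin
    return torch.clamp(margin - (pred[hi] - pred[lo]), min=0.0)
```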
- Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
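A minimal sketch of the ViT-as-detection-backbone idea above: patch tokens from a ViT are reinterpreted as a 2D feature map that a standard detection head (omitted here) could consume. It assumes a 224-input, patch-16 timm ViT with a single CLS token.

```python
# Sketch: reshape ViT patch tokens into a detection-ready feature map.
import torch
import torch.nn as nn


def vit_tokens_to_feature_map(vit: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # e.g. vit = timm.create_model("vit_base_patch16_224", pretrained=True)
    tokens = vit.forward_features(x)[:, 1:, :]  # drop CLS -> (B, 196, 768)
    b, n, d = tokens.shape
    s = int(n ** 0.5)                           # 14x14 grid for 224/16 input
    return tokens.transpose(1, 2).reshape(b, d, s, s)  # (B, 768, 14, 14)
```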
- Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation.
Then, a novel gradient-weighted similarity activation mapping loss (Grad-SAM) is incorporated for finer localization with high accuracy.
Extensive experiments validate the effectiveness of the proposed approach on the CMU-Seasons dataset.
Our method is on par with or even outperforms the state-of-the-art image-based localization baselines at medium and high precision.
arXiv Detail & Related papers (2020-09-16T14:43:22Z)
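Loosely illustrating the gradient-weighted mapping above: the gradient of a query/database similarity score weights the feature channels, Grad-CAM style, highlighting regions that drive the match. This is a simplification relative to the paper's Grad-SAM loss formulation.

```python
# Sketch: gradient-weighted activation map from a similarity score.
import torch


def grad_weighted_activation_map(feat: torch.Tensor,
                                 sim: torch.Tensor) -> torch.Tensor:
    # feat: (B, C, H, W) with requires_grad=True; sim: scalar similarity
    grads, = torch.autograd.grad(sim, feat, retain_graph=True)
    weights = grads.mean(dim=(2, 3), keepdim=True)  # per-channel weight
    return torch.relu((weights * feat).sum(dim=1))  # (B, H, W) saliency map
```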
- MetaIQA: Deep Meta-learning for No-Reference Image Quality Assessment [73.55944459902041]
This paper presents a no-reference IQA metric based on deep meta-learning.
We first collect a number of NR-IQA tasks for different distortions.
Then, meta-learning is adopted to learn the prior knowledge shared by diverse distortions.
Extensive experiments demonstrate that the proposed metric outperforms the state of the art by a large margin.
arXiv Detail & Related papers (2020-04-11T23:36:36Z)
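A hedged sketch of the meta-learning recipe above, in MAML style: an inner gradient step adapts the model to one distortion-specific task, and the outer loss aggregates post-adaptation performance across tasks to learn a shared prior. `tasks` (a list of (support, query) batches per distortion) and the single inner step are assumptions.

```python
# Sketch: MAML-style meta-step over distortion-specific NR-IQA tasks.
import torch


def meta_step(model, tasks, loss_fn, inner_lr=1e-3):
    """tasks: list of ((xs, ys), (xq, yq)) support/query batches."""
    meta_loss = 0.0
    for (xs, ys), (xq, yq) in tasks:
        # inner loop: one gradient step on this distortion's support set
        params = dict(model.named_parameters())
        grads = torch.autograd.grad(loss_fn(model(xs), ys), params.values(),
                                    create_graph=True)  # second-order MAML
        adapted = {n: p - inner_lr * g
                   for (n, p), g in zip(params.items(), grads)}
        # outer loop: evaluate the adapted weights on the query set
        meta_loss = meta_loss + loss_fn(
            torch.func.functional_call(model, adapted, (xq,)), yq)
    return meta_loss / max(len(tasks), 1)  # backprop this to learn the prior
```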