CONVIQT: Contrastive Video Quality Estimator
- URL: http://arxiv.org/abs/2206.14713v1
- Date: Wed, 29 Jun 2022 15:22:01 GMT
- Title: CONVIQT: Contrastive Video Quality Estimator
- Authors: Pavan C. Madhusudana and Neil Birkbeck and Yilin Wang and Balu
Adsumilli and Alan C. Bovik
- Abstract summary: Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
- Score: 63.749184706461826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perceptual video quality assessment (VQA) is an integral component of many
streaming and video sharing platforms. Here we consider the problem of learning
perceptually relevant video quality representations in a self-supervised
manner. Distortion type identification and degradation level determination are
employed as auxiliary tasks to train a deep learning model containing a deep
Convolutional Neural Network (CNN) that extracts spatial features, as well as a
recurrent unit that captures temporal information. The model is trained using a
contrastive loss and we therefore refer to this training framework and
resulting model as CONtrastive VIdeo Quality EstimaTor (CONVIQT). During
testing, the weights of the trained model are frozen, and a linear regressor
maps the learned features to quality scores in a no-reference (NR) setting. We
conduct comprehensive evaluations of the proposed model on multiple VQA
databases by analyzing the correlations between model predictions and
ground-truth quality ratings, and achieve competitive performance when compared
to state-of-the-art NR-VQA models, even though CONVIQT is not trained on those
databases. Our ablation experiments demonstrate that the learned
representations are highly robust and generalize well across synthetic and
realistic distortions. Our results indicate that compelling representations
with perceptual bearing can be obtained using self-supervised learning. The
implementations used in this work have been made available at
https://github.com/pavancm/CONVIQT.
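As a concrete illustration of the pipeline described above, the following is a minimal PyTorch sketch: a CNN extracts per-frame spatial features, a recurrent unit aggregates them over time, the encoder is trained with a contrastive loss, and at test time the frozen features are mapped to quality scores by a linear regressor. The ResNet-18 backbone, single-layer GRU, projection size, and NT-Xent loss form are illustrative assumptions; the authors' actual implementation is in the repository linked above.
```python
# Minimal sketch of a CONVIQT-style pipeline (illustrative assumptions:
# ResNet-18 backbone, single-layer GRU, SimCLR-style NT-Xent loss).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class VideoQualityEncoder(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        cnn = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # spatial features
        self.gru = nn.GRU(512, 256, batch_first=True)          # temporal unit
        self.proj = nn.Linear(256, feat_dim)                   # contrastive head

    def forward(self, video):                      # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        x = self.cnn(video.flatten(0, 1))          # (B*T, 512, 1, 1)
        x = x.flatten(1).view(b, t, -1)            # (B, T, 512)
        _, h = self.gru(x)                         # h: (1, B, 256)
        return self.proj(h.squeeze(0))             # clip embedding (B, feat_dim)

def nt_xent(z1, z2, tau=0.1):
    """SimCLR-style contrastive loss over two views of the same clips."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)    # (2n, D)
    sim = (z @ z.t()) / tau
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # exclude self-similarity
    # row i's positive is the other view of the same clip
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# At test time: freeze the encoder and fit a linear regressor
# (e.g., sklearn.linear_model.Ridge) from the features to MOS.
```
Because the encoder is trained without human labels, subjective quality scores are only needed to fit the final linear regressor.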
Related papers
- PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild [27.195339506769457]
Video quality assessment (VQA) is a challenging problem due to the numerous factors that can affect the perceptual quality of a video.
Annotating the mean opinion score (MOS) for videos is expensive and time-consuming, which limits the scale of VQA datasets.
We propose a VQA method named PTM-VQA, which leverages PreTrained Models to transfer knowledge from models pretrained on various pre-tasks.
arXiv Detail & Related papers (2024-05-28T02:37:29Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt learning, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose CoPA, a novel contrastive pre-training framework tailored for PCQA.
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z)
- Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes a CNN backbone and a transformer encoder to extract features.
The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
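A minimal sketch of this kind of hybrid extractor, with an assumed ResNet-18 backbone feeding patch tokens to a standard transformer encoder (the dimensions, depth, and pooling are illustrative, not the paper's configuration):
```python
# Illustrative CNN-plus-transformer feature extractor for IQA
# (assumed configuration; not the paper's exact architecture).
import torch.nn as nn
from torchvision import models

class CNNTransformerIQA(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=2):
        super().__init__()
        cnn = models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # keep spatial map
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)           # regresses a quality score

    def forward(self, img):                         # img: (B, 3, H, W)
        fmap = self.backbone(img)                   # (B, 512, h, w)
        tokens = fmap.flatten(2).transpose(1, 2)    # (B, h*w, 512) patch tokens
        tokens = self.encoder(tokens)               # global attention over patches
        return self.head(tokens.mean(dim=1)).squeeze(1)
```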
arXiv Detail & Related papers (2021-12-01T13:23:00Z)
- Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary problem of distortion type and degree discrimination.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z)
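A contrastive pairwise objective of this flavor can be sketched as a loss that pulls together embeddings of images sharing a distortion class and pushes apart the rest; the class-based grouping below (SupCon-style) is a simplified stand-in under that assumption, not CONTRIQUE's exact loss:
```python
# Simplified pairwise contrastive objective over distortion classes
# (an illustrative stand-in, not CONTRIQUE's exact formulation).
import torch
import torch.nn.functional as F

def pairwise_contrastive_loss(feats, labels, tau=0.1):
    """feats: (N, D) image embeddings; labels: (N,) distortion-class ids."""
    z = F.normalize(feats, dim=1)
    sim = (z @ z.t()) / tau                          # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    # positives: other images with the same distortion class
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float('-inf')), dim=1, keepdim=True)
    # average log-likelihood of positives per anchor (SupCon-style)
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```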
- No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency [38.88541492121366]
The goal of No-Reference Image Quality Assessment (NR-IQA) is to estimate the perceptual image quality in accordance with subjective evaluations.
We propose a novel model to address the NR-IQA task by leveraging a hybrid approach that benefits from Convolutional Neural Networks (CNNs) and the self-attention mechanism in Transformers.
arXiv Detail & Related papers (2021-08-16T02:07:08Z)
- Study on the Assessment of the Quality of Experience of Streaming Video [117.44028458220427]
In this paper, the influence of various objective factors on the subjective estimation of the QoE of streaming video is studied.
The paper presents standard and handcrafted features and reports their correlations with subjective scores along with p-values of significance.
We use the SQoE-III database, so far the largest and most realistic of its kind.
arXiv Detail & Related papers (2020-12-08T18:46:09Z)
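The correlation-and-significance analysis described in this entry can be reproduced with scipy; the arrays below are synthetic placeholders standing in for an objective feature and the subjective QoE scores:
```python
# Correlation and p-value between an objective feature and subjective
# QoE scores (placeholder data, illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mos = rng.uniform(1, 5, size=60)                     # subjective scores (placeholder)
feature = 0.8 * mos + rng.normal(0, 0.5, size=60)    # objective feature (placeholder)

plcc, p_lin = stats.pearsonr(feature, mos)           # linear correlation + p-value
srocc, p_rank = stats.spearmanr(feature, mos)        # rank correlation + p-value
print(f"PLCC={plcc:.3f} (p={p_lin:.3g})  SROCC={srocc:.3f} (p={p_rank:.3g})")
```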
- Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training [20.288424566444224]
We focus on automatically assessing the quality of in-the-wild videos in computer vision applications.
To improve the performance of quality assessment models, we borrow intuitions from human perception.
We propose a mixed datasets training strategy for training a single VQA model with multiple datasets.
arXiv Detail & Related papers (2020-11-09T09:22:57Z)
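A bare-bones sketch of the mixed-datasets idea: alternate batches from several VQA datasets and optimize a single model on all of them. The round-robin sampling, L1 objective, and per-batch label normalization are illustrative assumptions rather than the paper's exact strategy:
```python
# Bare-bones mixed-dataset training loop for a single VQA model
# (round-robin sampling and per-batch MOS normalization are assumptions).
import torch
import torch.nn as nn

def train_mixed(model, loaders, epochs=10, lr=1e-4):
    """loaders: list of DataLoaders yielding (videos, mos), one per dataset."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    l1 = nn.L1Loss()
    for _ in range(epochs):
        for batches in zip(*loaders):               # one batch from each dataset
            opt.zero_grad()
            loss = torch.zeros(())
            for videos, mos in batches:
                # z-score MOS within the batch so datasets with different
                # rating scales contribute comparable gradients
                mos = (mos - mos.mean()) / (mos.std() + 1e-6)
                loss = loss + l1(model(videos).squeeze(-1), mos)
            loss.backward()
            opt.step()
```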