PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
- URL: http://arxiv.org/abs/2405.17765v1
- Date: Tue, 28 May 2024 02:37:29 GMT
- Title: PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
- Authors: Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang,
- Abstract summary: Video quality assessment (VQA) is a challenging problem due to the numerous factors that can affect the perceptual quality of a video.
Annotating the Mean opinion score (MOS) for videos is expensive and time-consuming, which limits the scale of VQA datasets.
We propose a VQA method named PTM-VQA, which leverages PreTrained Models to transfer knowledge from models pretrained on various pre-tasks.
- Score: 27.195339506769457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video quality assessment (VQA) is a challenging problem due to the numerous factors that can affect the perceptual quality of a video, \eg, content attractiveness, distortion type, motion pattern, and level. However, annotating the Mean opinion score (MOS) for videos is expensive and time-consuming, which limits the scale of VQA datasets, and poses a significant obstacle for deep learning-based methods. In this paper, we propose a VQA method named PTM-VQA, which leverages PreTrained Models to transfer knowledge from models pretrained on various pre-tasks, enabling benefits for VQA from different aspects. Specifically, we extract features of videos from different pretrained models with frozen weights and integrate them to generate representation. Since these models possess various fields of knowledge and are often trained with labels irrelevant to quality, we propose an Intra-Consistency and Inter-Divisibility (ICID) loss to impose constraints on features extracted by multiple pretrained models. The intra-consistency constraint ensures that features extracted by different pretrained models are in the same unified quality-aware latent space, while the inter-divisibility introduces pseudo clusters based on the annotation of samples and tries to separate features of samples from different clusters. Furthermore, with a constantly growing number of pretrained models, it is crucial to determine which models to use and how to use them. To address this problem, we propose an efficient scheme to select suitable candidates. Models with better clustering performance on VQA datasets are chosen to be our candidates. Extensive experiments demonstrate the effectiveness of the proposed method.
Related papers
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA)
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z) - Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z) - Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z) - Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video
Quality Assessment [25.5501280406614]
Video quality assessment (VQA) has attracted growing attention in recent years.
The great expense of annotating large-scale VQA datasets has become the main obstacle for current deep-learning methods.
An Adaptive Diverse Quality-aware feature Acquisition (Ada-DQA) framework is proposed to capture desired quality-related features.
arXiv Detail & Related papers (2023-08-01T16:04:42Z) - Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models [71.06007696593704]
Blind quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in real-world video-enabled media applications.
As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets.
We conduct a first-of-its-kind computational analysis of VQA datasets via minimalistic BVQA models.
arXiv Detail & Related papers (2023-07-26T06:38:33Z) - CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z) - A Deep Learning based No-reference Quality Assessment Model for UGC
Videos [44.00578772367465]
Previous video quality assessment (VQA) studies either use the image recognition model or the image quality assessment (IQA) models to extract frame-level features of videos for quality regression.
We propose a very simple but effective VQA model, which trains an end-to-end spatial feature extraction network to learn the quality-aware spatial feature representation from raw pixels of the video frames.
With the better quality-aware features, we only use the simple multilayer perception layer (MLP) network to regress them into the chunk-level quality scores, and then the temporal average pooling strategy is adopted to obtain the video
arXiv Detail & Related papers (2022-04-29T12:45:21Z) - Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets
Training [20.288424566444224]
We focus on automatically assessing the quality of in-the-wild videos in computer vision applications.
To improve the performance of quality assessment models, we borrow intuitions from human perception.
We propose a mixed datasets training strategy for training a single VQA model with multiple datasets.
arXiv Detail & Related papers (2020-11-09T09:22:57Z) - Counterfactual Samples Synthesizing for Robust Visual Question Answering [104.72828511083519]
We propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme.
CSS generates numerous counterfactual training samples by masking critical objects in images or words in questions.
We achieve a record-breaking performance of 58.95% on VQA-CP v2, with 6.5% gains.
arXiv Detail & Related papers (2020-03-14T08:34:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.