Related papers: Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

URL: http://arxiv.org/abs/2305.12726v2
Date: Thu, 3 Aug 2023 09:26:36 GMT
Title: Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach
Authors: Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
Abstract summary: We collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors. Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension. These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings.
Score: 52.07084862209754
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The proliferation of in-the-wild videos has greatly expanded the Video Quality Assessment (VQA) problem. Unlike early definitions that usually focus on limited distortion types, VQA on in-the-wild videos is especially challenging as it could be affected by complicated factors, including various distortions and diverse contents. Though subjective studies have collected overall quality scores for these videos, how the abstract quality scores relate with specific factors is still obscure, hindering VQA methods from more concrete quality evaluations (e.g. sharpness of a video). To solve this problem, we collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors, including in-capture authentic distortions (e.g. motion blur, noise, flicker), errors introduced by compression and transmission, and higher-level experiences on semantic contents and aesthetic issues (e.g. composition, camera trajectory), to establish the multi-dimensional Maxwell database. Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension. These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings, and to benchmark different categories of VQA algorithms on each dimension, so as to more comprehensively analyze their strengths and weaknesses. Furthermore, we propose the MaxVQA, a language-prompted VQA approach that modifies vision-language foundation model CLIP to better capture important quality issues as observed in our analyses. The MaxVQA can jointly evaluate various specific quality factors and final quality scores with state-of-the-art accuracy on all dimensions, and superb generalization ability on existing datasets. Code and data available at https://github.com/VQAssessment/MaxVQA.

Related papers

FineVQ: Fine-Grained User Generated Content Video Quality Assessment [57.51274708410407]
We establish the first large-scale Fine-grained Video quality assessment Database, termed FineVD, which comprises 6104 videos with fine-grained quality scores and descriptions across multiple dimensions. We propose a Fine-grained Video Quality assessment model to learn the fine-grained quality of videos, with the capabilities of quality rating, quality scoring, and quality attribution. Our proposed FineVQ can produce fine-grained video-quality results and achieve state-of-the-art performance on FineVD and other commonly used-VQA datasets.
arXiv Detail & Related papers (2024-12-26T14:44:47Z)
EVQAScore: A Fine-grained Metric for Video Question Answering Data Quality Evaluation [21.797332686137203]
We introduce EVQAScore, a reference-free method that leverages keyword extraction to assess both video caption and video QA data quality. Our approach achieves state-of-the-art (SOTA) performance (32.8 for Kendall correlation and 42.3 for Spearman correlation, 4.7 and 5.9 higher than the previous method PAC-S++, for video caption evaluation) By using EVQAScore for data selection, we achieved SOTA results with only 12.5% of the original data volume, outperforming the previous SOTA method PAC-S and 100% of data.
arXiv Detail & Related papers (2024-11-11T12:11:36Z)
VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
Video Quality Assessment (VQA) is a classic field in low-level visual perception. Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can enhance markedly low-level visual quality evaluation. We introduce the VQA2 Instruction dataset - the first visual question answering instruction dataset that focuses on video quality assessment. The VQA2 series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
arXiv Detail & Related papers (2024-11-06T09:39:52Z)
Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model [54.69882562863726]
We try to systemically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. We evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment. We propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos.
arXiv Detail & Related papers (2024-07-31T07:54:26Z)
CLIPVQA:Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem ( CLIPVQA) The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
KVQ: Kwai Video Quality Assessment for Short-form Videos [24.5291786508361]
We establish the first large-scale Kaleidoscope short Video database for Quality assessment, KVQ, which comprises 600 user-uploaded short videos and 3600 processed videos. We propose the first short-form video quality evaluator, i.e., KSVQE, which enables the quality evaluator to identify the quality-determined semantics with the content understanding of large vision language models.
arXiv Detail & Related papers (2024-02-11T14:37:54Z)
Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment [54.31355080688127]
We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP) BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets. We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
arXiv Detail & Related papers (2023-04-28T08:06:05Z)
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment [14.728530703277283]
Video assessment (VQA) aims to simulate the human perception of video quality. We decompose video into three levels: patch level, frame level, and clip level. We propose Zoom-VQA architecture to perceive features at different levels.
arXiv Detail & Related papers (2023-04-13T12:18:15Z)
Disentangling Aesthetic and Technical Effects for Video Quality Assessment of User Generated Content [54.31355080688127]
The mechanisms of human quality perception in the YouTube-VQA problem is still yet to be explored. We propose a scheme where two separate evaluators are trained with views specifically designed for each issue. Our blind subjective studies prove that the separate evaluators in DOVER can effectively match human perception on respective disentangled quality issues.
arXiv Detail & Related papers (2022-11-09T13:55:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.