Towards Explainable In-the-Wild Video Quality Assessment: A Database and
a Language-Prompted Approach
- URL: http://arxiv.org/abs/2305.12726v2
- Date: Thu, 3 Aug 2023 09:26:36 GMT
- Title: Towards Explainable In-the-Wild Video Quality Assessment: A Database and
a Language-Prompted Approach
- Authors: Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan
Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
- Abstract summary: We collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors.
Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension.
These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings.
- Score: 52.07084862209754
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The proliferation of in-the-wild videos has greatly expanded the Video
Quality Assessment (VQA) problem. Unlike early definitions that usually focus
on limited distortion types, VQA on in-the-wild videos is especially
challenging as it could be affected by complicated factors, including various
distortions and diverse contents. Though subjective studies have collected
overall quality scores for these videos, how the abstract quality scores relate
with specific factors is still obscure, hindering VQA methods from more
concrete quality evaluations (e.g. sharpness of a video). To solve this
problem, we collect over two million opinions on 4,543 in-the-wild videos on 13
dimensions of quality-related factors, including in-capture authentic
distortions (e.g. motion blur, noise, flicker), errors introduced by
compression and transmission, and higher-level experiences on semantic contents
and aesthetic issues (e.g. composition, camera trajectory), to establish the
multi-dimensional Maxwell database. Specifically, we ask the subjects to label
among a positive, a negative, and a neutral choice for each dimension. These
explanation-level opinions allow us to measure the relationships between
specific quality factors and abstract subjective quality ratings, and to
benchmark different categories of VQA algorithms on each dimension, so as to
more comprehensively analyze their strengths and weaknesses. Furthermore, we
propose the MaxVQA, a language-prompted VQA approach that modifies
vision-language foundation model CLIP to better capture important quality
issues as observed in our analyses. The MaxVQA can jointly evaluate various
specific quality factors and final quality scores with state-of-the-art
accuracy on all dimensions, and superb generalization ability on existing
datasets. Code and data available at https://github.com/VQAssessment/MaxVQA.
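The abstract describes collecting a positive, neutral, or negative choice per dimension and relating those explanation-level opinions to overall quality ratings. A minimal sketch of one way such an analysis could be set up follows. It is not the authors' code: the +1/0/-1 encoding, the function names, and the toy data are all assumptions made for illustration, and Spearman rank correlation is used only as a common choice for this kind of comparison.

```python
from statistics import mean

# Assumed numeric encoding of the three choices (not taken from the paper).
CHOICE_VALUE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def dimension_score(opinions):
    """Average the encoded choices of all subjects for one video on one dimension."""
    return mean(CHOICE_VALUE[o] for o in opinions)


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: sharpness opinions for four videos and their overall scores.
sharpness_opinions = [
    ["positive", "positive", "neutral"],
    ["negative", "neutral", "negative"],
    ["neutral", "positive", "neutral"],
    ["negative", "negative", "negative"],
]
overall_scores = [4.2, 2.1, 3.5, 1.3]

sharpness_scores = [dimension_score(o) for o in sharpness_opinions]
print(sharpness_scores)
print(spearman(sharpness_scores, overall_scores))
```

Repeating the last two steps for each of the 13 dimensions would give one correlation per quality factor, which is the kind of per-dimension comparison the abstract refers to.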
Related papers
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- KVQ: Kwai Video Quality Assessment for Short-form Videos [24.5291786508361]
We establish the first large-scale Kaleidoscope short Video database for Quality assessment, KVQ, which comprises 600 user-uploaded short videos and 3600 processed videos.
We propose the first short-form video quality evaluator, i.e., KSVQE, which enables the quality evaluator to identify the quality-determined semantics with the content understanding of large vision language models.
arXiv Detail & Related papers (2024-02-11T14:37:54Z)
- Perceptual Video Quality Assessment: A Survey [63.61214597655413]
Perceptual video quality assessment plays a vital role in the field of video processing.
Various subjective and objective video quality assessment studies have been conducted over the past two decades.
This survey provides an up-to-date and comprehensive review of these video quality assessment studies.
arXiv Detail & Related papers (2024-02-05T16:13:52Z)
- Capturing Co-existing Distortions in User-Generated Content for
No-reference Video Quality Assessment [9.883856205077022]
Video Quality Assessment (VQA) aims to predict the perceptual quality of a video.
VQA faces two under-estimated challenges unresolved in User Generated Content (UGC) videos.
We propose Visual Quality Transformer (VQT) to extract quality-related sparse features more efficiently.
arXiv Detail & Related papers (2023-07-31T16:29:29Z)
- Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video
Quality Assessment [54.31355080688127]
We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP).
BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets.
We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
arXiv Detail & Related papers (2023-04-28T08:06:05Z)
- Zoom-VQA: Patches, Frames and Clips Integration for Video Quality
Assessment [14.728530703277283]
Video quality assessment (VQA) aims to simulate the human perception of video quality.
We decompose video into three levels: patch level, frame level, and clip level.
We propose Zoom-VQA architecture to perceive features at different levels.
arXiv Detail & Related papers (2023-04-13T12:18:15Z)
- Disentangling Aesthetic and Technical Effects for Video Quality
Assessment of User Generated Content [54.31355080688127]
The mechanisms of human quality perception in the YouTube-VQA problem are yet to be explored.
We propose a scheme where two separate evaluators are trained with views specifically designed for each issue.
Our blind subjective studies prove that the separate evaluators in DOVER can effectively match human perception on respective disentangled quality issues.
arXiv Detail & Related papers (2022-11-09T13:55:50Z)
- Blind VQA on 360° Video via Progressively Learning from Pixels,
Frames and Video [66.57045901742922]
Blind visual quality assessment (BVQA) on 360° video plays a key role in optimizing immersive multimedia systems.
In this paper, we take into account the progressive paradigm of human perception towards spherical video quality.
We propose a novel BVQA approach (namely ProVQA) for 360° video via progressively learning from pixels, frames and video.
arXiv Detail & Related papers (2021-11-18T03:45:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.