Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment
- URL: http://arxiv.org/abs/2304.14672v1
- Date: Fri, 28 Apr 2023 08:06:05 GMT
- Title: Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment
- Authors: Haoning Wu, Liang Liao, Annan Wang, Chaofeng Chen, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin
- Abstract summary: We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP).
BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets.
We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
- Score: 54.31355080688127
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The proliferation of videos collected during in-the-wild natural settings has
pushed the development of effective Video Quality Assessment (VQA)
methodologies. Contemporary supervised opinion-driven VQA strategies
predominantly hinge on training from expensive human annotations for quality
scores, which limited the scale and distribution of VQA datasets and
consequently led to unsatisfactory generalization capacity of methods driven by
these data. On the other hand, although several handcrafted zero-shot quality
indices do not require training from human opinions, they are unable to account
for the semantics of videos, rendering them ineffective in comprehending
complex authentic distortions (e.g., white balance, exposure) and assessing the
quality of semantic content within videos. To address these challenges, we
introduce the text-prompted Semantic Affinity Quality Index (SAQI) and its
localized version (SAQI-Local) using Contrastive Language-Image Pre-training
(CLIP) to ascertain the affinity between textual prompts and visual features,
facilitating a comprehensive examination of semantic quality concerns without
the reliance on human quality annotations. By amalgamating SAQI with existing
low-level metrics, we propose the unified Blind Video Quality Index (BVQI) and
its improved version, BVQI-Local, which demonstrates unprecedented performance,
surpassing existing zero-shot indices by at least 24% on all datasets.
Moreover, we devise an efficient fine-tuning scheme for BVQI-Local that jointly
optimizes text prompts and final fusion weights, resulting in state-of-the-art
performance and superior generalization ability in comparison to prevalent
opinion-driven VQA methods. We conduct comprehensive analyses to investigate
different quality concerns of distinct indices, demonstrating the effectiveness
and rationality of our design.
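The abstract describes measuring the affinity between textual prompts and visual features with CLIP to obtain a zero-shot semantic quality cue. The sketch below illustrates one way such a semantic affinity index could be computed over sampled frames, assuming the open-source OpenAI `clip` package; the antonymic prompt pair, the frame list, and the mean pooling over frames are illustrative assumptions, not the paper's exact SAQI recipe.

```python
# Minimal sketch of a CLIP-based semantic affinity quality index.
# Assumes the open-source OpenAI `clip` package and a list of video frames
# as PIL images; prompt wording, frame sampling, and mean pooling are
# illustrative choices, not the paper's exact SAQI formulation.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Antonymic text prompts: affinity to the "good" prompt relative to the
# "bad" prompt serves as a zero-shot semantic quality cue.
prompts = clip.tokenize(["a high quality photo", "a low quality photo"]).to(device)


def semantic_affinity_index(frames: list[Image.Image]) -> float:
    """Average softmax affinity of sampled frames to the positive prompt."""
    with torch.no_grad():
        text_feat = model.encode_text(prompts)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

        images = torch.stack([preprocess(f) for f in frames]).to(device)
        img_feat = model.encode_image(images)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

        # Cosine similarity between each frame and the two prompts,
        # softmax over prompts -> probability of the "high quality" prompt.
        logits = 100.0 * img_feat @ text_feat.T   # [num_frames, 2]
        probs = logits.softmax(dim=-1)[:, 0]
    return probs.mean().item()
```

Under the fine-tuning scheme mentioned in the abstract, the text-prompt embeddings and the final fusion weights could in principle be exposed as learnable parameters and optimized jointly while the CLIP backbone stays frozen; the sketch above keeps everything fixed for zero-shot use.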
Related papers
- Advancing Video Quality Assessment for AIGC [17.23281750562252]
We propose a novel loss function that combines mean absolute error with cross-entropy loss to mitigate inter-frame quality inconsistencies.
We also introduce the innovative S2CNet technique to retain critical content, while leveraging adversarial training to enhance the model's generalization capabilities.
arXiv Detail & Related papers (2024-09-23T10:36:22Z)
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
However, CLIP's generic pre-training does not fully align with the demands of quality assessment; recent approaches have attempted to address this mismatch with prompt learning, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach [52.07084862209754]
We collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors.
Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension.
These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings.
arXiv Detail & Related papers (2023-05-22T05:20:23Z)
- Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion [52.07084862209754]
We introduce an explicit semantic affinity index for opinion-unaware VQA using text prompts in the Contrastive Language-Image Pre-training (CLIP) model.
We also aggregate it with several traditional low-level naturalness indices through Gaussian normalization and sigmoid rescaling strategies (a rough sketch of this aggregation appears after this list).
The proposed Blind Unified Opinion-Unaware Video Quality Index via Semantic and Technical Metric Aggregation (BUONA-VISTA) outperforms existing opinion-unaware VQA methods by at least 20%.
arXiv Detail & Related papers (2023-02-26T08:46:07Z)
- Blindly Assess Quality of In-the-Wild Videos via Quality-aware Pre-training and Motion Perception [32.87570883484805]
We propose to transfer knowledge from image quality assessment (IQA) databases with authentic distortions and large-scale action recognition with rich motion patterns.
We train the proposed model on the target VQA databases using a mixed list-wise ranking loss function.
arXiv Detail & Related papers (2021-08-19T05:29:19Z)
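The BUONA-VISTA entry above aggregates the semantic affinity index with low-level naturalness indices through Gaussian normalization and sigmoid rescaling. The following is a rough illustration of that kind of aggregation; the equal fusion weights and the toy index values are assumptions for illustration only, not the papers' reported configuration.

```python
# Rough illustration of fusing a semantic affinity index with low-level
# naturalness indices via Gaussian (z-score) normalization and sigmoid
# rescaling. Equal weights and the toy values below are assumptions.
import numpy as np


def gaussian_normalize(scores: np.ndarray) -> np.ndarray:
    """Z-score one index over the dataset so all indices share a common scale."""
    return (scores - scores.mean()) / (scores.std() + 1e-8)


def sigmoid_rescale(scores: np.ndarray) -> np.ndarray:
    """Map normalized scores into (0, 1) so they can be averaged directly."""
    return 1.0 / (1.0 + np.exp(-scores))


def aggregate(indices, weights=None):
    """Fuse several per-video quality indices into one zero-shot score."""
    rescaled = [sigmoid_rescale(gaussian_normalize(s)) for s in indices]
    if weights is None:
        weights = [1.0 / len(rescaled)] * len(rescaled)
    return sum(w * r for w, r in zip(weights, rescaled))


# Example: semantic affinity plus two technical indices for five videos.
semantic = np.array([0.71, 0.55, 0.63, 0.80, 0.42])
spatial = np.array([-12.3, -30.1, -18.7, -9.4, -41.0])   # e.g. a negated distortion index
temporal = np.array([0.9, 0.4, 0.7, 0.95, 0.2])
print(aggregate([semantic, spatial, temporal]))
```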