Exploring Opinion-unaware Video Quality Assessment with Semantic
Affinity Criterion
- URL: http://arxiv.org/abs/2302.13269v1
- Date: Sun, 26 Feb 2023 08:46:07 GMT
- Title: Exploring Opinion-unaware Video Quality Assessment with Semantic
Affinity Criterion
- Authors: Haoning Wu, Liang Liao, Jingwen Hou, Chaofeng Chen, Erli Zhang, Annan
Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
- Abstract summary: We introduce an explicit semantic affinity index for opinion-unaware VQA using text prompts in the contrastive language-image pre-training (CLIP) model.
We also aggregate it with different traditional low-level naturalness indexes through Gaussian normalization and sigmoid rescaling strategies.
The proposed Blind Unified Opinion-Unaware Video Quality Index via Semantic and Technical Metric Aggregation (BUONA-VISTA) outperforms existing opinion-unaware VQA methods by at least 20%.
- Score: 52.07084862209754
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent learning-based video quality assessment (VQA) algorithms are expensive
to implement due to the cost of data collection of human quality opinions, and
are less robust across various scenarios due to the biases of these opinions.
This motivates our exploration of opinion-unaware (a.k.a. zero-shot) VQA
approaches. Existing approaches consider only low-level naturalness in the spatial
or temporal domain, without accounting for the impact of high-level semantics. In
this work, we introduce an explicit semantic affinity index for opinion-unaware
VQA using text prompts in the contrastive language-image pre-training (CLIP)
model. We also aggregate it with different traditional low-level naturalness
indexes through Gaussian normalization and sigmoid rescaling strategies.
Composed of aggregated semantic and technical metrics, the proposed Blind
Unified Opinion-Unaware Video Quality Index via Semantic and Technical Metric
Aggregation (BUONA-VISTA) outperforms existing opinion-unaware VQA methods by
at least 20%, and is more robust than opinion-aware approaches.
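Below is a minimal sketch (not the authors' released code) of the two ingredients described in the abstract: a text-prompted CLIP semantic affinity index and the aggregation of per-video indices through Gaussian normalization followed by sigmoid rescaling. The prompt wording, the frame-sampling scheme, and the choice of low-level naturalness indices are illustrative assumptions.

```python
# Sketch of (1) a CLIP-based semantic affinity index driven by a pair of
# antonymic text prompts and (2) aggregation of per-video indices via
# Gaussian normalization followed by sigmoid rescaling.
import numpy as np
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical antonymic quality prompts.
text_tokens = clip.tokenize(["a high quality photo", "a low quality photo"]).to(device)

@torch.no_grad()
def semantic_affinity_index(frames):
    """Mean positive-vs-negative prompt affinity over sampled video frames.

    frames: iterable of PIL.Image objects sampled from one video.
    Returns a scalar in [0, 1]; higher means closer to the positive prompt.
    """
    text_feat = model.encode_text(text_tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    images = torch.stack([preprocess(f) for f in frames]).to(device)
    img_feat = model.encode_image(images)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

    # Softmax over the two prompts turns cosine similarities into an affinity.
    logits = 100.0 * img_feat @ text_feat.T        # (num_frames, 2)
    return logits.softmax(dim=-1)[:, 0].mean().item()

def gaussian_normalize(scores):
    """Z-score one raw index across the whole set of evaluated videos."""
    scores = np.asarray(scores, dtype=np.float64)
    return (scores - scores.mean()) / (scores.std() + 1e-8)

def sigmoid_rescale(z):
    """Map normalized scores into (0, 1) so different indices are comparable."""
    return 1.0 / (1.0 + np.exp(-z))

def aggregate(*index_lists):
    """Average the rescaled indices (semantic plus low-level naturalness)."""
    rescaled = [sigmoid_rescale(gaussian_normalize(s)) for s in index_lists]
    return np.mean(np.stack(rescaled, axis=0), axis=0)

# Usage (the low-level indices are placeholders for the traditional spatial and
# temporal naturalness measures mentioned in the abstract):
# semantic = [semantic_affinity_index(frames) for frames in per_video_frames]
# quality  = aggregate(semantic, spatial_naturalness, temporal_naturalness)
```

Normalizing each index before rescaling keeps the fusion free of any learned weights, which is consistent with the zero-shot, opinion-unaware setting described above.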
Related papers
- CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment [21.9149920194746]
We propose a novel language-driven PCQA method named CLIP-PCQA.
Considering that human beings prefer to describe visual quality using discrete quality descriptions, we adopt a retrieval-based mapping strategy (a minimal sketch of this idea follows the related-papers list below).
We show that our CLIP-PCQA outperforms other state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2025-01-17T09:43:14Z)
- Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment [15.529169236891532]
We introduce MSA-VQA, a Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment.
Our hierarchical framework analyzes video content at three levels: frame, segment, and video.
We propose a Prompt Semantic Supervision Module using the text encoder of CLIP to ensure semantic consistency between videos and conditional prompts.
arXiv Detail & Related papers (2025-01-06T01:18:11Z)
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions [75.45274978665684]
Vision-Language Understanding (VLU) benchmarks contain samples where answers rely on assumptions unsupported by the provided context.
We collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions.
We develop a general-purpose Context-AwaRe Abstention detector to identify samples lacking sufficient context and enhance model accuracy.
arXiv Detail & Related papers (2024-05-18T02:21:32Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity [55.399230250413986]
We propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) to remove harmful semantic noise features from the upstream task.
Our approach achieves superior performance to the state-of-the-art NR-IQA methods on eight standard IQA datasets.
arXiv Detail & Related papers (2023-12-11T06:50:27Z)
- Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment [54.31355080688127]
We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP).
The resulting BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets.
We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
arXiv Detail & Related papers (2023-04-28T08:06:05Z)
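For the CLIP-PCQA entry referenced above, the sketch below illustrates one way a retrieval-based mapping from discrete quality descriptions to a scalar score could look. The five-level description wording, the 1 to 5 level values, and the softmax temperature are hypothetical, and the rendering of point-cloud views into CLIP-compatible images that such a method would require is omitted.

```python
# Hypothetical sketch of a retrieval-based mapping: similarities between the
# input and five textual quality levels are softmax-normalized and converted
# into an expected quality value. This is an illustration, not CLIP-PCQA's code.
import numpy as np

QUALITY_DESCRIPTIONS = [  # hypothetical five-level descriptions
    "a point cloud with bad quality",
    "a point cloud with poor quality",
    "a point cloud with fair quality",
    "a point cloud with good quality",
    "a point cloud with excellent quality",
]
LEVEL_VALUES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def expected_quality(similarities, temperature=0.01):
    """Map per-description cosine similarities to an expected quality score.

    similarities: array of shape (5,), one CLIP-style cosine similarity per
    entry of QUALITY_DESCRIPTIONS.
    """
    logits = np.asarray(similarities, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(np.dot(probs, LEVEL_VALUES))

# Example: expected_quality([0.21, 0.22, 0.24, 0.27, 0.26]) is roughly 4.2,
# i.e. the input is retrieved as closest to the "good quality" description.
```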