Exploring Opinion-unaware Video Quality Assessment with Semantic
Affinity Criterion
- URL: http://arxiv.org/abs/2302.13269v1
- Date: Sun, 26 Feb 2023 08:46:07 GMT
- Title: Exploring Opinion-unaware Video Quality Assessment with Semantic
Affinity Criterion
- Authors: Haoning Wu, Liang Liao, Jingwen Hou, Chaofeng Chen, Erli Zhang, Annan
Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
- Abstract summary: We introduce an explicit semantic affinity index for opinion-unaware VQA using text-prompts in the contrastive language-image pre-training model.
We also aggregate it with different traditional low-level naturalness indexes through Gaussian normalization and sigmoid rescaling strategies.
The proposed Blind Unified Opinion-Unaware Video Quality Index via Semantic and Technical Metric Aggregation (BUONA-VISTA) outperforms existing opinion-unaware VQA methods by at least 20%.
- Score: 52.07084862209754
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent learning-based video quality assessment (VQA) algorithms are expensive
to implement due to the cost of data collection of human quality opinions, and
are less robust across various scenarios due to the biases of these opinions.
This motivates our exploration of opinion-unaware (a.k.a. zero-shot) VQA
approaches. Existing approaches consider only low-level naturalness in the
spatial or temporal domain, without accounting for the impact of high-level
semantics. In
this work, we introduce an explicit semantic affinity index for opinion-unaware
VQA using text-prompts in the contrastive language-image pre-training (CLIP)
model. We also aggregate it with different traditional low-level naturalness
indexes through Gaussian normalization and sigmoid rescaling strategies.
Composed of aggregated semantic and technical metrics, the proposed Blind
Unified Opinion-Unaware Video Quality Index via Semantic and Technical Metric
Aggregation (BUONA-VISTA) outperforms existing opinion-unaware VQA methods by
at least 20%, and is more robust than opinion-aware approaches.
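For illustration, below is a minimal Python sketch of the two ideas described in the abstract: a CLIP-based semantic affinity index computed from a positive/negative text-prompt pair, and the aggregation of several quality indices via Gaussian (z-score) normalization followed by sigmoid rescaling. The prompt wording, frame sampling, equal weighting, and the `semantic_affinity`/`aggregate` helpers are illustrative assumptions, not the exact BUONA-VISTA implementation; the sketch uses the open-source CLIP package (https://github.com/openai/CLIP).

```python
# Hedged sketch: CLIP-based semantic affinity + Gaussian/sigmoid aggregation.
# Prompt texts, frame sampling, and equal-weight averaging are assumptions,
# not the paper's exact design.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)


@torch.no_grad()
def semantic_affinity(frames):
    """Affinity toward a 'high quality' prompt vs. a 'low quality' prompt,
    averaged over sampled video frames (PIL images)."""
    text = clip.tokenize(["a high quality photo", "a low quality photo"]).to(device)
    text_feat = model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = []
    for frame in frames:
        image = preprocess(frame).unsqueeze(0).to(device)
        img_feat = model.encode_image(image)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        logits = 100.0 * img_feat @ text_feat.T              # cosine similarities
        scores.append(logits.softmax(dim=-1)[0, 0].item())   # weight on the positive prompt
    return sum(scores) / len(scores)


def aggregate(indices):
    """Gaussian-normalize each raw index against precomputed dataset statistics,
    squash with a sigmoid, then average into one opinion-unaware quality score.
    `indices` maps an index name to (raw_value, dataset_mean, dataset_std)."""
    rescaled = []
    for raw_value, mean, std in indices.values():
        z = (raw_value - mean) / (std + 1e-8)                   # Gaussian normalization
        rescaled.append(torch.sigmoid(torch.tensor(z)).item())  # sigmoid rescaling
    return sum(rescaled) / len(rescaled)
```

In this sketch, the semantic affinity score would be combined with the traditional low-level spatial and temporal naturalness indexes mentioned in the abstract, whose dataset-level means and standard deviations are precomputed; the equal-weight average is a simplification of the aggregation strategy described above.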
Related papers
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z) - Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions [75.45274978665684]
Vision-Language Understanding (VLU) benchmarks contain samples where answers rely on assumptions unsupported by the provided context.
We collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions.
We develop a general-purpose Context-AwaRe Abstention detector to identify samples lacking sufficient context and enhance model accuracy.
arXiv Detail & Related papers (2024-05-18T02:21:32Z) - Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z) - Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy [27.454549324141087]
We propose a novel VQA benchmark based on well-known visual classification datasets.
We also suggest using the semantic hierarchy of the label space to ask automatically generated follow-up questions about the ground-truth category.
Our contributions aim to lay the foundation for more precise and meaningful assessments.
arXiv Detail & Related papers (2024-02-11T18:26:18Z) - Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity [55.399230250413986]
We propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) to remove harmful semantic noise features from the upstream task.
Our approach achieves superior performance to the state-of-the-art NR-IQA methods on eight standard IQA datasets.
arXiv Detail & Related papers (2023-12-11T06:50:27Z) - KNVQA: A Benchmark for evaluation knowledge-based VQA [8.602776661652083]
Large vision-language models (LVLMs) have made significant progress due to their strong perception and reasoning capabilities in the visual and language systems.
However, LVLMs are still plagued by two critical issues, object hallucination and factual accuracy, which limit their practicality in different scenarios.
We propose a novel KNVQA-Eval, which is devoted to knowledge-based VQA task evaluation to reflect the factuality of multimodal LVLMs.
arXiv Detail & Related papers (2023-11-21T14:39:18Z) - Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video
Quality Assessment [54.31355080688127]
We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP).
BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets.
We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
arXiv Detail & Related papers (2023-04-28T08:06:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.