Exploring Opinion-unaware Video Quality Assessment with Semantic
Affinity Criterion
- URL: http://arxiv.org/abs/2302.13269v1
- Date: Sun, 26 Feb 2023 08:46:07 GMT
- Title: Exploring Opinion-unaware Video Quality Assessment with Semantic
Affinity Criterion
- Authors: Haoning Wu, Liang Liao, Jingwen Hou, Chaofeng Chen, Erli Zhang, Annan
Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
- Abstract summary: We introduce an explicit semantic affinity index for opinion-unaware VQA using text prompts in the contrastive language-image pre-training (CLIP) model.
We also aggregate it with different traditional low-level naturalness indexes through Gaussian normalization and sigmoid rescaling strategies.
The proposed Blind Unified Opinion-Unaware Video Quality Index via Semantic and Technical Metric Aggregation (BUONA-VISTA) outperforms existing opinion-unaware VQA methods by at least 20%.
- Score: 52.07084862209754
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent learning-based video quality assessment (VQA) algorithms are expensive
to implement due to the cost of data collection of human quality opinions, and
are less robust across various scenarios due to the biases of these opinions.
This motivates our exploration of opinion-unaware (a.k.a. zero-shot) VQA
approaches. Existing approaches consider only low-level naturalness in the spatial
or temporal domain, without accounting for the impact of high-level semantics. In
this work, we introduce an explicit semantic affinity index for opinion-unaware
VQA using text prompts in the contrastive language-image pre-training (CLIP)
model. We also aggregate it with different traditional low-level naturalness
indexes through Gaussian normalization and sigmoid rescaling strategies.
Composed of aggregated semantic and technical metrics, the proposed Blind
Unified Opinion-Unaware Video Quality Index via Semantic and Technical Metric
Aggregation (BUONA-VISTA) outperforms existing opinion-unaware VQA methods by
at least 20%, and is more robust than opinion-aware approaches.
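Below is a minimal sketch (not the authors' released code) of the two ingredients described in the abstract: a text-prompted CLIP semantic affinity index and the aggregation of per-video indices through Gaussian normalization followed by sigmoid rescaling. The prompt wording, the frame-sampling scheme, and the choice of low-level naturalness indices are illustrative assumptions.

```python
# Sketch of (1) a CLIP-based semantic affinity index driven by a pair of
# antonymic text prompts and (2) aggregation of per-video indices via
# Gaussian normalization followed by sigmoid rescaling.
import numpy as np
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical antonymic quality prompts.
text_tokens = clip.tokenize(["a high quality photo", "a low quality photo"]).to(device)

@torch.no_grad()
def semantic_affinity_index(frames):
    """Mean positive-vs-negative prompt affinity over sampled video frames.

    frames: iterable of PIL.Image objects sampled from one video.
    Returns a scalar in [0, 1]; higher means closer to the positive prompt.
    """
    text_feat = model.encode_text(text_tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    images = torch.stack([preprocess(f) for f in frames]).to(device)
    img_feat = model.encode_image(images)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

    # Softmax over the two prompts turns cosine similarities into an affinity.
    logits = 100.0 * img_feat @ text_feat.T        # (num_frames, 2)
    return logits.softmax(dim=-1)[:, 0].mean().item()

def gaussian_normalize(scores):
    """Z-score one raw index across the whole set of evaluated videos."""
    scores = np.asarray(scores, dtype=np.float64)
    return (scores - scores.mean()) / (scores.std() + 1e-8)

def sigmoid_rescale(z):
    """Map normalized scores into (0, 1) so different indices are comparable."""
    return 1.0 / (1.0 + np.exp(-z))

def aggregate(*index_lists):
    """Average the rescaled indices (semantic plus low-level naturalness)."""
    rescaled = [sigmoid_rescale(gaussian_normalize(s)) for s in index_lists]
    return np.mean(np.stack(rescaled, axis=0), axis=0)

# Usage (the low-level indices are placeholders for the traditional spatial and
# temporal naturalness measures mentioned in the abstract):
# semantic = [semantic_affinity_index(frames) for frames in per_video_frames]
# quality  = aggregate(semantic, spatial_naturalness, temporal_naturalness)
```

Normalizing each index before rescaling keeps the fusion free of any learned weights, which is consistent with the zero-shot, opinion-unaware setting described above.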
Related papers
- CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment [21.9149920194746]
We propose a novel language-driven PCQA method named CLIP-PCQA.
Considering that human beings prefer to describe visual quality using discrete quality descriptions, we adopt a retrieval-based mapping strategy (a minimal sketch of this idea follows the related-papers list below).
We show that our CLIP-PCQA outperforms other state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2025-01-17T09:43:14Z)
- Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment [15.529169236891532]
We introduce MSA-VQA, a Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment.
Our hierarchical framework analyzes video content at three levels: frame, segment, and video.
We propose a Prompt Semantic Supervision Module using the text encoder of CLIP to ensure semantic consistency between videos and conditional prompts.
arXiv Detail & Related papers (2025-01-06T01:18:11Z)
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions [75.45274978665684]
Vision-Language Understanding (VLU) benchmarks contain samples where answers rely on assumptions unsupported by the provided context.
We collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions.
We develop a general-purpose Context-AwaRe Abstention detector to identify samples lacking sufficient context and enhance model accuracy.
arXiv Detail & Related papers (2024-05-18T02:21:32Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity [55.399230250413986]
We propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) to remove harmful semantic noise features from the upstream task.
Our approach achieves superior performance to the state-of-the-art NR-IQA methods on eight standard IQA datasets.
arXiv Detail & Related papers (2023-12-11T06:50:27Z)
- Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment [54.31355080688127]
We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP).
The resulting BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets.
We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
arXiv Detail & Related papers (2023-04-28T08:06:05Z)
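For the CLIP-PCQA entry referenced above, the sketch below illustrates one way a retrieval-based mapping from discrete quality descriptions to a scalar score could look. The five-level description wording, the 1 to 5 level values, and the softmax temperature are hypothetical, and the rendering of point-cloud views into CLIP-compatible images that such a method would require is omitted.

```python
# Hypothetical sketch of a retrieval-based mapping: similarities between the
# input and five textual quality levels are softmax-normalized and converted
# into an expected quality value. This is an illustration, not CLIP-PCQA's code.
import numpy as np

QUALITY_DESCRIPTIONS = [  # hypothetical five-level descriptions
    "a point cloud with bad quality",
    "a point cloud with poor quality",
    "a point cloud with fair quality",
    "a point cloud with good quality",
    "a point cloud with excellent quality",
]
LEVEL_VALUES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def expected_quality(similarities, temperature=0.01):
    """Map per-description cosine similarities to an expected quality score.

    similarities: array of shape (5,), one CLIP-style cosine similarity per
    entry of QUALITY_DESCRIPTIONS.
    """
    logits = np.asarray(similarities, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(np.dot(probs, LEVEL_VALUES))

# Example: expected_quality([0.21, 0.22, 0.24, 0.27, 0.26]) is roughly 4.2,
# i.e. the input is retrieved as closest to the "good quality" description.
```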