DIVA-VQA: Detecting Inter-frame Variations in UGC Video Quality
- URL: http://arxiv.org/abs/2508.10605v1
- Date: Thu, 14 Aug 2025 12:47:42 GMT
- Title: DIVA-VQA: Detecting Inter-frame Variations in UGC Video Quality
- Authors: Xinyi Wang, Angeliki Katsenou, David Bull
- Abstract summary: No-reference (NR) video quality assessment (VQA) is a key component for large-scale video quality monitoring. This paper proposes a novel NR-VQA model based on spatio-temporal fragmentation driven by inter-frame variations. It integrates frames, fragmented residuals, and fragmented frames aligned with residuals to effectively capture global and local information.
- Score: 35.00766551093652
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The rapid growth of user-generated (video) content (UGC) has driven increased demand for research on no-reference (NR) perceptual video quality assessment (VQA). NR-VQA is a key component for large-scale video quality monitoring in social media and streaming applications where a pristine reference is not available. This paper proposes a novel NR-VQA model based on spatio-temporal fragmentation driven by inter-frame variations. By leveraging these inter-frame differences, the model progressively analyses quality-sensitive regions at multiple levels: frames, patches, and fragmented frames. It integrates frames, fragmented residuals, and fragmented frames aligned with residuals to effectively capture global and local information. The model extracts both 2D and 3D features in order to characterize these spatio-temporal variations. Experiments conducted on five UGC datasets and against state-of-the-art models ranked our proposed method among the top 2 in terms of average rank correlation (DIVA-VQA-L: 0.898 and DIVA-VQA-B: 0.886). The improved performance is offered at a low runtime complexity, with DIVA-VQA-B ranked top and DIVA-VQA-L third on average compared to the fastest existing NR-VQA method. Code and models are publicly available at: https://github.com/xinyiW915/DIVA-VQA.
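The fragmentation idea can be sketched in a few lines of Python (an illustrative simplification, not the released implementation; the names `select_fragments`, `patch_size`, and `top_k` are ours): compute the residual between consecutive frames, rank patches by residual energy, and keep the most temporally active patches from both the current frame and the residual.
```python
import numpy as np

def select_fragments(prev_frame, curr_frame, patch_size=32, top_k=16):
    """Keep the patches with the largest inter-frame residual energy."""
    residual = curr_frame.astype(np.float32) - prev_frame.astype(np.float32)
    h, w = residual.shape[:2]

    energies, coords = [], []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            block = residual[y:y + patch_size, x:x + patch_size]
            energies.append(float(np.mean(block ** 2)))  # temporal activity of the patch
            coords.append((y, x))

    # Indices of the top-k most temporally active patches.
    order = np.argsort(energies)[::-1][:top_k]
    frame_patches = [curr_frame[y:y + patch_size, x:x + patch_size]
                     for y, x in (coords[i] for i in order)]
    residual_patches = [residual[y:y + patch_size, x:x + patch_size]
                        for y, x in (coords[i] for i in order)]
    return frame_patches, residual_patches, [coords[i] for i in order]
```
In the full model, the selected frame fragments and the residual-aligned fragments would then feed 2D and 3D feature extractors, as described in the abstract.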
Related papers
- Q-Save: Towards Scoring and Attribution for Generated Video Evaluation [65.83319736145869]
We present Q-Save, a new benchmark dataset and model for holistic evaluation of AI-generated video (AIGV) quality. The dataset contains nearly 10,000 videos, each annotated with a scalar mean opinion score (MOS) and fine-grained attribution labels. We propose a unified evaluation model that jointly performs quality scoring and attribution-based explanation.
arXiv Detail & Related papers (2025-11-24T07:00:21Z)
- CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video [9.172799792564009]
We propose CAMP-VQA, a novel NR-VQA framework that exploits the semantic understanding capabilities of large models. Our approach introduces a quality-aware video metadata mechanism that integrates key fragments extracted from inter-frame variations. Our model consistently outperforms existing NR-VQA methods, achieving improved accuracy without the need for costly manual fine-grained annotations.
arXiv Detail & Related papers (2025-11-10T16:37:47Z)
- NovisVQ: A Streaming Convolutional Neural Network for No-Reference Opinion-Unaware Frame Quality Assessment [39.76658525158528]
Video quality assessment (VQA) is vital for computer vision tasks, but existing approaches face major limitations. We present a scalable, streaming-based VQA model that is both no-reference and opinion-unaware.
arXiv Detail & Related papers (2025-11-06T18:23:55Z)
- VQAThinker: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning [50.34205095371895]
Video quality assessment aims to objectively quantify perceptual quality degradation. Existing VQA models suffer from two critical limitations. We propose VQAThinker, a reasoning-based VQA framework.
arXiv Detail & Related papers (2025-08-08T06:16:23Z)
- Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision [49.46606936180063]
Video quality assessment (VQA) is essential for quantifying quality in various video processing systems. We introduce a self-supervised learning framework for VQA to learn quality assessment capabilities from large-scale, unlabeled web videos. By training on a dataset 10× larger than the existing VQA benchmarks, our model achieves zero-shot performance.
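A minimal sketch of ranking-based self-supervision, assuming pairs of videos with a known relative quality ordering (e.g. from synthetic distortion severity rather than human labels); the function name `pairwise_ranking_loss` is ours, not from the paper:
```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(score_a, score_b, target, margin=0.05):
    """Margin ranking loss: push score_a above score_b when target is +1.

    score_a, score_b: predicted quality scores (tensors of shape (batch,)).
    target: +1 if video A should rank higher than video B, -1 otherwise,
            derived e.g. from known distortion severity rather than MOS labels.
    """
    return F.margin_ranking_loss(score_a, score_b, target, margin=margin)

# Toy usage: the model should rank a lightly compressed clip above a heavily
# compressed version of the same content, with no human opinion scores needed.
score_light = torch.tensor([0.72], requires_grad=True)
score_heavy = torch.tensor([0.41], requires_grad=True)
loss = pairwise_ranking_loss(score_light, score_heavy, torch.tensor([1.0]))
loss.backward()
```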
arXiv Detail & Related papers (2025-05-06T15:29:32Z)
- Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment [63.811519474030234]
We propose a perception-oriented approach to quantify frame-wise temporal inconsistency. Inspired by the human visual system, we develop an Inconsistency Guided Temporal Module. Our method significantly outperforms state-of-the-art VQA approaches.
arXiv Detail & Related papers (2024-12-25T15:43:41Z)
- ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment [35.00766551093652]
ReLaX-VQA is a novel no-reference video quality assessment (NR-VQA) model. It aims to address the challenges of evaluating the quality of diverse video content without reference to the original uncompressed videos. It consistently outperforms existing NR-VQA methods, achieving an average SRCC of 0.8658 and PLCC of 0.8873.
arXiv Detail & Related papers (2024-07-16T08:33:55Z)
- Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment [9.883856205077022]
Video Quality Assessment (VQA) aims to predict the perceptual quality of a video.
VQA faces two under-estimated challenges unresolved in User Generated Content (UGC) videos.
We propose Visual Quality Transformer (VQT) to extract quality-related sparse features more efficiently.
arXiv Detail & Related papers (2023-07-31T16:29:29Z)
- Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS), to obtain a novel type of sample named fragments.
With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
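A minimal sketch of grid mini-cube sampling in the spirit of St-GMS (assumptions: a video tensor of shape (T, H, W, C) with at least `frames` frames and grid cells at least `cube` pixels wide; function and parameter names are ours, not from the FAST-VQA code):
```python
import numpy as np

def sample_fragments(video, grid=7, cube=32, frames=8, seed=0):
    """Sample one mini-cube per spatial grid cell and stitch them into a clip.

    video: array of shape (T, H, W, C); assumes T >= frames and that each of
    the grid x grid cells is at least cube x cube pixels.
    Returns an array of shape (frames, grid*cube, grid*cube, C).
    """
    rng = np.random.default_rng(seed)
    t, h, w, c = video.shape
    t0 = rng.integers(0, t - frames + 1)          # random temporal window
    out = np.empty((frames, grid * cube, grid * cube, c), dtype=video.dtype)

    cell_h, cell_w = h // grid, w // grid
    for gy in range(grid):
        for gx in range(grid):
            # Random spatial offset of the mini-cube inside its grid cell.
            y = gy * cell_h + rng.integers(0, cell_h - cube + 1)
            x = gx * cell_w + rng.integers(0, cell_w - cube + 1)
            out[:, gy * cube:(gy + 1) * cube, gx * cube:(gx + 1) * cube] = \
                video[t0:t0 + frames, y:y + cube, x:x + cube]
    return out
```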
arXiv Detail & Related papers (2022-10-11T11:38:07Z)
- ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction [63.749184706461826]
We study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality.
We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients.
GREED achieves state-of-the-art performance on the LIVE-YT-HFR Database when compared with existing VQA models.
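A heavily simplified, illustrative take on the entropic-difference idea (not the actual GREED formulation, which additionally models frame-rate dependence and uses generalized Gaussian statistics; here frame differences stand in for temporal band-pass coefficients and entropy is computed under a per-block Gaussian assumption):
```python
import numpy as np

def temporal_bandpass_entropy(video, block=16, eps=1e-3):
    """Mean per-block Gaussian entropy of temporal frame differences.

    video: grayscale array of shape (T, H, W). Frame differences serve as a
    crude temporal band-pass signal; entropy uses a Gaussian model per block.
    """
    diffs = np.diff(video.astype(np.float32), axis=0)           # (T-1, H, W)
    t, h, w = diffs.shape
    hb, wb = h // block, w // block
    blocks = diffs[:, :hb * block, :wb * block].reshape(t, hb, block, wb, block)
    var = blocks.var(axis=(2, 4))                               # per-block variance
    return float(np.mean(0.5 * np.log2(2 * np.pi * np.e * (var + eps))))

def entropic_difference(ref_video, dist_video):
    """Absolute gap between the two videos' band-pass entropies (illustrative)."""
    return abs(temporal_bandpass_entropy(ref_video)
               - temporal_bandpass_entropy(dist_video))
```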
arXiv Detail & Related papers (2020-10-26T16:54:33Z)
- UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled.
Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models.
By employing a feature selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models to create a new fusion-based model, VIDEVAL.
Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models.
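One plausible way to realise such a feature selection step is a greedy forward search with a support vector regressor, sketched below (an assumption for illustration; the paper's actual selection procedure and scoring criterion may differ):
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def greedy_feature_selection(X, y, k=60, cv=3):
    """Greedy forward selection of k features for a quality regressor.

    X: (n_videos, n_features) matrix of candidate features; y: MOS labels.
    At each step, add the feature whose inclusion maximises the
    cross-validated R^2 of an SVR trained on the selected subset.
    """
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best_feat, best_score = None, -np.inf
        for f in remaining:
            cols = selected + [f]
            score = cross_val_score(SVR(), X[:, cols], y, cv=cv).mean()
            if score > best_score:
                best_feat, best_score = f, score
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected
```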
arXiv Detail & Related papers (2020-05-29T00:39:20Z)