Related papers: ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment

ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment

URL: http://arxiv.org/abs/2407.11496v1
Date: Tue, 16 Jul 2024 08:33:55 GMT
Title: ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment
Authors: Xinyi Wang, Angeliki Katsenou, David Bull,
Abstract summary: We propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model. ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception. We will open source the code and trained models to facilitate further research and applications of NR-VQA.
Score: 35.00766551093652
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has emerged. UGC is mostly acquired using consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Therefore, traditional quality metrics that require the original content as a reference cannot be used. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that aims to address the challenges of evaluating the diversity of video content and the assessment of its quality without reference videos. ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception. Furthermore, the model enhances abstraction by employing layer-stacking techniques in deep neural network features (from Residual Networks and Vision Transformers). Extensive testing on four UGC datasets confirms that ReLaX-VQA outperforms existing NR-VQA methods with an average SRCC value of 0.8658 and PLCC value of 0.8872. We will open source the code and trained models to facilitate further research and applications of NR-VQA: https://github.com/xinyiW915/ReLaX-VQA.

Related papers

CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video [9.172799792564009]
We propose CAMP-VQA, a novel NR-VQA framework that exploits the semantic understanding capabilities of large models.<n>Our approach introduces a quality-aware video metadata mechanism that integrates key fragments extracted from inter-frame variations.<n>Our model consistently outperforms existing NR-VQA methods, achieving improved accuracy without the need for costly manual fine-grained annotations.
arXiv Detail & Related papers (2025-11-10T16:37:47Z)
DIVA-VQA: Detecting Inter-frame Variations in UGC Video Quality [35.00766551093652]
No-reference (NR) quality assessment (VQA) is a key component for large-scale video quality monitoring.<n>This paper proposes a novel NR-VQA model based on fragmentation driven by inter-frame variations.<n>It integrates frames, fragmented-temporals, and fragmented frames aligned with residuals to effectively capture global and local information.
arXiv Detail & Related papers (2025-08-14T12:47:42Z)
CLIPVQA:Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem ( CLIPVQA) The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features. Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z)
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment [9.883856205077022]
Video Quality Assessment (VQA) aims to predict the perceptual quality of a video. VQA faces two under-estimated challenges unresolved in User Generated Content (UGC) videos. We propose textitVisual Quality Transformer (VQT) to extract quality-related sparse features more efficiently.
arXiv Detail & Related papers (2023-07-31T16:29:29Z)
MRET: Multi-resolution Transformer for Video Quality Assessment [37.355412115794195]
No-reference video quality assessment (NR-VQA) for user generated content (UGC) is crucial for understanding and improving visual experience. Since large amounts of videos nowadays are 720p or above, the fixed and relatively small input used in conventional NR-VQA methods results in missing high-frequency details for many videos. We propose a novel Transformer-based NR-VQA framework that preserves the high-resolution quality information.
arXiv Detail & Related papers (2023-03-13T21:48:49Z)
Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA) In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS) to get a novel type of sample, named fragments. With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
arXiv Detail & Related papers (2022-10-11T11:38:07Z)
FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling [54.31355080688127]
Current deep video quality assessment (VQA) methods are usually with high computational costs when evaluating high-resolution videos. We propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution. We build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs. FAST-VQA improves state-of-the-art accuracy by around 10% while reducing 99.5% FLOPs on 1080P high-resolution videos.
arXiv Detail & Related papers (2022-07-06T11:11:43Z)
CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
A Deep Learning based No-reference Quality Assessment Model for UGC Videos [44.00578772367465]
Previous video quality assessment (VQA) studies either use the image recognition model or the image quality assessment (IQA) models to extract frame-level features of videos for quality regression. We propose a very simple but effective VQA model, which trains an end-to-end spatial feature extraction network to learn the quality-aware spatial feature representation from raw pixels of the video frames. With the better quality-aware features, we only use the simple multilayer perception layer (MLP) network to regress them into the chunk-level quality scores, and then the temporal average pooling strategy is adopted to obtain the video
arXiv Detail & Related papers (2022-04-29T12:45:21Z)
Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes CNN backbone and transformer encoder to extract features. The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
arXiv Detail & Related papers (2021-12-01T13:23:00Z)
Deep Learning based Full-reference and No-reference Quality Assessment Models for Compressed UGC Videos [34.761412637585266]
The framework consists of three modules, the feature extraction module, the quality regression module, and the quality pooling module. For the feature extraction module, we fuse the features from intermediate layers of the convolutional neural network (CNN) network into final quality-aware representation. For the quality regression module, we use the fully connected (FC) layer to regress the quality-aware features into frame-level scores.
arXiv Detail & Related papers (2021-06-02T12:23:16Z)
Study on the Assessment of the Quality of Experience of Streaming Video [117.44028458220427]
In this paper, the influence of various objective factors on the subjective estimation of the QoE of streaming video is studied. The paper presents standard and handcrafted features, shows their correlation and p-Value of significance. We take SQoE-III database, so far the largest and most realistic of its kind.
arXiv Detail & Related papers (2020-12-08T18:46:09Z)
ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction [63.749184706461826]
We study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality. We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients. GREED achieves state-of-the-art performance on the LIVE-YT-HFR Database when compared with existing VQA models.
arXiv Detail & Related papers (2020-10-26T16:54:33Z)
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled. Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models. By employing a feature selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models. Our experimental results show that VIDEVAL achieves state-of-theart performance at considerably lower computational cost than other leading models.
arXiv Detail & Related papers (2020-05-29T00:39:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.