ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment
- URL: http://arxiv.org/abs/2407.11496v3
- Date: Wed, 12 Mar 2025 18:07:16 GMT
- Title: ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment
- Authors: Xinyi Wang, Angeliki Katsenou, David Bull
- Abstract summary: ReLaX-VQA is a novel No-Reference Video Quality Assessment (NR-VQA) model. It aims to address the challenges of evaluating the quality of diverse video content without reference to the original uncompressed videos. It consistently outperforms existing NR-VQA methods, achieving an average SRCC of 0.8658 and PLCC of 0.8873.
- Score: 35.00766551093652
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild is increasingly evident. UGC is typically acquired using consumer devices and undergoes multiple rounds of compression (transcoding) before reaching the end user. Therefore, traditional quality metrics that employ the original content as a reference are not suitable. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that aims to address the challenges of evaluating the quality of diverse video content without reference to the original uncompressed videos. ReLaX-VQA uses frame differences to select spatio-temporal fragments intelligently together with different expressions of spatial features associated with the sampled frames. These are then used to better capture spatial and temporal variabilities in the quality of neighbouring frames. Furthermore, the model enhances abstraction by employing layer-stacking techniques in deep neural network features from Residual Networks and Vision Transformers. Extensive testing across four UGC datasets demonstrates that ReLaX-VQA consistently outperforms existing NR-VQA methods, achieving an average SRCC of 0.8658 and PLCC of 0.8873. Open-source code and trained models that will facilitate further research and applications of NR-VQA can be found at https://github.com/xinyiW915/ReLaX-VQA.
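For illustration, the sketch below shows one way the two ideas named in the abstract could be realized: selecting fragments from the patches where consecutive frames differ most (residual fragments), and stacking pooled activations from several ResNet stages (layer stacking). The patch size, fragment count, backbone, chosen layers, and the absence of input normalization are assumptions made for this example, not the exact ReLaX-VQA configuration.

```python
# A minimal sketch of residual-guided fragment sampling and layer stacking,
# assuming 224x224 frames, 32x32 patches, and a torchvision ResNet-50 backbone.
# All hyperparameters below are illustrative, not the settings used in ReLaX-VQA.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

def residual_fragments(prev_frame, curr_frame, patch=32, k=16):
    """Pick the k patches of curr_frame with the largest frame-difference energy."""
    diff = (curr_frame - prev_frame).abs().mean(dim=0, keepdim=True)        # (1, H, W)
    energy = F.avg_pool2d(diff.unsqueeze(0), patch, stride=patch)[0, 0]     # per-patch energy
    idx = energy.flatten().topk(k).indices
    rows, cols = idx // energy.shape[1], idx % energy.shape[1]
    return torch.stack([curr_frame[:, r*patch:(r+1)*patch, c*patch:(c+1)*patch]
                        for r, c in zip(rows.tolist(), cols.tolist())])     # (k, 3, patch, patch)

def stacked_resnet_features(frames):
    """Concatenate pooled activations from several ResNet stages (layer stacking)."""
    net = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
    feats = []
    with torch.no_grad():
        x = net.maxpool(net.relu(net.bn1(net.conv1(frames))))
        for stage in (net.layer1, net.layer2, net.layer3, net.layer4):
            x = stage(x)
            feats.append(F.adaptive_avg_pool2d(x, 1).flatten(1))            # (N, C_stage)
    return torch.cat(feats, dim=1)                                          # (N, 256+512+1024+2048)

prev_f, curr_f = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
frags = residual_fragments(prev_f, curr_f)
print(stacked_resnet_features(frags).shape)
```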
Related papers
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z) - Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z) - MRET: Multi-resolution Transformer for Video Quality Assessment [37.355412115794195]
No-reference video quality assessment (NR-VQA) for user generated content (UGC) is crucial for understanding and improving visual experience.
Since a large share of videos nowadays are 720p or above, the fixed and relatively small input resolution used in conventional NR-VQA methods misses high-frequency details for many videos.
We propose a novel Transformer-based NR-VQA framework that preserves the high-resolution quality information.
arXiv Detail & Related papers (2023-03-13T21:48:49Z) - Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS) to get a novel type of sample, named fragments.
With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
arXiv Detail & Related papers (2022-10-11T11:38:07Z) - FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling [54.31355080688127]
Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos.
We propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution.
We build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs.
FAST-VQA improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080p high-resolution videos.
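A minimal sketch of the Grid Mini-patch Sampling idea summarized in this entry: the frame is split into a uniform grid, one raw-resolution mini-patch is cut from each cell, and the patches are spliced into a compact fragment image. The 7x7 grid and 32-pixel patches are illustrative defaults, not necessarily the paper's settings.

```python
# Grid Mini-patch Sampling sketch: uniform grid, one raw-resolution patch per cell,
# patches spliced back into a small "fragment" image. Hyperparameters are illustrative.
import torch

def grid_mini_patch_sample(frame, grid=7, patch=32, generator=None):
    """frame: (C, H, W) -> fragment image of shape (C, grid*patch, grid*patch)."""
    c, h, w = frame.shape
    cell_h, cell_w = h // grid, w // grid
    out = torch.empty(c, grid * patch, grid * patch, dtype=frame.dtype)
    for gi in range(grid):
        for gj in range(grid):
            # random offset inside the cell, keeping the patch at its raw resolution
            top = gi * cell_h + torch.randint(0, cell_h - patch + 1, (1,), generator=generator).item()
            left = gj * cell_w + torch.randint(0, cell_w - patch + 1, (1,), generator=generator).item()
            out[:, gi*patch:(gi+1)*patch, gj*patch:(gj+1)*patch] = \
                frame[:, top:top+patch, left:left+patch]
    return out

frame_1080p = torch.rand(3, 1080, 1920)
print(grid_mini_patch_sample(frame_1080p).shape)   # torch.Size([3, 224, 224])
```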
arXiv Detail & Related papers (2022-07-06T11:11:43Z) - CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z) - A Deep Learning based No-reference Quality Assessment Model for UGC Videos [44.00578772367465]
Previous video quality assessment (VQA) studies use either image recognition models or image quality assessment (IQA) models to extract frame-level features of videos for quality regression.
We propose a very simple but effective VQA model, which trains an end-to-end spatial feature extraction network to learn the quality-aware spatial feature representation from raw pixels of the video frames.
With the better quality-aware features, we only use a simple multilayer perceptron (MLP) network to regress them into chunk-level quality scores, and then the temporal average pooling strategy is adopted to obtain the video-level quality score.
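A minimal sketch of the regression stage described in this entry, assuming frame-level quality-aware features have already been extracted: a small MLP maps each frame's feature vector to a score, and temporal average pooling yields the video-level score. The feature dimension and hidden size are placeholders, not the paper's configuration.

```python
# MLP regression of frame features to scores, followed by temporal average pooling.
import torch
import torch.nn as nn

class FrameScoreRegressor(nn.Module):
    def __init__(self, feat_dim=4096, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, frame_feats):            # (T, feat_dim) for one video
        frame_scores = self.mlp(frame_feats)   # (T, 1): one score per frame
        return frame_scores.mean()             # temporal average pooling -> video score

feats = torch.rand(30, 4096)                   # 30 frames of extracted spatial features
print(FrameScoreRegressor()(feats).item())
```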
arXiv Detail & Related papers (2022-04-29T12:45:21Z) - Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes a CNN backbone and a transformer encoder to extract features.
The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
arXiv Detail & Related papers (2021-12-01T13:23:00Z) - Deep Learning based Full-reference and No-reference Quality Assessment Models for Compressed UGC Videos [34.761412637585266]
The framework consists of three modules: the feature extraction module, the quality regression module, and the quality pooling module.
For the feature extraction module, we fuse the features from intermediate layers of the convolutional neural network (CNN) into a final quality-aware representation.
For the quality regression module, we use a fully connected (FC) layer to regress the quality-aware features into frame-level scores.
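The sketch below illustrates the feature extraction and quality regression modules under stated assumptions: a torchvision ResNet-50 as the CNN, global average pooling as the fusion of intermediate-layer activations, and a single FC layer as the regressor. The cited paper's exact fusion and pooling may differ.

```python
# Fuse intermediate CNN activations (captured via forward hooks) into one feature
# vector per frame, then regress frame scores with a fully connected layer.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

net = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
captured = []
hooks = [stage.register_forward_hook(lambda m, i, o: captured.append(o))
         for stage in (net.layer1, net.layer2, net.layer3, net.layer4)]  # handles kept for later removal

def quality_aware_representation(frame_batch):
    """Fuse pooled activations from intermediate CNN layers into one feature vector."""
    captured.clear()
    with torch.no_grad():
        net(frame_batch)
    pooled = [F.adaptive_avg_pool2d(a, 1).flatten(1) for a in captured]
    return torch.cat(pooled, dim=1)            # (N, 256 + 512 + 1024 + 2048)

frames = torch.rand(2, 3, 224, 224)
features = quality_aware_representation(frames)
frame_scores = torch.nn.Linear(features.shape[1], 1)(features)   # FC quality regression
print(frame_scores.shape)                      # torch.Size([2, 1])
```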
arXiv Detail & Related papers (2021-06-02T12:23:16Z) - Study on the Assessment of the Quality of Experience of Streaming Video [117.44028458220427]
In this paper, the influence of various objective factors on the subjective estimation of the QoE of streaming video is studied.
The paper presents standard and handcrafted features and shows their correlations and p-values of significance.
We use the SQoE-III database, so far the largest and most realistic of its kind.
arXiv Detail & Related papers (2020-12-08T18:46:09Z) - ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction [63.749184706461826]
We study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality.
We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients.
GREED achieves state-of-the-art performance on the LIVE-YT-HFR Database when compared with existing VQA models.
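A loose, simplified sketch of the entropic-difference idea summarized in this entry: fit a generalized Gaussian to temporal band-pass coefficients (here just plain frame differences) of a reference and a test clip and compare the fitted entropies. This only illustrates the statistic; the full GREED model uses proper spatial and temporal band-pass decompositions and additional terms.

```python
# Entropy of a generalized-Gaussian fit to frame-difference coefficients, compared
# between reference and distorted clips. Synthetic data; purely illustrative.
import numpy as np
from scipy.stats import gennorm

def bandpass_entropy(frames):
    """Differential entropy of a generalized-Gaussian fit to frame-difference coefficients."""
    coeffs = np.diff(frames, axis=0).ravel()          # crude temporal band-pass
    beta, loc, scale = gennorm.fit(coeffs)
    return gennorm.entropy(beta, loc=loc, scale=scale)

rng = np.random.default_rng(0)
reference = rng.normal(size=(10, 64, 64))
distorted = reference + 0.5 * rng.normal(size=(10, 64, 64))
entropic_difference = abs(bandpass_entropy(distorted) - bandpass_entropy(reference))
print(entropic_difference)
```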
arXiv Detail & Related papers (2020-10-26T16:54:33Z) - UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled.
Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models.
By employing a feature selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models and fuse them into a new model, VIDEVAL.
Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models.
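As a toy illustration of the feature-selection step, the sketch below keeps the 60 most informative of 763 candidate features with respect to subjective scores, using scikit-learn's SelectKBest with an F-test. VIDEVAL's actual selection procedure is more involved, and all data here are synthetic.

```python
# Keep the k columns of a feature matrix most predictive of subjective scores (MOS).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 763))       # 200 videos x 763 candidate features
mos = rng.uniform(1, 5, size=200)            # subjective mean opinion scores

selector = SelectKBest(score_func=f_regression, k=60).fit(features, mos)
compact_features = selector.transform(features)
print(compact_features.shape)                         # (200, 60)
print(np.flatnonzero(selector.get_support())[:10])    # indices of kept features
```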
arXiv Detail & Related papers (2020-05-29T00:39:20Z)