MRET: Multi-resolution Transformer for Video Quality Assessment
- URL: http://arxiv.org/abs/2303.07489v2
- Date: Wed, 29 Mar 2023 18:23:54 GMT
- Title: MRET: Multi-resolution Transformer for Video Quality Assessment
- Authors: Junjie Ke, Tianhao Zhang, Yilin Wang, Peyman Milanfar, Feng Yang
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: No-reference video quality assessment (NR-VQA) for user generated content
(UGC) is crucial for understanding and improving visual experience. Unlike
video recognition tasks, VQA tasks are sensitive to changes in input
resolution. Since a large share of UGC videos nowadays are 720p or above, the
fixed and relatively small input used in conventional NR-VQA methods misses
high-frequency details for many videos. In this paper, we propose a
novel Transformer-based NR-VQA framework that preserves the high-resolution
quality information. With the multi-resolution input representation and a novel
multi-resolution patch sampling mechanism, our method enables a comprehensive
view of both the global video composition and local high-resolution details.
The proposed approach can effectively aggregate quality information across
different granularities in spatial and temporal dimensions, making the model
robust to input resolution variations. Our method achieves state-of-the-art
performance on large-scale UGC VQA datasets LSVQ and LSVQ-1080p, and on
KoNViD-1k and LIVE-VQC without fine-tuning.
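The core idea in the abstract — pairing a downscaled global view of each frame with patches cropped at the original resolution — can be sketched roughly as follows. This is an illustrative sketch only; the function name, patch sizes, and random sampling are assumptions, not the paper's actual multi-resolution sampling mechanism:

```python
import numpy as np

def multi_resolution_views(frame, global_size=224, patch=32, n_local=16, rng=None):
    """Sketch of a multi-resolution input: one downscaled global view for
    overall composition, plus native-resolution patches that preserve the
    high-frequency detail lost by downscaling. (Illustrative only.)"""
    rng = rng or np.random.default_rng(0)
    h, w, _ = frame.shape
    # Global view: naive nearest-neighbor downscale to global_size x global_size.
    ys = np.arange(global_size) * h // global_size
    xs = np.arange(global_size) * w // global_size
    global_view = frame[ys][:, xs]
    # Local views: patches cropped from the full-resolution frame.
    local_views = []
    for _ in range(n_local):
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        local_views.append(frame[y:y + patch, x:x + patch])
    return global_view, np.stack(local_views)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # one 720p frame
g, locs = multi_resolution_views(frame)
```

In the paper, tokens from both granularities (and across frames) are then aggregated by the Transformer, which is what makes the model robust to input-resolution variations.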
Related papers
- LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models [53.64461404882853]
Video quality assessment (VQA) algorithms are needed to monitor and optimize the quality of streaming videos.
Here, we propose the first Large Multi-Modal Video Quality Assessment (LMM-VQA) model, which introduces a novel visual modeling strategy for quality-aware feature extraction.
arXiv Detail & Related papers (2024-08-26T04:29:52Z)
- ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment [35.00766551093652]
We propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model.
ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception.
We will open source the code and trained models to facilitate further research and applications of NR-VQA.
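The "fragments of residual frames" mentioned above are, at their simplest, successive frame differences used as a motion cue. A minimal sketch, assuming plain frame differencing (not ReLaX-VQA's exact pipeline):

```python
import numpy as np

def residual_frames(video):
    """Residual frames as successive frame differences, a cheap proxy for
    motion information. video: (T, H, W, C) uint8 array; residual t holds
    frame[t+1] - frame[t]. (Illustrative, not the paper's exact method.)"""
    v = video.astype(np.int16)  # widen dtype so differences can be negative
    return v[1:] - v[:-1]

res = residual_frames(np.zeros((8, 64, 64, 3), dtype=np.uint8))
```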
arXiv Detail & Related papers (2024-07-16T08:33:55Z)
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z)
- Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS) to get a novel type of sample, named fragments.
With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
arXiv Detail & Related papers (2022-10-11T11:38:07Z)
- FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling [54.31355080688127]
Current deep video quality assessment (VQA) methods are usually with high computational costs when evaluating high-resolution videos.
We propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution.
We build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs.
FAST-VQA improves state-of-the-art accuracy by around 10% while reducing 99.5% FLOPs on 1080P high-resolution videos.
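Grid Mini-patch Sampling as summarized above — a uniform grid over the frame with one raw-resolution mini-patch cropped per cell and the crops spliced into a compact "fragment" — can be sketched as below. Grid and patch sizes are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def grid_mini_patch_sample(frame, grid=7, patch=32, rng=None):
    """Sketch of Grid Mini-patch Sampling (GMS): partition the frame into a
    grid x grid layout, crop one raw-resolution mini-patch per cell, and
    splice the crops into a single small fragment that keeps local quality
    detail at a fraction of the input size. (Illustrative only.)"""
    rng = rng or np.random.default_rng(0)
    h, w, c = frame.shape
    cell_h, cell_w = h // grid, w // grid
    out = np.empty((grid * patch, grid * patch, c), dtype=frame.dtype)
    for i in range(grid):
        for j in range(grid):
            # Random raw-resolution crop within cell (i, j).
            y = i * cell_h + rng.integers(0, cell_h - patch + 1)
            x = j * cell_w + rng.integers(0, cell_w - patch + 1)
            out[i * patch:(i + 1) * patch,
                j * patch:(j + 1) * patch] = frame[y:y + patch, x:x + patch]
    return out

fragment = grid_mini_patch_sample(np.zeros((1080, 1920, 3), dtype=np.uint8))
```

This is why FLOPs drop so sharply on 1080p inputs: the network only ever sees the small spliced fragment, not the full-resolution frame.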
arXiv Detail & Related papers (2022-07-06T11:11:43Z)
- A Deep Learning based No-reference Quality Assessment Model for UGC Videos [44.00578772367465]
Previous video quality assessment (VQA) studies use either image recognition models or image quality assessment (IQA) models to extract frame-level features of videos for quality regression.
We propose a very simple but effective VQA model, which trains an end-to-end spatial feature extraction network to learn the quality-aware spatial feature representation from raw pixels of the video frames.
With the better quality-aware features, we only use a simple multilayer perceptron (MLP) network to regress them into chunk-level quality scores, and then a temporal average pooling strategy is adopted to obtain the video-level quality score.
arXiv Detail & Related papers (2022-04-29T12:45:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.