FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment
Sampling
- URL: http://arxiv.org/abs/2207.02595v1
- Date: Wed, 6 Jul 2022 11:11:43 GMT
- Title: FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment
Sampling
- Authors: Haoning Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu
Sun, Qiong Yan, Weisi Lin
- Abstract summary: Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos.
We propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution.
We build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs.
FAST-VQA improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos.
- Score: 54.31355080688127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current deep video quality assessment (VQA) methods usually incur
high computational costs when evaluating high-resolution videos. This cost hinders
them from learning better video-quality-related representations via end-to-end
training. Existing approaches typically consider naive sampling to reduce the
computational cost, such as resizing and cropping. However, they obviously
corrupt quality-related information in videos and are thus not optimal for
learning good representations for VQA. Therefore, there is an urgent need to
design a new quality-retained sampling scheme for VQA. In this paper, we
propose Grid Mini-patch Sampling (GMS), which allows consideration of local
quality by sampling patches at their raw resolution and covers global quality
with contextual relations via mini-patches sampled in uniform grids. These
mini-patches are spliced and aligned temporally, forming what are termed fragments. We further
build the Fragment Attention Network (FANet) specially designed to accommodate
fragments as inputs. Consisting of fragments and FANet, the proposed FrAgment
Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and
learns effective video-quality-related representations. It improves
state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P
high-resolution videos. The newly learned video-quality-related representations
can also be transferred into smaller VQA datasets, boosting performance in
these scenarios. Extensive experiments show that FAST-VQA has good performance
on inputs of various resolutions while retaining high efficiency. We publish
our code at https://github.com/timothyhtimothy/FAST-VQA.
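As a concrete illustration of the sampling scheme, the following is a minimal
PyTorch sketch of how GMS could be implemented. The function name, the
7x7-grid / 32x32-patch defaults, and the shared random per-cell offsets are
illustrative assumptions rather than the authors' code; the official
implementation lives in the repository linked above.

    import torch

    def grid_minipatch_sample(video, grid=7, patch=32):
        """Sketch of Grid Mini-patch Sampling (GMS) on a [T, C, H, W] video.

        Each frame is divided into a uniform grid x grid layout; one
        patch x patch mini-patch is cut at raw resolution from a random
        position inside every cell (preserving local quality), and the
        mini-patches are spliced back in grid order (covering global
        quality via contextual relations). Offsets are shared across
        frames so the splice is temporally aligned, yielding a fragment.
        """
        t, c, h, w = video.shape
        cell_h, cell_w = h // grid, w // grid
        assert cell_h >= patch and cell_w >= patch, "frame too small for grid/patch"
        rows = []
        for i in range(grid):
            row = []
            for j in range(grid):
                # one random raw-resolution mini-patch per grid cell,
                # reused for every frame to keep temporal alignment
                y = i * cell_h + torch.randint(0, cell_h - patch + 1, (1,)).item()
                x = j * cell_w + torch.randint(0, cell_w - patch + 1, (1,)).item()
                row.append(video[:, :, y:y + patch, x:x + patch])
            rows.append(torch.cat(row, dim=-1))  # splice mini-patches along width
        return torch.cat(rows, dim=-2)           # splice rows along height

    # A 32-frame 1080p clip becomes a 224x224 fragment (7 * 32 = 224):
    fragment = grid_minipatch_sample(torch.rand(32, 3, 1080, 1920))
    print(fragment.shape)  # torch.Size([32, 3, 224, 224])

Keeping each mini-patch at raw resolution is what separates this scheme from
resizing, which destroys the high-frequency cues that quality prediction
depends on, while the uniform grid preserves the global spatial context.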
Related papers
- EPS: Efficient Patch Sampling for Video Overfitting in Deep Super-Resolution Model Training [15.684865589513597]
We propose an efficient patch sampling method named EPS for video SR network overfitting.
Our method reduces the number of training patches to between 4% and 25% of the original, depending on the resolution and number of clusters.
Compared to the state-of-the-art patch sampling method, EMT, our approach achieves an 83% decrease in overall run time.
arXiv Detail & Related papers (2024-11-25T12:01:57Z) - CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z) - Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z) - Capturing Co-existing Distortions in User-Generated Content for
No-reference Video Quality Assessment [9.883856205077022]
Video Quality Assessment (VQA) aims to predict the perceptual quality of a video.
VQA faces two underestimated challenges that remain unresolved for User Generated Content (UGC) videos.
We propose the Visual Quality Transformer (VQT) to extract quality-related sparse features more efficiently.
arXiv Detail & Related papers (2023-07-31T16:29:29Z) - MRET: Multi-resolution Transformer for Video Quality Assessment [37.355412115794195]
No-reference video quality assessment (NR-VQA) for user generated content (UGC) is crucial for understanding and improving visual experience.
Since a large share of videos nowadays are 720p or above, the fixed and relatively small input used in conventional NR-VQA methods results in missing high-frequency details for many videos.
We propose a novel Transformer-based NR-VQA framework that preserves the high-resolution quality information.
arXiv Detail & Related papers (2023-03-13T21:48:49Z) - Neighbourhood Representative Sampling for Efficient End-to-end Video
Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS), to obtain a novel type of sample, termed fragments.
With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
arXiv Detail & Related papers (2022-10-11T11:38:07Z) - FAVER: Blind Quality Prediction of Variable Frame Rate Videos [47.951054608064126]
Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales.
We propose a first-of-a-kind blind VQA model for evaluating HFR videos, which we dub the Framerate-Aware Video Evaluator w/o Reference (FAVER).
Our experiments on several HFR video quality datasets show that FAVER outperforms other blind VQA algorithms at a reasonable computational cost.
arXiv Detail & Related papers (2022-01-05T07:54:12Z) - UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated
Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled.
Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models.
By employing a feature selection strategy on top of leading VQA model features, we extract 60 of the 763 statistical features used by the leading models to build a new fusion-based model, VIDEVAL.
Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models.
arXiv Detail & Related papers (2020-05-29T00:39:20Z)