Neighbourhood Representative Sampling for Efficient End-to-end Video
Quality Assessment
- URL: http://arxiv.org/abs/2210.05357v1
- Date: Tue, 11 Oct 2022 11:38:07 GMT
- Title: Neighbourhood Representative Sampling for Efficient End-to-end Video
Quality Assessment
- Authors: Haoning Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong
Yan, Jinwei Gu, Weisi Lin
- Abstract summary: The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS), to obtain a novel type of sample, named fragments.
With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
- Score: 60.57703721744873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increased resolution of real-world videos presents a dilemma between
efficiency and accuracy for deep Video Quality Assessment (VQA). On the one
hand, keeping the original resolution will lead to unacceptable computational
costs. On the other hand, existing practices, such as resizing and cropping,
will change the quality of original videos due to the loss of details and
contents, and are therefore harmful to quality assessment. Drawing on insights
from studies of spatial-temporal redundancy in the human visual system and
visual coding theory, we observe that quality information within a local
neighbourhood is typically similar, which motivates us to investigate an
effective quality-sensitive neighbourhood-representatives scheme for VQA. In
this work, we propose a unified scheme, spatial-temporal grid mini-cube
sampling (St-GMS), to obtain a novel type of sample, named fragments.
Full-resolution videos are first divided into mini-cubes with preset
spatial-temporal grids, and then temporally aligned quality representatives
are sampled to compose the fragments
that serve as inputs for VQA. In addition, we design the Fragment Attention
Network (FANet), a network architecture tailored specifically for fragments.
With fragments and FANet, the proposed efficient end-to-end FAST-VQA and
FasterVQA achieve significantly better performance than existing approaches on
all VQA benchmarks while requiring only 1/1612 FLOPs compared to the current
state-of-the-art. Codes, models and demos are available at
https://github.com/timothyhtimothy/FAST-VQA-and-FasterVQA.
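As a companion to the abstract, the following is a minimal Python sketch of how grid mini-cube fragment sampling could look. The grid size, patch size, cube length, and array layout are illustrative assumptions made for exposition, not the exact configuration of the official FAST-VQA/FasterVQA code linked above.

import numpy as np

def sample_fragments(video, grid=7, patch=32, cube_len=8, rng=None):
    """Illustrative St-GMS-style sampler (assumes H, W >= grid * patch).

    video: (T, H, W, C) uint8 array at full resolution.
    Returns fragments of shape (T, grid * patch, grid * patch, C).
    """
    rng = rng or np.random.default_rng()
    T, H, W, C = video.shape
    cell_h, cell_w = H // grid, W // grid
    out = np.empty((T, grid * patch, grid * patch, C), dtype=video.dtype)
    for gy in range(grid):
        for gx in range(grid):
            # One random offset per temporal mini-cube, so sampled patches
            # stay temporally aligned across the frames of each cube.
            for t0 in range(0, T, cube_len):
                t1 = min(t0 + cube_len, T)
                y = gy * cell_h + rng.integers(0, max(cell_h - patch, 1))
                x = gx * cell_w + rng.integers(0, max(cell_w - patch, 1))
                ys, xs = gy * patch, gx * patch
                out[t0:t1, ys:ys + patch, xs:xs + patch] = \
                    video[t0:t1, y:y + patch, x:x + patch]
    return out

With these illustrative defaults, each full-resolution frame is reduced to a 224x224 fragment while every sampled patch keeps its raw-resolution texture, which is the property the abstract argues is essential for quality assessment.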
Related papers
- ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment [35.00766551093652]
We propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model.
ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception.
We will open source the code and trained models to facilitate further research and applications of NR-VQA.
arXiv Detail & Related papers (2024-07-16T08:33:55Z)
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z)
- Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment [9.883856205077022]
Video Quality Assessment (VQA) aims to predict the perceptual quality of a video.
VQA faces two underestimated challenges that remain unresolved for User-Generated Content (UGC) videos.
We propose the Visual Quality Transformer (VQT) to extract quality-related sparse features more efficiently.
arXiv Detail & Related papers (2023-07-31T16:29:29Z)
- Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models [71.06007696593704]
Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in real-world video-enabled media applications.
As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets.
We conduct a first-of-its-kind computational analysis of VQA datasets via minimalistic BVQA models.
arXiv Detail & Related papers (2023-07-26T06:38:33Z)
- MRET: Multi-resolution Transformer for Video Quality Assessment [37.355412115794195]
No-reference video quality assessment (NR-VQA) for user generated content (UGC) is crucial for understanding and improving visual experience.
Since a large share of videos nowadays are 720p or above, the fixed and relatively small inputs used in conventional NR-VQA methods miss high-frequency details for many videos.
We propose a novel Transformer-based NR-VQA framework that preserves the high-resolution quality information.
arXiv Detail & Related papers (2023-03-13T21:48:49Z)
- FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling [54.31355080688127]
Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos.
We propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution.
We build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs.
FAST-VQA improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos.
arXiv Detail & Related papers (2022-07-06T11:11:43Z)
- STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction [78.129039340528]
We propose a SpatioTemporal Information-Preserving and Perception-Augmented Model (STIP) to solve the above two problems.
The proposed model aims to preserve the spatiotemporal information of videos during feature extraction and state transitions.
Experimental results show that the proposed STIP can predict videos with more satisfactory visual quality compared with a variety of state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T09:49:04Z)
- A Deep Learning based No-reference Quality Assessment Model for UGC Videos [44.00578772367465]
Previous video quality assessment (VQA) studies either use image recognition models or image quality assessment (IQA) models to extract frame-level features of videos for quality regression.
We propose a very simple but effective VQA model, which trains an end-to-end spatial feature extraction network to learn the quality-aware spatial feature representation from raw pixels of the video frames.
With the better quality-aware features, we only use a simple multilayer perceptron (MLP) network to regress them into chunk-level quality scores, and then a temporal average pooling strategy is adopted to obtain the video-level quality score (see the sketch below).
arXiv Detail & Related papers (2022-04-29T12:45:21Z)
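The chunk-to-video pooling described in the entry above can be illustrated with a short, hedged PyTorch sketch; the feature dimension, hidden width, and module name are assumptions for illustration, not that paper's actual implementation.

import torch
import torch.nn as nn

class ChunkQualityRegressor(nn.Module):
    """Toy regressor: per-chunk MLP scores followed by temporal average pooling."""

    def __init__(self, feat_dim=2048, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, chunk_feats):
        # chunk_feats: (batch, num_chunks, feat_dim) quality-aware features
        chunk_scores = self.mlp(chunk_feats).squeeze(-1)  # (batch, num_chunks)
        return chunk_scores.mean(dim=-1)                  # video-level score

# Example: 2 videos, 8 chunks each, 2048-d features per chunk -> shape (2,)
video_scores = ChunkQualityRegressor()(torch.randn(2, 8, 2048))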