Capturing Co-existing Distortions in User-Generated Content for
No-reference Video Quality Assessment
- URL: http://arxiv.org/abs/2307.16813v1
- Date: Mon, 31 Jul 2023 16:29:29 GMT
- Title: Capturing Co-existing Distortions in User-Generated Content for
No-reference Video Quality Assessment
- Authors: Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen
- Abstract summary: Video Quality Assessment (VQA) aims to predict the perceptual quality of a video.
VQA faces two underestimated challenges that remain unresolved in User Generated Content (UGC) videos.
We propose Visual Quality Transformer (VQT) to extract quality-related sparse features more efficiently.
- Score: 9.883856205077022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Quality Assessment (VQA), which aims to predict the perceptual quality
of a video, has attracted increasing attention with the rapid development of
streaming media platforms such as Facebook, TikTok, and Kwai. Compared
with other sequence-based visual tasks (\textit{e.g.,} action recognition), VQA
faces two underestimated challenges that remain unresolved in User Generated Content
(UGC) videos. \textit{First}, it is not rare that several frames containing serious
distortions (\textit{e.g.,} blocking, blurriness) can determine the perceptual
quality of the whole video, while other sequence-based tasks require more
frames of equal importance for representations. \textit{Second}, the perceptual
quality of a video exhibits a multi-distortion distribution, due to the
differences in the duration and probability of occurrence for various
distortions. In order to solve the above challenges, we propose \textit{Visual
Quality Transformer (VQT)} to extract quality-related sparse features more
efficiently. Methodologically, a Sparse Temporal Attention (STA) is proposed to
sample keyframes by analyzing the temporal correlation between frames, which
reduces the computational complexity from $O(T^2)$ to $O(T \log T)$.
Structurally, a Multi-Pathway Temporal Network (MPTN) utilizes multiple STA
modules with different degrees of sparsity in parallel, capturing co-existing
distortions in a video. Experimentally, VQT demonstrates superior performance to many
\textit{state-of-the-art} methods on three public no-reference VQA datasets.
Furthermore, VQT outperforms widely-adopted industrial algorithms (\textit{i.e.,} VMAF
and AVQT) on four full-reference VQA datasets.
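Below is a minimal sketch of the two ideas named in the abstract: selecting a sparse set of keyframes by their temporal correlation, and running several such selections with different degrees of sparsity in parallel. All function names, the redundancy score, and the pooling choice are illustrative assumptions, not the authors' VQT implementation; in particular, the naive all-pairs similarity here is $O(T^2)$ and does not reproduce the $O(T \log T)$ sampling claimed in the paper.

```python
# Sketch only: not the authors' VQT code. Names and scoring are assumptions.
import torch
import torch.nn.functional as F


def select_keyframes(feats: torch.Tensor, k: int) -> torch.Tensor:
    """feats: (T, D) per-frame features. Keep the k least-redundant frames,
    scored by mean cosine similarity to all other frames (naive O(T^2))."""
    sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)  # (T, T)
    redundancy = sim.mean(dim=1)                         # high = similar to many frames
    idx = torch.topk(-redundancy, k=min(k, feats.shape[0])).indices
    return feats[idx.sort().values]                      # preserve temporal order


def multi_pathway(feats: torch.Tensor, ratios=(0.125, 0.25, 0.5)) -> torch.Tensor:
    """Run keyframe selection at several sparsity levels in parallel and
    average-pool each pathway, so pathways that keep few or many frames can
    both contribute to the video-level representation."""
    T, _ = feats.shape
    pooled = [select_keyframes(feats, max(1, int(T * r))).mean(dim=0) for r in ratios]
    return torch.cat(pooled, dim=-1)                     # (len(ratios) * D,)


if __name__ == "__main__":
    frame_features = torch.randn(64, 256)                # e.g. 64 frames, 256-d features
    print(multi_pathway(frame_features).shape)           # torch.Size([768])
```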
Related papers
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment [14.728530703277283]
Video quality assessment (VQA) aims to simulate the human perception of video quality.
We decompose a video into three levels: patch level, frame level, and clip level.
We propose the Zoom-VQA architecture to perceive features at different levels.
arXiv Detail & Related papers (2023-04-13T12:18:15Z)
- Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS), to obtain a novel type of sample named fragments.
With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
arXiv Detail & Related papers (2022-10-11T11:38:07Z)
- Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment [55.65173181828863]
We propose a temporal perceptual quality index (TPQI) to measure the temporal distortion by describing the graphic morphology of the representation.
Experiments show that TPQI is an effective way of predicting subjective temporal quality.
arXiv Detail & Related papers (2022-07-08T07:30:51Z)
- FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling [54.31355080688127]
Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos.
We propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution (see the sketch after this list).
We build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs.
FAST-VQA improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos.
arXiv Detail & Related papers (2022-07-06T11:11:43Z)
- DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment [56.42140467085586]
Some temporal variations cause temporal distortions and lead to extra quality degradation.
The human visual system often pays different attention to frames with different contents.
We propose a novel and effective transformer-based VQA method to tackle these two issues.
arXiv Detail & Related papers (2022-06-20T15:31:27Z)
- Structured Two-stream Attention Network for Video Question Answering [168.95603875458113]
We propose a Structured Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question.
First, we infer rich long-range temporal structures in videos using our structured segment component and encode text features.
Then, our structured two-stream attention component simultaneously localizes important visual instances, reduces the influence of background video, and focuses on the relevant text.
arXiv Detail & Related papers (2022-06-02T12:25:52Z)
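For the Grid Mini-patch Sampling idea mentioned in the FAST-VQA entry above, the following is a rough sketch under assumed grid and patch sizes, not the published implementation: each frame is divided into a uniform grid, a small raw-resolution patch is cropped from every cell, and the crops are stitched into a compact fragment.

```python
# Rough illustration only, not the FAST-VQA code; the grid/patch sizes and
# the random in-cell placement are assumptions.
import torch


def sample_fragment(frame: torch.Tensor, grid: int = 7, patch: int = 32) -> torch.Tensor:
    """frame: (C, H, W). Returns a (C, grid*patch, grid*patch) fragment built
    from one raw-resolution patch per grid cell."""
    C, H, W = frame.shape
    assert H >= grid * patch and W >= grid * patch, "frame too small for this grid"
    cell_h, cell_w = H // grid, W // grid
    rows = []
    for gy in range(grid):
        cols = []
        for gx in range(grid):
            # random offset inside the cell; the patch itself is not resized
            y0 = gy * cell_h + int(torch.randint(0, max(1, cell_h - patch + 1), (1,)))
            x0 = gx * cell_w + int(torch.randint(0, max(1, cell_w - patch + 1), (1,)))
            cols.append(frame[:, y0:y0 + patch, x0:x0 + patch])
        rows.append(torch.cat(cols, dim=2))
    return torch.cat(rows, dim=1)


if __name__ == "__main__":
    frame_1080p = torch.rand(3, 1080, 1920)
    print(sample_fragment(frame_1080p).shape)   # torch.Size([3, 224, 224])
```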
This list is automatically generated from the titles and abstracts of the papers on this site.