Zoom-VQA: Patches, Frames and Clips Integration for Video Quality
Assessment
- URL: http://arxiv.org/abs/2304.06440v1
- Date: Thu, 13 Apr 2023 12:18:15 GMT
- Title: Zoom-VQA: Patches, Frames and Clips Integration for Video Quality
Assessment
- Authors: Kai Zhao, Kun Yuan, Ming Sun and Xing Wen
- Abstract summary: Video assessment (VQA) aims to simulate the human perception of video quality.
We decompose video into three levels: patch level, frame level, and clip level.
We propose Zoom-VQA architecture to perceive features at different levels.
- Score: 14.728530703277283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video quality assessment (VQA) aims to simulate the human perception of video
quality, which is influenced by factors ranging from low-level color and
texture details to high-level semantic content. To effectively model these
complicated quality-related factors, in this paper, we decompose video into
three levels (\ie, patch level, frame level, and clip level), and propose a
novel Zoom-VQA architecture to perceive spatio-temporal features at different
levels. It integrates three components: patch attention module, frame pyramid
alignment, and clip ensemble strategy, respectively for capturing
region-of-interest in the spatial dimension, multi-level information at
different feature levels, and distortions distributed over the temporal
dimension. Owing to the comprehensive design, Zoom-VQA obtains state-of-the-art
results on four VQA benchmarks and achieves 2nd place in the NTIRE 2023 VQA
challenge. Notably, Zoom-VQA has outperformed the previous best results on two
subsets of LSVQ, achieving 0.8860 (+1.0%) and 0.7985 (+1.9%) of SRCC on the
respective subsets. Adequate ablation studies further verify the effectiveness
of each component. Codes and models are released in
https://github.com/k-zha14/Zoom-VQA.
Related papers
- VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
Video Quality Assessment (VQA) is a classic field in low-level visual perception.
Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can enhance markedly low-level visual quality evaluation.
We introduce the VQA2 Instruction dataset - the first visual question answering instruction dataset that focuses on video quality assessment.
The VQA2 series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
arXiv Detail & Related papers (2024-11-06T09:39:52Z) - CLIPVQA:Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem ( CLIPVQA)
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z) - Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z) - Capturing Co-existing Distortions in User-Generated Content for
No-reference Video Quality Assessment [9.883856205077022]
Video Quality Assessment (VQA) aims to predict the perceptual quality of a video.
VQA faces two under-estimated challenges unresolved in User Generated Content (UGC) videos.
We propose textitVisual Quality Transformer (VQT) to extract quality-related sparse features more efficiently.
arXiv Detail & Related papers (2023-07-31T16:29:29Z) - Towards Explainable In-the-Wild Video Quality Assessment: A Database and
a Language-Prompted Approach [52.07084862209754]
We collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors.
Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension.
These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings.
arXiv Detail & Related papers (2023-05-22T05:20:23Z) - Structured Two-stream Attention Network for Video Question Answering [168.95603875458113]
We propose a Structured Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question.
First, we infer rich long-range temporal structures in videos using our structured segment component and encode text features.
Then, our structured two-stream attention component simultaneously localizes important visual instance, reduces the influence of background video and focuses on the relevant text.
arXiv Detail & Related papers (2022-06-02T12:25:52Z) - Blind VQA on 360{\deg} Video via Progressively Learning from Pixels,
Frames and Video [66.57045901742922]
Blind visual quality assessment (BVQA) on 360textdegree video plays a key role in optimizing immersive multimedia systems.
In this paper, we take into account the progressive paradigm of human perception towards spherical video quality.
We propose a novel BVQA approach (namely ProVQA) for 360textdegree video via progressively learning from pixels, frames and video.
arXiv Detail & Related papers (2021-11-18T03:45:13Z) - Deep Learning based Full-reference and No-reference Quality Assessment
Models for Compressed UGC Videos [34.761412637585266]
The framework consists of three modules, the feature extraction module, the quality regression module, and the quality pooling module.
For the feature extraction module, we fuse the features from intermediate layers of the convolutional neural network (CNN) network into final quality-aware representation.
For the quality regression module, we use the fully connected (FC) layer to regress the quality-aware features into frame-level scores.
arXiv Detail & Related papers (2021-06-02T12:23:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.