StarVQA+: Co-training Space-Time Attention for Video Quality Assessment
- URL: http://arxiv.org/abs/2306.12298v1
- Date: Wed, 21 Jun 2023 14:27:31 GMT
- Title: StarVQA+: Co-training Space-Time Attention for Video Quality Assessment
- Authors: Fengchuang Xing, Yuan-Gen Wang, Weixuan Tang, Guopu Zhu, Sam Kwong
- Abstract summary: Self-attention-based Transformers have achieved great success in many computer vision tasks.
However, their application to video quality assessment (VQA) has not been satisfactory so far.
This paper presents a co-trained Space-Time Attention network for the VQA problem, termed StarVQA+.
- Score: 56.548364244708715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-attention-based Transformers have achieved great success in many computer
vision tasks. However, their application to video quality assessment (VQA) has
not been satisfactory so far. Evaluating the quality of in-the-wild videos is
challenging because the pristine reference and the shooting distortions are unknown.
This paper presents a co-trained Space-Time Attention network for the VQA
problem, termed StarVQA+. Specifically, we first build StarVQA+ by alternately
concatenating the divided space-time attention. Then, to facilitate the
training of StarVQA+, we design a vectorized regression loss by encoding the
mean opinion score (MOS) to the probability vector and embedding a special
token as the learnable variable of MOS, which better fits the human
rating process. Finally, to address the data-hungry nature of Transformers, we
propose to co-train the spatial and temporal attention weights using both
images and videos. Various experiments are conducted on the de facto
in-the-wild video datasets, including LIVE-Qualcomm, LIVE-VQC, KoNViD-1k,
YouTube-UGC, LSVQ, LSVQ-1080p, and DVL2021. Experimental results demonstrate
the superiority of the proposed StarVQA+ over the state-of-the-art.
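The abstract states that the MOS is encoded as a probability vector for a vectorized regression loss, but it does not give the encoding. The following is a minimal sketch of one plausible way to do this, assuming a softmax over negative squared distances to a set of anchor scores, a cross-entropy loss against that target, and an expectation-based decoding. The anchor values and the function names `encode_mos`, `vector_loss`, and `decode_mos` are hypothetical and are not taken from the paper; the learnable MOS token mentioned in the abstract is not modeled here.

```python
import math

def encode_mos(mos, anchors):
    """Map a scalar MOS to a probability vector over anchor scores.

    Assumption (not from the paper): softmax over negative squared
    distances between the MOS and each anchor.
    """
    logits = [-(mos - a) ** 2 for a in anchors]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def vector_loss(pred_logits, target_probs):
    """Cross-entropy between softmax(pred_logits) and the encoded target."""
    peak = max(pred_logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in pred_logits))
    return -sum(t * (z - log_norm) for z, t in zip(pred_logits, target_probs))

def decode_mos(probs, anchors):
    """Approximate a scalar score as the expectation over the anchors."""
    return sum(p * a for p, a in zip(probs, anchors))

# Hypothetical five-point anchor scale.
ANCHORS = [1.0, 2.0, 3.0, 4.0, 5.0]
target = encode_mos(3.0, ANCHORS)
matching_logits = [-(3.0 - a) ** 2 for a in ANCHORS]
uniform_logits = [0.0] * len(ANCHORS)
```

Under these assumptions, a prediction whose logits match the encoded target yields a lower loss than a uniform prediction, and the expectation-based decoding is only an approximate inverse of the encoding in general.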
Related papers
- Towards Explainable In-the-Wild Video Quality Assessment: A Database and
a Language-Prompted Approach [52.07084862209754]
We collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors.
Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension.
These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings.
arXiv Detail & Related papers (2023-05-22T05:20:23Z)
- MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos [39.06800945430703]
We build a first-of-a-kind subjective Live VQA database and develop an effective evaluation tool.
MD-VQA achieves state-of-the-art performance on both our Live VQA database and existing compressed VQA databases.
arXiv Detail & Related papers (2023-03-27T06:17:10Z)
- Contrastive Video Question Answering via Video Graph Transformer [184.3679515511028]
We propose a Video Graph Transformer model (CoVGT) to perform question answering (VideoQA) in a Contrastive manner.
CoVGT's uniqueness and superiority are three-fold.
We show that CoVGT achieves much better performance than prior art on video reasoning tasks.
arXiv Detail & Related papers (2023-02-27T11:09:13Z)
- Disentangling Aesthetic and Technical Effects for Video Quality
Assessment of User Generated Content [54.31355080688127]
The mechanisms of human quality perception in the YouTube-VQA problem are yet to be explored.
We propose a scheme where two separate evaluators are trained with views specifically designed for each issue.
Our blind subjective studies prove that the separate evaluators in DOVER can effectively match human perception on respective disentangled quality issues.
arXiv Detail & Related papers (2022-11-09T13:55:50Z)
- FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment
Sampling [54.31355080688127]
Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos.
We propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution.
We build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs.
FAST-VQA improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos.
arXiv Detail & Related papers (2022-07-06T11:11:43Z)
- StarVQA: Space-Time Attention for Video Quality Assessment [28.3487798060932]
Evaluating the quality of in-the-wild videos is challenging because the pristine reference and the shooting distortions are unknown.
This paper presents a novel space-time attention network for the VQA problem, named StarVQA.
arXiv Detail & Related papers (2021-08-22T04:53:02Z)
- End-to-End Video Question-Answer Generation with Generator-Pretester
Network [27.31969951281815]
We study a novel task, Video Question-Answer Generation (VQAG), for the challenging Video Question Answering (Video QA) task in multimedia.
As captions neither fully represent a video nor are always practically available, it is crucial to generate question-answer pairs based on a video via Video Question-Answer Generation (VQAG).
We evaluate our system on the only two available large-scale human-annotated Video QA datasets and achieve state-of-the-art question generation performance.
arXiv Detail & Related papers (2021-01-05T10:46:06Z)
- UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated
Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled.
Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models.
By employing a feature selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models.
Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models.
arXiv Detail & Related papers (2020-05-29T00:39:20Z)