Related papers: StarVQA+: Co-training Space-Time Attention for Video Quality Assessment

StarVQA+: Co-training Space-Time Attention for Video Quality Assessment

URL: http://arxiv.org/abs/2306.12298v1
Date: Wed, 21 Jun 2023 14:27:31 GMT
Title: StarVQA+: Co-training Space-Time Attention for Video Quality Assessment
Authors: Fengchuang Xing, Yuan-Gen Wang, Weixuan Tang, Guopu Zhu, Sam Kwong
Abstract summary: Self-attention based Transformer has achieved great success in many computer vision tasks. However, its application to video quality assessment (VQA) has not been satisfactory so far. This paper presents a co-trained Space-Time Attention network for the VQA problem, termed StarVQA+.
Score: 56.548364244708715
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-attention based Transformer has achieved great success in many computer vision tasks. However, its application to video quality assessment (VQA) has not been satisfactory so far. Evaluating the quality of in-the-wild videos is challenging due to the unknown of pristine reference and shooting distortion. This paper presents a co-trained Space-Time Attention network for the VQA problem, termed StarVQA+. Specifically, we first build StarVQA+ by alternately concatenating the divided space-time attention. Then, to facilitate the training of StarVQA+, we design a vectorized regression loss by encoding the mean opinion score (MOS) to the probability vector and embedding a special token as the learnable variable of MOS, leading to better fitting of human's rating process. Finally, to solve the data hungry problem with Transformer, we propose to co-train the spatial and temporal attention weights using both images and videos. Various experiments are conducted on the de-facto in-the-wild video datasets, including LIVE-Qualcomm, LIVE-VQC, KoNViD-1k, YouTube-UGC, LSVQ, LSVQ-1080p, and DVL2021. Experimental results demonstrate the superiority of the proposed StarVQA+ over the state-of-the-art.

Related papers

ESVQA: Perceptual Quality Assessment of Egocentric Spatial Videos [71.62145804686062]
We introduce the first Egocentric Spatial Video Quality Assessment Database (ESVQAD), which comprises 600 egocentric spatial videos and their mean opinion scores (MOSs) We propose a novel multi-dimensional binocular feature fusion model, termed ESVQAnet, which integrates binocular spatial, motion, and semantic features to predict the perceptual quality. Experimental results demonstrate the ESVQAnet outperforms 16 state-of-the-art VQA models on the embodied perceptual quality assessment task.
arXiv Detail & Related papers (2024-12-29T10:13:30Z)
VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
Video Quality Assessment (VQA) is a classic field in low-level visual perception. Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can enhance markedly low-level visual quality evaluation. We introduce the VQA2 Instruction dataset - the first visual question answering instruction dataset that focuses on video quality assessment. The VQA2 series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
arXiv Detail & Related papers (2024-11-06T09:39:52Z)
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models [53.64461404882853]
Video quality assessment (VQA) algorithms are needed to monitor and optimize the quality of streaming videos. Here, we propose the first Large Multi-Modal Video Quality Assessment (LMM-VQA) model, which introduces a novel visual modeling strategy for quality-aware feature extraction.
arXiv Detail & Related papers (2024-08-26T04:29:52Z)
Contrastive Video Question Answering via Video Graph Transformer [184.3679515511028]
We propose a Video Graph Transformer model (CoVGT) to perform question answering (VideoQA) in a Contrastive manner. CoVGT's uniqueness and superiority are three-fold. We show that CoVGT can achieve much better performances than previous arts on video reasoning tasks.
arXiv Detail & Related papers (2023-02-27T11:09:13Z)
Disentangling Aesthetic and Technical Effects for Video Quality Assessment of User Generated Content [54.31355080688127]
The mechanisms of human quality perception in the YouTube-VQA problem is still yet to be explored. We propose a scheme where two separate evaluators are trained with views specifically designed for each issue. Our blind subjective studies prove that the separate evaluators in DOVER can effectively match human perception on respective disentangled quality issues.
arXiv Detail & Related papers (2022-11-09T13:55:50Z)
StarVQA: Space-Time Attention for Video Quality Assessment [28.3487798060932]
evaluating the quality of in-the-wild videos is challenging due to the unknown of pristine reference and shooting distortion. This paper presents a novel. underlinespace-underlinetime underlineattention network founderliner underlineVQA problem, named StarVQA.
arXiv Detail & Related papers (2021-08-22T04:53:02Z)
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled. Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models. By employing a feature selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models. Our experimental results show that VIDEVAL achieves state-of-theart performance at considerably lower computational cost than other leading models.
arXiv Detail & Related papers (2020-05-29T00:39:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.