Audio-Visual Quality Assessment for User Generated Content: Database and Method
- URL: http://arxiv.org/abs/2303.02392v2
- Date: Wed, 27 Dec 2023 06:54:22 GMT
- Title: Audio-Visual Quality Assessment for User Generated Content: Database and Method
- Authors: Yuqin Cao, Xiongkuo Min, Wei Sun, Xiaoping Zhang, Guangtao Zhai
- Abstract summary: Most existing VQA studies only focus on the visual distortions of videos, ignoring that the user's QoE also depends on the accompanying audio signals.
We construct the first AVQA database named the SJTU-UAV database, which includes 520 in-the-wild audio and video (A/V) sequences.
We also design a family of AVQA models, which fuse popular VQA methods with audio features via a support vector regressor (SVR).
The experimental results show that with the help of audio signals, the VQA models can evaluate the quality more accurately.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the explosive increase of User Generated Content (UGC), UGC video
quality assessment (VQA) is becoming increasingly important for improving users'
Quality of Experience (QoE). However, most existing UGC VQA studies only focus
on the visual distortions of videos, ignoring that the user's QoE also depends
on the accompanying audio signals. In this paper, we conduct the first study to
address the problem of UGC audio and video quality assessment (AVQA).
Specifically, we construct the first UGC AVQA database named the SJTU-UAV
database, which includes 520 in-the-wild UGC audio and video (A/V) sequences,
and conduct a user study to obtain the mean opinion scores of the A/V
sequences. The content of the SJTU-UAV database is then analyzed from both the
audio and video aspects to show the database characteristics. We also design a
family of AVQA models, which fuse popular VQA methods with audio features
via a support vector regressor (SVR). We validate the effectiveness of the
proposed models on three databases. The experimental results show that, with
the help of audio signals, the VQA models can evaluate the perceptual quality
more accurately. The database will be released to facilitate further research.
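To make the abstract's fusion recipe concrete, below is a minimal sketch of concatenating per-clip video-quality features with audio features and regressing to mean opinion scores (MOS) with an SVR. The feature dimensions, hyperparameters, and random placeholder data are illustrative assumptions, not the authors' actual pipeline.

```python
# Hedged sketch of SVR-based audio-visual fusion: concatenate per-clip
# VQA features with audio features and regress to MOS. All shapes,
# hyperparameters, and data below are illustrative placeholders.
import numpy as np
from scipy.stats import spearmanr
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_clips = 520                                  # size of the SJTU-UAV database
video_feats = rng.normal(size=(n_clips, 36))   # placeholder per-clip VQA features
audio_feats = rng.normal(size=(n_clips, 13))   # placeholder audio features (e.g. mean MFCCs)
mos = rng.uniform(1.0, 5.0, size=n_clips)      # placeholder mean opinion scores

# Audio-visual fusion by simple feature concatenation.
av_feats = np.concatenate([video_feats, audio_feats], axis=1)

# Scale the fused features and regress to MOS with a support vector regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(av_feats, mos)

# Rank correlation between predictions and MOS (computed on the training
# data here only to show the evaluation step; a real study would cross-validate).
srcc, _ = spearmanr(model.predict(av_feats), mos)
print(f"SRCC: {srcc:.3f}")
```

In practice the video features would come from an existing VQA model and the audio features from standard descriptors, with performance reported as cross-validated SRCC/PLCC.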
Related papers
- VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
Video Quality Assessment (VQA) is a classic field in low-level visual perception.
Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can markedly enhance low-level visual quality evaluation.
We introduce the VQA$^2$ Instruction dataset - the first visual question answering instruction dataset that focuses on video quality assessment.
The VQA$^2$ series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
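As a toy illustration of the token interleaving mentioned above, the sketch below alternates visual and motion tokens along the sequence axis; the shapes and the strictly alternating layout are assumptions, since the summary does not specify the exact scheme.

```python
# Toy sketch of interleaving visual and motion tokens along the sequence
# axis; shapes and the alternating layout are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_frames, d_model = 8, 512
visual_tokens = rng.normal(size=(n_frames, d_model))  # one visual token per frame
motion_tokens = rng.normal(size=(n_frames, d_model))  # one motion token per frame

# Alternate the two streams: [v0, m0, v1, m1, ...]
interleaved = np.empty((2 * n_frames, d_model))
interleaved[0::2] = visual_tokens
interleaved[1::2] = motion_tokens
```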
arXiv Detail & Related papers (2024-11-06T09:39:52Z)
- Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model [54.69882562863726]
We try to systematically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives.
We evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment.
We propose a Unified Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos.
arXiv Detail & Related papers (2024-07-31T07:54:26Z)
- Perceptual Quality Assessment of Omnidirectional Audio-visual Signals [37.73157112698111]
Most existing quality assessment studies for omnidirectional videos (ODVs) only focus on the visual distortions of videos.
In this paper, we first establish a large-scale audio-visual quality assessment dataset for ODVs.
Then, we design three baseline methods for full-reference omnidirectional audio-visual quality assessment (OAVQA).
arXiv Detail & Related papers (2023-07-20T12:21:26Z)
- StarVQA+: Co-training Space-Time Attention for Video Quality Assessment [56.548364244708715]
Self-attention-based Transformers have achieved great success in many computer vision tasks.
However, its application to video quality assessment (VQA) has not been satisfactory so far.
This paper presents a co-trained Space-Time Attention network for the VQA problem, termed StarVQA+.
arXiv Detail & Related papers (2023-06-21T14:27:31Z)
- MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos [39.06800945430703]
We build a first-of-a-kind subjective Live VQA database and develop an effective evaluation tool.
MD-VQA achieves state-of-the-art performance on both our Live VQA database and existing compressed VQA databases.
arXiv Detail & Related papers (2023-03-27T06:17:10Z)
- Learning to Answer Questions in Dynamic Audio-Visual Scenarios [81.19017026999218]
We focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Our dataset contains more than 45K question-answer pairs spanning different modalities and question types.
Our results demonstrate that AVQA benefits from multisensory perception and our model outperforms recent A-SIC, V-SIC, and AVQA approaches.
arXiv Detail & Related papers (2022-03-26T13:03:42Z)
- UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled.
Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models.
By employing a feature selection strategy on top of leading VQA model features, we are able to select 60 of the 763 statistical features used by the leading models and build a fusion-based model from them, dubbed VIDEVAL.
Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models.
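As a hedged sketch of the feature-selection step just described (keeping 60 of 763 statistical features before training the final regressor), one generic implementation ranks features by importance; the selector, regressor, and random data here are illustrative assumptions, not the paper's actual procedure.

```python
# Illustrative importance-based feature selection: keep 60 of 763
# statistical features, then train a quality regressor on the subset.
# Selector, regressor, and data are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 763))      # placeholder pool of statistical features
y = rng.uniform(1.0, 5.0, size=400)  # placeholder MOS labels

# Rank features by importance and keep exactly the top 60.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=200, random_state=0),
    max_features=60,
    threshold=-np.inf,               # select purely by max_features
).fit(X, y)
X_selected = selector.transform(X)   # shape: (400, 60)

# Train the final quality regressor on the reduced feature set.
reg = SVR(kernel="rbf").fit(X_selected, y)
```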
arXiv Detail & Related papers (2020-05-29T00:39:20Z)