Related papers: EVQAScore: Efficient Video Question Answering Data Evaluation

EVQAScore: Efficient Video Question Answering Data Evaluation

URL: http://arxiv.org/abs/2411.06908v1
Date: Mon, 11 Nov 2024 12:11:36 GMT
Title: EVQAScore: Efficient Video Question Answering Data Evaluation
Authors: Hao Liang, Zirong Chen, Wentao Zhang,
Abstract summary: We introduce EVQAScore, a reference-free method that leverages keyword extraction to assess both video caption and video QA data quality. Our approach achieves state-of-the-art (SOTA) performance (32.8 for Kendall correlation and 42.3 for Spearman correlation, 4.7 and 5.9 higher than the previous method PAC-S++, for video caption evaluation) By using EVQAScore for data selection, we achieved SOTA results with only 12.5% of the original data volume, outperforming the previous SOTA method PAC-S and 100% of data.
Score: 23.812020049901452
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Video question-answering (QA) is a core task in video understanding. Evaluating the quality of video QA and video caption data quality for training video large language models (VideoLLMs) is an essential challenge. Although various methods have been proposed for assessing video caption quality, there remains a lack of dedicated evaluation methods for Video QA. To address this gap, we introduce EVQAScore, a reference-free method that leverages keyword extraction to assess both video caption and video QA data quality. Additionally, we incorporate frame sampling and rescaling techniques to enhance the efficiency and robustness of our evaluation, this enables our score to evaluate the quality of extremely long videos. Our approach achieves state-of-the-art (SOTA) performance (32.8 for Kendall correlation and 42.3 for Spearman correlation, 4.7 and 5.9 higher than the previous method PAC-S++) on the VATEX-EVAL benchmark for video caption evaluation. Furthermore, by using EVQAScore for data selection, we achieved SOTA results with only 12.5\% of the original data volume, outperforming the previous SOTA method PAC-S and 100\% of data.

Related papers

LEHA-CVQAD: Dataset To Enable Generalized Video Quality Assessment of Compression Artifacts [44.95552843771737]
We propose the LEHA-QAD dataset, which comprises 6,240 clips for compression-oriented video quality assessment.<n>59 source videos are encoded with 186-preset variants, 1.8M pairwise, and 1.5k ratings are fused into a single quality scale.<n>We also propose Rate-Distortion Alignment Error (RDAE), a novel evaluation metric that quantifies how well VQA models preserve-quality ordering.
arXiv Detail & Related papers (2025-07-05T10:41:33Z)
VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
Video Quality Assessment (VQA) is a classic field in low-level visual perception. Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can enhance markedly low-level visual quality evaluation. We introduce the VQA2 Instruction dataset - the first visual question answering instruction dataset that focuses on video quality assessment. The VQA2 series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
arXiv Detail & Related papers (2024-11-06T09:39:52Z)
AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results [120.95863275142727]
This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos encoded with 14 codecs of various compression standards.
arXiv Detail & Related papers (2024-08-21T20:32:45Z)
Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model [54.69882562863726]
We try to systemically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. We evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment. We propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos.
arXiv Detail & Related papers (2024-07-31T07:54:26Z)
Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy [23.61467796740852]
No-reference (NR) VQA methods play a vital role in situations where obtaining reference videos is restricted or not feasible. As more streaming videos are being created in ultra-high definition (e.g., 4K) to enrich viewers' experiences, the current deep VQA methods face unacceptable computational costs. In this paper, we propose a highly efficient and novel NR 4K VQA technology.
arXiv Detail & Related papers (2024-07-30T12:10:33Z)
CLIPVQA:Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem ( CLIPVQA) The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach [52.07084862209754]
We collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors. Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension. These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings.
arXiv Detail & Related papers (2023-05-22T05:20:23Z)
SB-VQA: A Stack-Based Video Quality Assessment Framework for Video Enhancement [0.40777876591043155]
We propose a stack-based framework for video quality assessment (VQA) that outperforms existing state-of-the-art methods on enhanced videos. In addition to proposing the VQA framework for enhanced videos, we also investigate its application on professionally generated content (PGC) Our experiments demonstrate that existing VQA algorithms can be applied to PGC videos, and we find that VQA performance for PGC videos can be improved by considering the plot of a play.
arXiv Detail & Related papers (2023-05-15T07:44:10Z)
Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment [54.31355080688127]
We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP) BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets. We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
arXiv Detail & Related papers (2023-04-28T08:06:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.