Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap
- URL: http://arxiv.org/abs/2404.13573v2
- Date: Sat, 27 Apr 2024 15:10:55 GMT
- Authors: Bowen Qu, Xiaoyu Liang, Shangkun Sun, Wei Gao
- Abstract summary: We categorize the assessment of AIGC video quality into three dimensions: visual harmony, video-text consistency, and domain distribution gap.
For each dimension, we design specific modules to provide a comprehensive quality assessment of AIGC videos.
Our research identifies significant variations in visual quality, fluidity, and style among videos generated by different text-to-video models.
- Score: 4.922783970210658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Text-to-Video Artificial Intelligence Generated Content (AIGC) have been remarkable. Compared with traditional videos, the assessment of AIGC videos faces various challenges, such as visual inconsistencies that defy common sense, discrepancies between content and the textual prompt, and distribution gaps between different generative models. To address these challenges, in this work, we categorize the assessment of AIGC video quality into three dimensions: visual harmony, video-text consistency, and domain distribution gap. For each dimension, we design specific modules to provide a comprehensive quality assessment of AIGC videos. Furthermore, our research identifies significant variations in visual quality, fluidity, and style among videos generated by different text-to-video models. Predicting the source generative model can make the AIGC video features more discriminative, which enhances the quality assessment performance. The proposed method was used by the third-place winner of the NTIRE 2024 Quality Assessment for AI-Generated Content - Track 2 Video, demonstrating its effectiveness. Code will be available at https://github.com/Coobiw/TriVQA.
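The abstract's key training idea is to predict which text-to-video model produced a clip so that the quality features become more discriminative. This amounts to a multi-task objective. The code below is a minimal sketch under stated assumptions, not the TriVQA implementation. It assumes the three per-dimension feature vectors are already extracted and that fusion is plain concatenation. It also assumes the quality term is squared error against the mean opinion score (MOS), and `aux_weight` is a hypothetical weight on the source-model cross-entropy term.

```python
import math
from typing import List, Sequence


def concat_features(harmony: Sequence[float],
                    consistency: Sequence[float],
                    domain: Sequence[float]) -> List[float]:
    """Fuse visual-harmony, video-text-consistency and domain-gap features.

    Assumption: plain concatenation; the actual fusion may differ.
    """
    return [*harmony, *consistency, *domain]


def linear(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(pred_score: float, mos: float,
                   source_logits: Sequence[float], source_id: int,
                   aux_weight: float = 0.5) -> float:
    """Quality regression plus auxiliary source-model classification.

    `source_id` indexes the text-to-video model that generated the clip.
    Assumptions: squared error for quality, hypothetical aux_weight = 0.5.
    """
    regression = (pred_score - mos) ** 2
    return regression + aux_weight * cross_entropy(source_logits, source_id)


# Toy usage: a 5-dim fused feature, one quality head, and a 2-way
# source-model head whose all-zero weights give uniform logits.
features = concat_features([0.2, 0.4], [0.9], [0.1, 0.3])
score = linear(features, [[1.0, 0.0, 1.0, 0.0, 0.0]], [0.0])[0]
source_logits = linear(features, [[0.0] * 5, [0.0] * 5], [0.0, 0.0])
loss = multitask_loss(score, mos=1.0, source_logits=source_logits, source_id=1)
print(f"score={score:.2f} loss={loss:.4f}")
```

Setting `aux_weight` to 0 recovers plain quality regression. A larger value pushes the shared features to separate the generators, which is the effect the abstract credits with improving quality assessment.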
Related papers
- Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos
Video information retrieval remains a fundamental approach for accessing video content.
We build on the observation that retrieval models often favor AI-generated content in ad-hoc and image retrieval tasks.
We investigate whether similar biases emerge in the context of challenging video retrieval.
Date: 2025-02-11
- Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
In industrial systems, low-quality video characteristics fall into four categories.
These low-quality videos have been largely overlooked in academic research.
We introduce the Multi-Branch Collaborative Network (MBCN) tailored for industrial video retrieval systems.
Date: 2025-02-09
- Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment
We introduce MSA-VQA, a Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment.
Our hierarchical framework analyzes video content at three levels: frame, segment, and video.
We propose a Prompt Semantic Supervision Module that uses the CLIP text encoder to ensure semantic consistency between videos and their conditioning prompts.
Date: 2025-01-06
- VQA²: Visual Question Answering for Video Quality Assessment
Video Quality Assessment (VQA) is a classic field in low-level visual perception.
Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can markedly enhance low-level visual quality evaluation.
We introduce the VQA² Instruction dataset, the first visual question answering instruction dataset that focuses on video quality assessment.
The VQA² series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
Date: 2024-11-06
- Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
We introduce Q-Bench-Video, a new benchmark specifically designed to evaluate LMMs' proficiency in discerning video quality.
We collect a total of 2,378 question-answer pairs and test them on 12 open-source & 5 proprietary LMMs.
Our findings indicate that while LMMs have a foundational understanding of video quality, their performance remains incomplete and imprecise, with a notable discrepancy compared to human performance.
Date: 2024-09-30
- Advancing Video Quality Assessment for AIGC
We propose a novel loss function that combines mean absolute error with cross-entropy loss to mitigate inter-frame quality inconsistencies.
We also introduce the innovative S2CNet technique to retain critical content, while leveraging adversarial training to enhance the model's generalization capabilities.
Date: 2024-09-23
- Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model
We investigate the AIGC-VQA problem, considering both subjective and objective quality assessment perspectives.
For the subjective perspective, we construct the Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos.
We evaluate the perceptual quality of AIGC videos from three critical dimensions: spatial quality, temporal quality, and text-video alignment.
We propose the Unify Generated Video Quality assessment (UGVQ) model, designed to accurately evaluate the multi-dimensional quality of AIGC videos.
Date: 2024-07-31
- CLIPVQA: Video Quality Assessment via CLIP
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
Date: 2024-07-06
- AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
This paper introduces AIGCBench, a pioneering comprehensive benchmark designed to evaluate a variety of video generation tasks.
It includes a varied, open-domain image-text dataset for evaluating different state-of-the-art algorithms under equivalent conditions.
We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models.
Date: 2024-01-03
- Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach
We collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors.
Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension.
These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings.
Date: 2023-05-22
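The MSA-VQA entry above mentions a Prompt Semantic Supervision Module built on the CLIP text encoder. The code below is a minimal sketch of comparing a video with its prompt in a shared embedding space. It assumes the frame and prompt embeddings were already produced by some CLIP-style encoder (not shown). It mean-pools the frames into one video vector and uses `1 - cosine similarity` as the supervision term. The pooling choice, the loss form, and the function names are illustrative assumptions, not MSA-VQA's published design.

```python
import math
from typing import List, Sequence


def mean_pool(frame_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame embeddings into one video-level embedding (assumed pooling)."""
    n = len(frame_embeddings)
    return [sum(column) / n for column in zip(*frame_embeddings)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def prompt_consistency_loss(frame_embeddings: Sequence[Sequence[float]],
                            prompt_embedding: Sequence[float]) -> float:
    """Hypothetical supervision term: 0 when video and prompt align, up to 2 when opposed."""
    video_embedding = mean_pool(frame_embeddings)
    return 1.0 - cosine_similarity(video_embedding, prompt_embedding)


# Toy 2-D "embeddings": both frames point along x, as does the prompt.
aligned = prompt_consistency_loss([[1.0, 0.0], [3.0, 0.0]], [2.0, 0.0])
# Frames split between x and y: only partial agreement with an x-aligned prompt.
partial = prompt_consistency_loss([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
print(f"aligned={aligned:.3f} partial={partial:.3f}")
```

Cosine similarity ignores vector length, so rescaling the prompt embedding, or all frame embeddings together, leaves the loss unchanged. Only the direction of the embeddings matters.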