Measuring the Quality of Text-to-Video Model Outputs: Metrics and
Dataset
- URL: http://arxiv.org/abs/2309.08009v1
- Date: Thu, 14 Sep 2023 19:35:53 GMT
- Title: Measuring the Quality of Text-to-Video Model Outputs: Metrics and
Dataset
- Authors: Iya Chivileva and Philip Lynch and Tomas E. Ward and Alan F. Smeaton
- Abstract summary: The paper presents a dataset of more than 1,000 videos generated by 5 recent T2V models, to which several commonly used quality metrics are applied.
We also include extensive human quality evaluations on those videos, allowing the relative strengths and weaknesses of metrics, including human assessment, to be compared.
Our conclusion is that naturalness and semantic matching with the text prompt used to generate the T2V output are important but there is no single measure to capture these subtleties in assessing T2V model output.
- Score: 1.9685736810241874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evaluating the quality of videos generated from text-to-video (T2V) models is
important if they are to produce plausible outputs that convince a viewer of
their authenticity. We examine some of the metrics used in this area and
highlight their limitations. The paper presents a dataset of more than 1,000
generated videos from 5 very recent T2V models on which some of those commonly
used quality metrics are applied. We also include extensive human quality
evaluations on those videos, allowing the relative strengths and weaknesses of
metrics, including human assessment, to be compared. The contribution is an
assessment of commonly used quality metrics, and a comparison of their
performances and the performance of human evaluations on an open dataset of T2V
videos. Our conclusion is that naturalness and semantic matching with the text
prompt used to generate the T2V output are important but there is no single
measure to capture these subtleties in assessing T2V model output.
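As an illustration of the semantic-matching side of such metrics, the sketch below computes a frame-averaged CLIP similarity between the generating prompt and a video. This is a commonly used proxy for text-video alignment, not the metric proposed in the paper; the checkpoint name, the Hugging Face `transformers` CLIP API, and OpenCV frame reading are assumptions made for the example.

```python
# Minimal sketch: prompt-video semantic matching via frame-averaged CLIP similarity.
# NOT the paper's own metric; model checkpoint and libraries are assumptions.
import cv2
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_text_video_score(video_path: str, prompt: str, stride: int = 8) -> float:
    """Average cosine similarity between the prompt and every `stride`-th frame."""
    cap = cv2.VideoCapture(video_path)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % stride == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # BGR -> RGB
        i += 1
    cap.release()
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    frame_embs = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    return (frame_embs @ text_emb.T).mean().item()  # mean over sampled frames
```

A score like this captures prompt matching only; as the abstract notes, it says nothing about naturalness, which is why no single measure suffices.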
Related papers
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation [55.57459883629706]
We conduct the first systematic study on compositional text-to-video generation.
We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment [54.00254267259069]
We establish the largest-scale Text-to-Video Quality Assessment DataBase (T2VQA-DB) to date.
The dataset is composed of 10,000 videos generated by 9 different T2V models.
We propose a novel transformer-based model for subjective-aligned Text-to-Video Quality Assessment (T2VQA)
arXiv Detail & Related papers (2024-03-18T16:52:49Z)
- Towards A Better Metric for Text-to-Video Generation [102.16250512265995]
Generative models have demonstrated remarkable capability in synthesizing high-quality text, images, and videos.
We introduce a novel evaluation pipeline, the Text-to-Video Score (T2VScore)
This metric integrates two pivotal criteria: (1) Text-Video Alignment, which scrutinizes the fidelity of the video in representing the given text description, and (2) Video Quality, which evaluates the video's overall production caliber with a mixture of experts.
arXiv Detail & Related papers (2024-01-15T15:42:39Z)
- FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation [27.620973815397296]
Open-domain text-to-video (T2V) generation models have made remarkable progress.
Existing studies lack fine-grained evaluation of T2V models on different categories of text prompts.
It is unclear whether the automatic evaluation metrics are consistent with human standards.
arXiv Detail & Related papers (2023-11-03T09:46:05Z)
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that large conditional generative models are hard to judge with simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z)
- Models See Hallucinations: Evaluating the Factuality in Video Captioning [57.85548187177109]
We conduct a human evaluation of the factuality in video captioning and collect two annotated factuality datasets.
We find that 57.0% of the model-generated sentences contain factual errors, indicating that factual hallucination is a severe problem in this field.
We propose a weakly-supervised, model-based factuality metric FactVC, which outperforms previous metrics on factuality evaluation of video captioning.
arXiv Detail & Related papers (2023-03-06T08:32:50Z)
- Video compression dataset and benchmark of learning-based video-quality metrics [55.41644538483948]
We present a new benchmark for video-quality metrics that evaluates video compression.
It is based on a new dataset consisting of about 2,500 streams encoded using different standards.
Subjective scores were collected using crowdsourced pairwise comparisons.
arXiv Detail & Related papers (2022-11-22T09:22:28Z)
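The last entry derives subjective quality scores from crowdsourced pairwise comparisons. Below is a minimal sketch of one standard way to do this, fitting a Bradley-Terry model by gradient ascent; it is an illustrative assumption, not that benchmark's actual pipeline, and the function name and toy data are invented for the example.

```python
# Minimal sketch (illustrative assumption, not the benchmark's pipeline):
# convert crowdsourced pairwise comparisons into per-video subjective scores
# by fitting a Bradley-Terry model with gradient ascent on the log-likelihood.
import numpy as np

def bradley_terry_scores(n_items: int, comparisons: list[tuple[int, int]],
                         lr: float = 0.05, epochs: int = 2000) -> np.ndarray:
    """comparisons: (winner_index, loser_index) pairs collected from annotators."""
    s = np.zeros(n_items)  # latent log-quality of each video
    for _ in range(epochs):
        grad = np.zeros(n_items)
        for w, l in comparisons:
            p_w = 1.0 / (1.0 + np.exp(s[l] - s[w]))  # P(winner beats loser)
            grad[w] += 1.0 - p_w                     # gradient of log-likelihood
            grad[l] -= 1.0 - p_w
        s += lr * grad
        s -= s.mean()  # scores are identified only up to a shift, so center them
    return s

# Toy usage: three videos; video 0 wins most of its comparisons.
print(bradley_terry_scores(3, [(0, 1), (0, 2), (0, 1), (1, 2), (0, 2)]))
```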
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information shown and is not responsible for any consequences of its use.