THEval. Evaluation Framework for Talking Head Video Generation
- URL: http://arxiv.org/abs/2511.04520v2
- Date: Fri, 07 Nov 2025 14:25:58 GMT
- Title: THEval. Evaluation Framework for Talking Head Video Generation
- Authors: Nabyl Quignon, Baptiste Chopin, Yaohui Wang, Antitza Dantcheva
- Abstract summary: We propose a new evaluation framework comprising 8 metrics related to three dimensions: (i) quality, (ii) naturalness, and (iii) synchronization. In selecting the metrics, we place emphasis on efficiency, as well as alignment with human preferences. Our experiments on 85,000 videos generated by 17 state-of-the-art models suggest that while many algorithms excel in lip synchronization, they face challenges with generating expressive and artifact-free details.
- Score: 12.808033361725707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video generation has achieved remarkable progress, with generated videos increasingly resembling real ones. However, the rapid advance in generation has outpaced the development of adequate evaluation metrics. Currently, the assessment of talking head generation relies primarily on a limited set of metrics that evaluate general video quality and lip synchronization, and on user studies. Motivated by this, we propose a new evaluation framework comprising 8 metrics related to three dimensions: (i) quality, (ii) naturalness, and (iii) synchronization. In selecting the metrics, we place emphasis on efficiency, as well as alignment with human preferences. Based on these considerations, we streamline the framework to analyze fine-grained dynamics of the head, mouth, and eyebrows, as well as face quality. Our extensive experiments on 85,000 videos generated by 17 state-of-the-art models suggest that while many algorithms excel in lip synchronization, they face challenges with generating expressive and artifact-free details. These videos were generated from a novel real dataset that we curated in order to mitigate training-data bias. Our proposed benchmark framework is aimed at evaluating the improvement of generative methods. The code, dataset, and leaderboards will be publicly released and regularly updated with new methods, in order to reflect progress in the field.
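As a rough illustration of how eight per-metric scores might roll up into the three dimensions named in the abstract, consider the sketch below. The metric names, their grouping into dimensions, and the equal-weight averaging are our own illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical grouping of 8 per-video metrics into the three
# dimensions from the abstract: quality, naturalness, synchronization.
# Metric names and the equal-weight averaging are assumptions.
DIMENSIONS = {
    "quality": ["face_quality", "artifact_score"],
    "naturalness": ["head_dynamics", "mouth_dynamics", "eyebrow_dynamics"],
    "synchronization": ["lip_sync", "audio_offset", "beat_alignment"],
}

def dimension_scores(metrics: dict) -> dict:
    """Average the per-metric scores (assumed normalized to [0, 1])
    within each dimension."""
    return {
        dim: sum(metrics[name] for name in names) / len(names)
        for dim, names in DIMENSIONS.items()
    }

def overall_score(metrics: dict) -> float:
    """Unweighted mean over the three dimension scores."""
    dims = dimension_scores(metrics)
    return sum(dims.values()) / len(dims)
```

In a real framework, the per-dimension weights would likely be tuned against human-preference data rather than fixed to be equal.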
Related papers
- VIPER: Process-aware Evaluation for Generative Video Reasoning [64.86465792516658]
We introduce VIPER, a comprehensive benchmark spanning 16 tasks across temporal, structural, symbolic, spatial, physics, and planning reasoning. Our experiments reveal that state-of-the-art video models achieve only about 20% POC@1.0 and exhibit significant outcome-hacking.
arXiv Detail & Related papers (2025-12-31T16:31:59Z)
- Q-Save: Towards Scoring and Attribution for Generated Video Evaluation [65.83319736145869]
We present Q-Save, a new benchmark dataset and model for holistic evaluation of AI-generated video (AIGV) quality. The dataset contains nearly 10,000 videos, each annotated with a scalar mean opinion score (MOS) and fine-grained attribution labels. We propose a unified evaluation model that jointly performs quality scoring and attribution-based explanation.
arXiv Detail & Related papers (2025-11-24T07:00:21Z)
- Video-Bench: Human-Aligned Video Generation Benchmark [26.31594706735867]
Video generation assessment is essential for ensuring that generative models produce visually realistic, high-quality videos. This paper introduces Video-Bench, a comprehensive benchmark featuring a rich prompt suite and extensive evaluation dimensions. Experiments on advanced models including Sora demonstrate that Video-Bench achieves superior alignment with human preferences across all dimensions.
arXiv Detail & Related papers (2025-04-07T10:32:42Z)
- What Are You Doing? A Closer Look at Controllable Human Video Generation [73.89117620413724]
'What Are You Doing?' is a new benchmark for evaluating controllable image-to-video generation of humans. It consists of 1,544 captioned videos that have been meticulously collected and annotated with 56 fine-grained categories. We perform in-depth analyses of seven state-of-the-art models for controllable image-to-video generation.
arXiv Detail & Related papers (2025-03-06T17:59:29Z)
- GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning [62.775721264492994]
GRADEO is one of the first video evaluation models designed specifically for this purpose. It grades AI-generated videos with explainable scores and assessments through multi-step reasoning. Experiments show that our method aligns better with human evaluations than existing methods.
arXiv Detail & Related papers (2025-03-04T07:04:55Z)
- A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos [81.54357891748087]
We collect talking head videos generated from four generative methods.
We conduct controlled psychophysical experiments on visual quality, lip-audio synchronization, and head movement naturalness.
Our experiments validate consistency between model predictions and human annotations, identifying metrics that align better with human opinions than widely-used measures.
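Consistency between metric predictions and human annotations, as described above, is commonly quantified with rank correlation. Below is a minimal, dependency-free Spearman correlation sketch; the helper names are ours, and this is an illustration of the general technique, not the paper's evaluation code.

```python
def rank(values):
    """Assign 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(predictions, human_scores):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(predictions), rank(human_scores)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A metric whose scores rank videos in the same order as human mean opinion scores yields a correlation near 1.0, which is the sense in which a metric "aligns better with human opinions".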
arXiv Detail & Related papers (2024-03-11T04:13:38Z)
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that large conditional generative models are hard to judge with simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z)
- What comprises a good talking-head video generation?: A Survey and Benchmark [40.26689818789428]
We present a benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.
We propose new metrics, or select the most appropriate existing ones, to evaluate results with respect to what we consider the desired properties of a good talking-head video.
arXiv Detail & Related papers (2020-05-07T01:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.