What comprises a good talking-head video generation?: A Survey and
Benchmark
- URL: http://arxiv.org/abs/2005.03201v1
- Date: Thu, 7 May 2020 01:58:05 GMT
- Title: What comprises a good talking-head video generation?: A Survey and
Benchmark
- Authors: Lele Chen, Guofeng Cui, Ziyi Kou, Haitian Zheng, Chenliang Xu
- Abstract summary: We present a benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.
We either propose new metrics or select the most appropriate existing ones to evaluate what we consider the desired properties of a good talking-head video.
- Score: 40.26689818789428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the years, performance evaluation has become essential in computer
vision, enabling tangible progress in many sub-fields. While talking-head video
generation has become an emerging research topic, existing evaluations on this
topic present many limitations. For example, most approaches use human subjects
(e.g., via Amazon MTurk) to evaluate their research claims directly. This
subjective evaluation is cumbersome, unreproducible, and may impede the
evolution of new research. In this work, we present a carefully designed
benchmark for evaluating talking-head video generation with standardized
dataset pre-processing strategies. For evaluation, we either propose new
metrics or select the most appropriate existing ones to assess what we
consider the desired properties of a good talking-head video: identity
preservation, lip synchronization, high video quality, and natural,
spontaneous motion. By conducting a careful analysis across several state-of-the-art
talking-head generation approaches, we aim to uncover the merits and drawbacks
of current methods and point out promising directions for future work. All the
evaluation code is available at:
https://github.com/lelechen63/talking-head-generation-survey.
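To make these desired properties concrete, here is a minimal sketch of two of them: identity preservation scored as cosine similarity between face embeddings of the reference portrait and the generated frames, and video quality scored as frame-wise PSNR. This is not the benchmark's own implementation (see the repository above for that), and the face embeddings are assumed to come from some pretrained face-recognition network that is not shown here.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def identity_preservation(ref_emb: np.ndarray, frame_embs: list) -> float:
    """Mean similarity between the reference face embedding and each generated
    frame's embedding. Embeddings are assumed to come from a pretrained
    face-recognition network (not shown); higher = identity better preserved."""
    return float(np.mean([cosine_similarity(ref_emb, e) for e in frame_embs]))

def psnr(ref_frame: np.ndarray, gen_frame: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a reference and a generated frame."""
    mse = np.mean((ref_frame.astype(np.float64) - gen_frame.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))
```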
Related papers
- A Comparative Study of Perceptual Quality Metrics for Audio-driven
Talking Head Videos [81.54357891748087]
We collect talking head videos generated from four generative methods.
We conduct controlled psychophysical experiments on visual quality, lip-audio synchronization, and head movement naturalness.
Our experiments validate the consistency between metric predictions and human annotations, identifying metrics that align better with human opinion than widely used measures.
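The agreement check described above boils down to correlating metric outputs with human ratings. A minimal sketch, assuming one objective score and one mean opinion score (MOS) per video; the numbers below are placeholders, not data from the paper:

```python
from scipy.stats import spearmanr, pearsonr

# Hypothetical per-video scores: one objective metric vs. human mean opinion scores.
metric_scores = [0.81, 0.62, 0.74, 0.55, 0.90, 0.47]
human_mos     = [4.2, 3.1, 3.8, 2.9, 4.6, 2.4]

# Spearman (rank) correlation: does the metric order videos the way humans do?
srcc, _ = spearmanr(metric_scores, human_mos)
# Pearson (linear) correlation: do the raw values track each other?
plcc, _ = pearsonr(metric_scores, human_mos)

print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}")  # higher = better alignment with humans
```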
arXiv Detail & Related papers (2024-03-11T04:13:38Z)
- VBench: Comprehensive Benchmark Suite for Video Generative Models [100.43756570261384]
VBench is a benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions.
We provide a dataset of human preference annotations to validate our benchmarks' alignment with human perception.
We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations.
arXiv Detail & Related papers (2023-11-29T18:39:01Z)
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge large conditional generative models with simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
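The paper's 17 metrics are not enumerated here, but text-video alignment is commonly scored as a frame-averaged CLIP similarity between the prompt and sampled frames. A sketch of that general recipe using the Hugging Face CLIP release (frame sampling itself is left abstract):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_text_video_score(prompt: str, frames: list) -> float:
    """Average cosine similarity between the prompt and sampled video frames
    (frames are PIL images; how they are sampled from the video is up to you)."""
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Normalize the projected embeddings, then average per-frame similarities.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img @ txt.T).mean())
```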
arXiv Detail & Related papers (2023-10-17T17:50:46Z)
- From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications [3.8301843990331887]
Recent advancements in deep learning and computer vision have led to a surge of interest in generating realistic talking heads.
We systematically categorise them into four main approaches: image-driven, audio-driven, video-driven, and others.
We provide an in-depth analysis of each method, highlighting their unique contributions, strengths, and limitations.
arXiv Detail & Related papers (2023-08-30T14:00:48Z)
- Learning and Evaluating Human Preferences for Conversational Head Generation [101.89332968344102]
We propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions.
PS can serve as a quantitative evaluation without the need for human annotation.
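The general shape of such a learned metric can be illustrated as a regression from per-dimension quantitative scores to a human preference rating. The sketch below is only a schematic stand-in for the paper's actual PS model, fitted here on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical training data: rows are videos, columns are per-dimension
# quantitative scores (e.g., lip sync, identity, visual quality, naturalness).
X = rng.uniform(0.0, 1.0, size=(200, 4))
# Synthetic "human preference" built from the dimensions plus noise.
y = X @ np.array([0.4, 0.2, 0.3, 0.1]) + rng.normal(0.0, 0.05, size=200)

# Fit the mapping once on human-annotated videos ...
ps_model = Ridge(alpha=1.0).fit(X, y)

# ... then score new videos without further human annotation.
new_video_dims = np.array([[0.9, 0.7, 0.8, 0.6]])
print(f"Preference Score ~ {ps_model.predict(new_video_dims)[0]:.3f}")
```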
arXiv Detail & Related papers (2023-07-20T07:04:16Z)
- Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.