A Comparative Study of Perceptual Quality Metrics for Audio-driven
Talking Head Videos
- URL: http://arxiv.org/abs/2403.06421v1
- Date: Mon, 11 Mar 2024 04:13:38 GMT
- Title: A Comparative Study of Perceptual Quality Metrics for Audio-driven
Talking Head Videos
- Authors: Weixia Zhang and Chengguang Zhu and Jingnan Gao and Yichao Yan and
Guangtao Zhai and Xiaokang Yang
- Abstract summary: We collect talking head videos generated from four generative methods.
We conduct controlled psychophysical experiments on visual quality, lip-audio synchronization, and head movement naturalness.
Our experiments validate consistency between model predictions and human annotations, identifying metrics that align better with human opinions than widely-used measures.
- Score: 81.54357891748087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of Artificial Intelligence Generated Content (AIGC)
technology has propelled audio-driven talking head generation, gaining
considerable research attention for practical applications. However,
performance evaluation research lags behind the development of talking head
generation techniques. Existing literature relies on heuristic quantitative
metrics without human validation, hindering accurate progress assessment. To
address this gap, we collect talking head videos generated from four generative
methods and conduct controlled psychophysical experiments on visual quality,
lip-audio synchronization, and head movement naturalness. Our experiments
validate consistency between model predictions and human annotations,
identifying metrics that align better with human opinions than widely-used
measures. We believe our work will facilitate performance evaluation and model
development, providing insights into AIGC in a broader context. Code and data
will be made available at https://github.com/zwx8981/ADTH-QA.
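The consistency check described in the abstract, comparing objective metric predictions against human annotations, is conventionally quantified with a rank correlation such as Spearman's rho (SRCC). The paper does not publish its exact evaluation code here, so the following is only a minimal pure-Python sketch of that standard computation; the per-video scores are hypothetical.

```python
# Consistency between an objective quality metric and human opinion scores,
# measured with Spearman's rank correlation (SRCC) -- a standard choice in
# perceptual quality assessment. Pure-Python sketch; all scores are made up.

def rankdata(xs):
    """Assign 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend over a run of tied values
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson linear correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def srcc(metric_scores, human_scores):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rankdata(metric_scores), rankdata(human_scores))

# Hypothetical per-video scores; higher is better for both columns.
metric = [0.62, 0.48, 0.91, 0.35, 0.77]   # objective metric outputs
mos    = [3.8, 3.1, 4.6, 2.2, 4.0]        # mean opinion scores from subjects
print(f"SRCC = {srcc(metric, mos):.3f}")  # 1.000: the two rankings agree
```

An SRCC near 1 means the metric orders the videos the same way human raters do; in practice `scipy.stats.spearmanr` would be used instead of the hand-rolled version above.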
Related papers
- THQA: A Perceptual Quality Assessment Database for Talking Heads [56.42738564463101]
Speech-driven methods offer a novel avenue for manipulating the mouth shape and expressions of digital humans.
Despite the proliferation of driving methods, the quality of many generated talking head (TH) videos remains a concern.
This paper introduces the Talking Head Quality Assessment (THQA) database, featuring 800 TH videos generated through 8 diverse speech-driven methods.
arXiv Detail & Related papers (2024-04-13T13:08:57Z)
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension [98.69691822391069]
We introduce AIR-Bench, the first benchmark to evaluate the ability of Large Audio-Language Models (LALMs) to understand various types of audio signals and interact with humans in textual form.
Results demonstrate a high level of consistency between GPT-4-based evaluation and human evaluation.
arXiv Detail & Related papers (2024-02-12T15:41:22Z)
- FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [85.16273912625022]
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from an audio signal.
To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of human heads.
arXiv Detail & Related papers (2023-12-13T19:01:07Z)
- Learning and Evaluating Human Preferences for Conversational Head Generation [101.89332968344102]
We propose a novel learning-based evaluation metric named Preference Score (PS), which fits human preferences based on quantitative evaluations across different dimensions.
PS can serve as a quantitative evaluation without the need for human annotation.
arXiv Detail & Related papers (2023-07-20T07:04:16Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [37.60722440434528]
In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism.
The former mimics human auditory perception to learn information from a recording, and the latter further discriminates interferences from desired signals by highlighting target-related features.
To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds.
arXiv Detail & Related papers (2020-05-16T17:54:07Z)
- What comprises a good talking-head video generation?: A Survey and Benchmark [40.26689818789428]
We present a benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.
We propose new metrics or select the most appropriate ones to evaluate results in what we consider as desired properties for a good talking-head video.
arXiv Detail & Related papers (2020-05-07T01:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.