THQA: A Perceptual Quality Assessment Database for Talking Heads
- URL: http://arxiv.org/abs/2404.09003v1
- Date: Sat, 13 Apr 2024 13:08:57 GMT
- Title: THQA: A Perceptual Quality Assessment Database for Talking Heads
- Authors: Yingjie Zhou, Zicheng Zhang, Wei Sun, Xiaohong Liu, Xiongkuo Min, Zhihua Wang, Xiao-Ping Zhang, Guangtao Zhai
- Abstract summary: Speech-driven methods offer a novel avenue for manipulating the mouth shape and expressions of digital humans.
Despite the proliferation of driving methods, the quality of many generated talking head (TH) videos remains a concern.
This paper introduces the Talking Head Quality Assessment (THQA) database, featuring 800 TH videos generated through 8 diverse speech-driven methods.
- Score: 56.42738564463101
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of media technology, digital humans have gained prominence due to rapid advancements in computer technology. However, the manual modeling and control required for the majority of digital humans pose significant obstacles to efficient development. Speech-driven methods offer a novel avenue for manipulating the mouth shape and expressions of digital humans. Despite the proliferation of driving methods, the quality of many generated talking head (TH) videos remains a concern, impacting users' visual experience. To tackle this issue, this paper introduces the Talking Head Quality Assessment (THQA) database, featuring 800 TH videos generated through 8 diverse speech-driven methods. Extensive experiments affirm the THQA database's richness in character and speech features. Subsequent subjective quality assessment experiments analyze correlations between scoring results and speech-driven methods, ages, and genders. In addition, experimental results show that mainstream image and video quality assessment methods have limitations for the THQA database, underscoring the imperative for further research to enhance TH video quality assessment. The THQA database is publicly accessible at https://github.com/zyj-2000/THQA.
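Quality assessment methods are conventionally benchmarked on a database like THQA by correlating their predicted scores with the subjective mean opinion scores (MOS), typically via the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC). A minimal sketch of both measures, using made-up illustrative scores rather than data from the paper:

```python
import numpy as np

def plcc(pred, mos):
    """Pearson linear correlation between predicted scores and MOS."""
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    return float(np.corrcoef(pred, mos)[0, 1])

def srocc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rank = lambda x: np.argsort(np.argsort(np.asarray(x))).astype(float)
    return plcc(rank(pred), rank(mos))

# Illustrative values only (not from the THQA paper).
pred = [3.1, 2.4, 4.0, 1.5, 3.6]   # a metric's predicted quality scores
mos  = [3.0, 2.8, 4.2, 1.9, 3.3]   # subjective mean opinion scores
print(f"PLCC:  {plcc(pred, mos):.3f}")
print(f"SROCC: {srocc(pred, mos):.3f}")
```

SROCC depends only on the rank ordering of the scores, so a monotonic but nonlinear metric can still score perfectly on it; PLCC additionally rewards a linear relationship with MOS.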
Related papers
- A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos [81.54357891748087]
We collect talking head videos generated from four generative methods.
We conduct controlled psychophysical experiments on visual quality, lip-audio synchronization, and head movement naturalness.
Our experiments validate consistency between model predictions and human annotations, identifying metrics that align better with human opinions than widely-used measures.
arXiv Detail & Related papers (2024-03-11T04:13:38Z)
- A No-Reference Quality Assessment Method for Digital Human Head [56.17852258306602]
We develop a novel no-reference (NR) method based on a Transformer to handle digital human quality assessment (DHQA).
Specifically, front 2D projections of the digital humans are rendered as inputs, and a vision transformer (ViT) is employed for feature extraction.
Then a multi-task module is designed to jointly classify the distortion types and predict the perceptual quality levels of digital humans.
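The two-head design described above, where one shared feature vector feeds both a distortion classifier and a quality regressor, can be sketched as follows. The feature dimension, class count, and random weights here are hypothetical stand-ins; the paper's actual ViT backbone and trained heads are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper's ViT width and number of
# distortion classes may differ. This only illustrates the structure.
FEAT_DIM, N_DISTORTIONS = 768, 6

# Stand-in for ViT features extracted from a rendered front 2D projection.
features = rng.standard_normal(FEAT_DIM)

# Multi-task module: one shared feature vector, two linear heads.
W_cls = rng.standard_normal((N_DISTORTIONS, FEAT_DIM)) * 0.01  # classification head
b_cls = np.zeros(N_DISTORTIONS)
W_reg = rng.standard_normal(FEAT_DIM) * 0.01                   # regression head
b_reg = 0.0

logits = W_cls @ features + b_cls          # per-distortion-type scores
distortion = int(np.argmax(logits))        # predicted distortion class
quality = float(W_reg @ features + b_reg)  # predicted perceptual quality level

print("predicted distortion class:", distortion)
print("predicted quality score:", round(quality, 3))
```

Sharing the backbone between the two tasks lets the distortion-classification signal regularize the quality-regression features, which is the usual motivation for such multi-task designs.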
arXiv Detail & Related papers (2023-10-25T16:01:05Z)
- Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation [60.873105678086404]
SJTU-H3D is a subjective quality assessment database specifically designed for full-body digital humans.
It comprises 40 high-quality reference digital humans and 1,120 labeled distorted counterparts generated with seven types of distortions.
arXiv Detail & Related papers (2023-07-06T06:55:30Z)
- Audio-Visual Quality Assessment for User Generated Content: Database and Method [61.970768267688086]
Most existing VQA studies only focus on the visual distortions of videos, ignoring that the user's QoE also depends on the accompanying audio signals.
We construct the first AVQA database named the SJTU-UAV database, which includes 520 in-the-wild audio and video (A/V) sequences.
We also design a family of AVQA models, which fuse popular VQA methods and audio features via a support vector regressor (SVR).
The experimental results show that with the help of audio signals, the VQA models can evaluate the quality more accurately.
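The fusion pipeline described above, concatenating per-video visual quality features with audio features and regressing MOS on the joint vector, can be sketched with toy data. The paper uses an SVR for this step; as a dependency-light stand-in, the sketch fits a closed-form ridge regressor instead, and all features and MOS values below are synthetic, not from the SJTU-UAV database:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in features: a visual quality feature vector (e.g. VQA model
# outputs) and an audio feature vector per video, concatenated before fitting.
n_videos = 50
visual_feats = rng.standard_normal((n_videos, 4))
audio_feats = rng.standard_normal((n_videos, 3))
fused = np.hstack([visual_feats, audio_feats])

# Synthetic MOS depending on both modalities, plus noise.
true_w = rng.standard_normal(fused.shape[1])
mos = fused @ true_w + 0.1 * rng.standard_normal(n_videos)

# Ridge regression in closed form (stand-in for the paper's SVR).
lam = 1e-3
X = np.hstack([fused, np.ones((n_videos, 1))])  # add bias column
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ mos)
pred = X @ w

plcc = float(np.corrcoef(pred, mos)[0, 1])
print(f"PLCC of fused audio-visual regressor: {plcc:.3f}")
```

Dropping the `audio_feats` columns from `fused` and refitting would show the gap the paper reports: the audio features carry quality information that visual features alone miss.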
arXiv Detail & Related papers (2023-03-04T11:49:42Z)
- DDH-QA: A Dynamic Digital Humans Quality Assessment Database [55.69700918818879]
We construct a large-scale dynamic digital human quality assessment database with diverse motion content as well as multiple distortions.
Ten types of common motion are employed to drive the DDHs, and 800 DDHs are generated in total.
arXiv Detail & Related papers (2022-12-24T13:35:31Z)
- Perceptual Quality Assessment for Digital Human Heads [35.801468849447126]
We propose the first large-scale quality assessment database for 3D scanned digital human heads (DHHs).
The constructed database consists of 55 reference DHHs and 1,540 distorted DHHs along with the subjective perceptual ratings.
The experimental results reveal that the proposed method exhibits state-of-the-art performance among the mainstream FR metrics.
arXiv Detail & Related papers (2022-09-20T06:02:57Z)
- What comprises a good talking-head video generation?: A Survey and Benchmark [40.26689818789428]
We present a benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.
We propose new metrics, or select the most appropriate existing ones, to evaluate results against the properties we consider desirable in a good talking-head video.
arXiv Detail & Related papers (2020-05-07T01:58:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.