LMM4Gen3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs
- URL: http://arxiv.org/abs/2504.20466v1
- Date: Tue, 29 Apr 2025 07:00:06 GMT
- Title: LMM4Gen3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs
- Authors: Woo Yi Yang, Jiarui Wang, Sijing Wu, Huiyu Duan, Yuxin Zhu, Liu Yang, Kang Fu, Guangtao Zhai, Xiongkuo Min
- Abstract summary: We propose LMME3DHF as a metric for evaluating 3DHF, capable of quality and authenticity score prediction, distortion-aware visual question answering, and distortion-aware saliency prediction. Experimental results show that LMME3DHF achieves state-of-the-art performance, surpassing existing methods in both accurately predicting quality scores for AI-generated 3D human faces and identifying distortion-aware salient regions and distortion types.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of generative artificial intelligence has enabled the creation of 3D human faces (HFs) for applications including media production, virtual reality, security, healthcare, and game development. However, assessing the quality and realism of these AI-generated 3D human faces remains a significant challenge due to the subjective nature of human perception and innate perceptual sensitivity to facial features. To this end, we conduct a comprehensive study on the quality assessment of AI-generated 3D human faces. We first introduce Gen3DHF, a large-scale benchmark comprising 2,000 videos of AI-generated 3D human faces along with 4,000 Mean Opinion Scores (MOS) collected across two dimensions, i.e., quality and authenticity, as well as 2,000 distortion-aware saliency maps and distortion descriptions. Based on Gen3DHF, we propose LMME3DHF, a Large Multimodal Model (LMM)-based metric for Evaluating 3DHF capable of quality and authenticity score prediction, distortion-aware visual question answering, and distortion-aware saliency prediction. Experimental results show that LMME3DHF achieves state-of-the-art performance, surpassing existing methods in both accurately predicting quality scores for AI-generated 3D human faces and effectively identifying distortion-aware salient regions and distortion types, while maintaining strong alignment with human perceptual judgments. Both the Gen3DHF database and the LMME3DHF will be released upon publication.
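Metrics like LMME3DHF are conventionally validated by correlating predicted scores against the collected MOS (e.g., Spearman rank and Pearson linear correlation). A minimal sketch of that evaluation loop, using made-up ratings and predictions (the data, rater count, and rating scale are illustrative assumptions, not taken from Gen3DHF):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def mean_opinion_scores(ratings):
    """Average each stimulus's raw subjective ratings into one MOS."""
    return np.asarray(ratings, dtype=float).mean(axis=1)

# Hypothetical raw ratings: 5 stimuli x 3 raters on a 1-5 scale.
raw = [[3, 3, 4], [4, 4, 4], [2, 3, 2], [4, 4, 3], [5, 4, 5]]
mos = mean_opinion_scores(raw)

# Hypothetical metric predictions for the same 5 stimuli.
pred = np.array([3.0, 4.2, 2.6, 3.7, 4.6])

srcc, _ = spearmanr(pred, mos)  # rank (monotonicity) agreement
plcc, _ = pearsonr(pred, mos)   # linear agreement
```

Higher SRCC/PLCC values (close to 1) indicate stronger alignment between the metric and human perceptual judgments, which is the criterion the abstract refers to.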
Related papers
- SIGMAN:Scaling 3D Human Gaussian Generation with Millions of Assets [72.26350984924129]
We propose a latent space generation paradigm for 3D human digitization.
We transform the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift.
We employ the multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset.
arXiv Detail & Related papers (2025-04-09T15:38:18Z) - 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models [94.48803082248872]
3D generation is experiencing rapid advancements, while the development of 3D evaluation has not kept pace.
We develop a large-scale human preference dataset, 3DGen-Bench.
We then train a CLIP-based scoring model, 3DGen-Score, and an MLLM-based automatic evaluator, 3DGen-Eval.
arXiv Detail & Related papers (2025-03-27T17:53:00Z) - GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior [25.72805054203982]
We propose a two-stage framework for generating identity-preserving realistic 3D humans from text and image prompts.
Our core insight is to leverage human-centric knowledge to facilitate the generation process.
Experiments demonstrate that GaussianIP outperforms existing methods in both visual quality and training efficiency.
arXiv Detail & Related papers (2025-03-14T07:16:43Z) - GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data [61.05815629606135]
Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model.
GeneMAN builds upon a comprehensive collection of high-quality human data.
GeneMAN can generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods.
arXiv Detail & Related papers (2024-11-27T18:59:54Z) - InceptionHuman: Controllable Prompt-to-NeRF for Photorealistic 3D Human Generation [61.62346472443454]
InceptionHuman is a prompt-to-NeRF framework that allows easy control via a combination of prompts in different modalities to generate photorealistic 3D humans.
InceptionHuman achieves consistent 3D human generation within a progressively refined NeRF space.
arXiv Detail & Related papers (2023-11-27T15:49:41Z) - A No-Reference Quality Assessment Method for Digital Human Head [56.17852258306602]
We develop a novel no-reference (NR) method based on Transformer to deal with digital human quality assessment (DHQA)
Specifically, the front 2D projections of the digital humans are rendered as inputs and the vision transformer (ViT) is employed for the feature extraction.
Then we design a multi-task module to jointly classify the distortion types and predict the perceptual quality levels of digital humans.
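The multi-task design above (shared features from a ViT over 2D projections, feeding one head that classifies distortion types and another that predicts quality levels) can be sketched as follows. The ViT is stubbed out with random precomputed features, and every dimension and name here is an illustrative assumption, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for ViT features extracted from rendered 2D projections
# (feature size and task dimensions are hypothetical).
feat_dim, n_distortions, n_quality_levels = 768, 4, 5
features = rng.standard_normal((8, feat_dim))  # batch of 8 projections

# Two task-specific linear heads sharing the same backbone features.
W_cls = rng.standard_normal((feat_dim, n_distortions)) * 0.01
W_qlt = rng.standard_normal((feat_dim, n_quality_levels)) * 0.01

distortion_logits = features @ W_cls  # head 1: distortion-type classification
quality_logits = features @ W_qlt     # head 2: perceptual quality level

pred_distortion = distortion_logits.argmax(axis=1)
pred_quality = quality_logits.argmax(axis=1)
```

Sharing one backbone across both heads is what makes the module "multi-task": the two losses are jointly optimized so distortion cues learned for classification also inform the quality prediction.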
arXiv Detail & Related papers (2023-10-25T16:01:05Z) - Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors [56.192682114114724]
Get3DHuman is a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes.
Our key observation is that the 3D generator can profit from human-related priors learned through 2D human generators and 3D reconstructors.
arXiv Detail & Related papers (2023-02-02T15:37:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.