VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset
- URL: http://arxiv.org/abs/2407.18245v2
- Date: Thu, 05 Dec 2024 11:29:56 GMT
- Title: VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset
- Authors: Orest Kupyn, Eugene Khvedchenia, Christian Rupprecht
- Abstract summary: We introduce VGGHeads, a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation.
Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes.
Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step.
- Score: 18.62716110331954
- Abstract: Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce VGGHeads, a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads.
Related papers
- Synthetic Prior for Few-Shot Drivable Head Avatar Inversion [61.51887011274453]
We present SynShot, a novel method for the few-shot inversion of a drivable head avatar based on a synthetic prior.
Inspired by machine learning models trained solely on synthetic data, we propose a method that learns a prior model from a large dataset of synthetic heads.
We model the head avatar using 3D Gaussian splatting and a convolutional encoder-decoder that outputs Gaussian parameters in UV texture space.
arXiv Detail & Related papers (2025-01-12T19:01:05Z)
- GPHM: Gaussian Parametric Head Model for Monocular Head Avatar Reconstruction [47.113910048252805]
High-fidelity 3D human head avatars are crucial for applications in VR/AR, digital human, and film production.
Recent advances have leveraged morphable face models to generate animated head avatars, representing varying identities and expressions.
We introduce 3D Gaussian Parametric Head Model, which employs 3D Gaussians to accurately represent the complexities of the human head.
arXiv Detail & Related papers (2024-07-21T06:03:11Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models [52.96248836582542]
We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
arXiv Detail & Related papers (2024-03-17T06:31:16Z)
- Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation [56.267877301135634]
Current full head generation methods require a large number of 3D scans or multi-view images to train the model.
We propose Head3D, a method to generate full 3D heads with limited multi-view images.
Our model achieves cost-efficient and diverse complete head generation with photo-realistic renderings and high-quality geometry representations.
arXiv Detail & Related papers (2023-03-28T11:12:26Z)
- Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z)
- Learning Neural Parametric Head Models [7.679586286000453]
We propose a novel 3D morphable model for complete human heads based on hybrid neural fields.
We capture a person's identity in a canonical space as a signed distance field (SDF), and model facial expressions with a neural deformation field.
Our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points.
arXiv Detail & Related papers (2022-12-06T05:24:42Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured from downward looking fish-eye cameras installed on the rim of a head mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)
- Methodology for Building Synthetic Datasets with Virtual Humans [1.5556923898855324]
Large datasets can be used for improved, targeted training of deep neural networks.
In particular, we make use of a 3D morphable face model for the rendering of multiple 2D images across a dataset of 100 synthetic identities.
arXiv Detail & Related papers (2020-06-21T10:29:36Z)