Toward Human Understanding with Controllable Synthesis
- URL: http://arxiv.org/abs/2411.08663v1
- Date: Wed, 13 Nov 2024 14:54:47 GMT
- Title: Toward Human Understanding with Controllable Synthesis
- Authors: Hanz Cuevas-Velasquez, Priyanka Patel, Haiwen Feng, Michael Black
- Abstract summary: Training methods to perform robust 3D human pose and shape estimation require diverse training images with accurate ground truth.
While BEDLAM demonstrates the potential of traditional procedural graphics to generate such data, the training images are clearly synthetic.
In contrast, generative image models produce highly realistic images but without ground truth.
- Abstract: Training methods to perform robust 3D human pose and shape (HPS) estimation require diverse training images with accurate ground truth. While BEDLAM demonstrates the potential of traditional procedural graphics to generate such data, the training images are clearly synthetic. In contrast, generative image models produce highly realistic images but without ground truth. Putting these methods together seems straightforward: use a generative model with the body ground truth as controlling signal. However, we find that the more realistic the generated images, the more they deviate from the ground truth, making them inappropriate for training and evaluation. Enhancements of realistic details, such as clothing and facial expressions, can lead to subtle yet significant deviations from the ground truth, potentially misleading training models. We empirically verify that this misalignment causes the accuracy of HPS networks to decline when trained with generated images. To address this, we design a controllable synthesis method that effectively balances image realism with precise ground truth. We use this to create the Generative BEDLAM (Gen-B) dataset, which improves the realism of the existing synthetic BEDLAM dataset while preserving ground truth accuracy. We perform extensive experiments, with various noise-conditioning strategies, to evaluate the tradeoff between visual realism and HPS accuracy. We show, for the first time, that generative image models can be controlled by traditional graphics methods to produce training data that increases the accuracy of HPS methods.
Related papers
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z) - VR-based generation of photorealistic synthetic data for training
hand-object tracking models [0.0]
"blender-hoisynth" is an interactive synthetic data generator based on the Blender software.
It is possible for users to interact with objects via virtual hands using standard Virtual Reality hardware.
We replace large parts of the training data in the well-known DexYCB dataset with hoisynth data and train a state-of-the-art HOI reconstruction model with it.
arXiv Detail & Related papers (2024-01-31T14:32:56Z) - PUG: Photorealistic and Semantically Controllable Synthetic Data for
Representation Learning [31.81199165450692]
We present a new generation of interactive environments for representation learning research that offer both controllability and realism.
We use the Unreal Engine, a powerful game engine well known in the entertainment industry, to produce PUG environments and datasets for representation learning.
arXiv Detail & Related papers (2023-08-08T01:33:13Z) - BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike
Animated Motion [52.11972919802401]
We show that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape estimation from real images.
Previous synthetic datasets have been small, unrealistic, or lacked realistic clothing.
arXiv Detail & Related papers (2023-06-29T13:35:16Z) - TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose
Estimation [55.94900327396771]
We introduce neural texture learning for 6D object pose estimation from synthetic data.
We learn to predict realistic texture of objects from real image collections.
We learn pose estimation from pixel-perfect synthetic data.
arXiv Detail & Related papers (2022-12-25T13:36:32Z) - Synthetic Image Data for Deep Learning [0.294944680995069]
Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification and semantic segmentation models.
We show how high quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle.
arXiv Detail & Related papers (2022-12-12T20:28:13Z) - Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z) - High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images [10.03187850132035]
We propose an algorithm that matches a non-photorealistic, synthetically generated image to a latent vector of a pretrained StyleGAN2 model.
In contrast to most previous work, we require no synthetic training data.
This is the first algorithm of its kind to work at a resolution of 1K and represents a significant leap forward in visual realism.
arXiv Detail & Related papers (2020-06-26T15:00:04Z) - Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement [78.58603635621591]
Training an unpaired synthetic-to-real translation network in image space is severely under-constrained.
We propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image.
Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets.
arXiv Detail & Related papers (2020-03-27T21:45:41Z)