Related papers: Toward Human Understanding with Controllable Synthesis

URL: http://arxiv.org/abs/2411.08663v1
Date: Wed, 13 Nov 2024 14:54:47 GMT
Title: Toward Human Understanding with Controllable Synthesis
Authors: Hanz Cuevas-Velasquez, Priyanka Patel, Haiwen Feng, Michael Black
Abstract summary: Training methods to perform robust 3D human pose and shape estimation require diverse training images with accurate ground truth. While BEDLAM demonstrates the potential of traditional procedural graphics to generate such data, the training images are clearly synthetic. In contrast, generative image models produce highly realistic images but without ground truth.
Score: 3.6579002555961915
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training methods to perform robust 3D human pose and shape (HPS) estimation requires diverse training images with accurate ground truth. While BEDLAM demonstrates the potential of traditional procedural graphics to generate such data, the training images are clearly synthetic. In contrast, generative image models produce highly realistic images but without ground truth. Putting these methods together seems straightforward: use a generative model with the body ground truth as the controlling signal. However, we find that the more realistic the generated images, the more they deviate from the ground truth, making them inappropriate for training and evaluation. Enhancements of realistic details, such as clothing and facial expressions, can lead to subtle yet significant deviations from the ground truth, potentially misleading training models. We empirically verify that this misalignment causes the accuracy of HPS networks to decline when trained with generated images. To address this, we design a controllable synthesis method that effectively balances image realism with precise ground truth. We use this to create the Generative BEDLAM (Gen-B) dataset, which improves the realism of the existing synthetic BEDLAM dataset while preserving ground truth accuracy. We perform extensive experiments, with various noise-conditioning strategies, to evaluate the tradeoff between visual realism and HPS accuracy. We show, for the first time, that generative image models can be controlled by traditional graphics methods to produce training data that increases the accuracy of HPS methods.

Related papers

Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability. We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
VR-based generation of photorealistic synthetic data for training hand-object tracking models [0.0]
"blender-hoisynth" is an interactive synthetic data generator based on the Blender software. It is possible for users to interact with objects via virtual hands using standard Virtual Reality hardware. We replace large parts of the training data in the well-known DexYCB dataset with hoisynth data and train a state-of-the-art HOI reconstruction model with it.
arXiv Detail & Related papers (2024-01-31T14:32:56Z)
PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning [31.81199165450692]
We present a new generation of interactive environments for representation learning research that offer both controllability and realism. We use the Unreal Engine, a powerful game engine well known in the entertainment industry, to produce PUG environments and datasets for representation learning.
arXiv Detail & Related papers (2023-08-08T01:33:13Z)
BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion [52.11972919802401]
We show that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape estimation from real images. Previous synthetic datasets have been small, unrealistic, or lacked realistic clothing.
arXiv Detail & Related papers (2023-06-29T13:35:16Z)
TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation [55.94900327396771]
We introduce neural texture learning for 6D object pose estimation from synthetic data. We learn to predict realistic texture of objects from real image collections. We learn pose estimation from pixel-perfect synthetic data.
arXiv Detail & Related papers (2022-12-25T13:36:32Z)
Synthetic Image Data for Deep Learning [0.294944680995069]
Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification and semantic segmentation models. We show how high-quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle.
arXiv Detail & Related papers (2022-12-12T20:28:13Z)
Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties. Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images [10.03187850132035]
We propose an algorithm that matches a non-photorealistic, synthetically generated image to a latent vector of a pretrained StyleGAN2 model. In contrast to most previous work, we require no synthetic training data. This is the first algorithm of its kind to work at a resolution of 1K and represents a significant leap forward in visual realism.
arXiv Detail & Related papers (2020-06-26T15:00:04Z)
Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [92.15895515035795]
CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality. Due to challenges in obtaining real-world fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data. We propose a Gaussian Process-based semi-supervised learning framework which enables the network to learn to derain using a synthetic dataset.
arXiv Detail & Related papers (2020-06-10T00:33:18Z)
Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement [78.58603635621591]
Training an unpaired synthetic-to-real translation network in image space is severely under-constrained. We propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image. Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets.
arXiv Detail & Related papers (2020-03-27T21:45:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.