BodyMetric: Evaluating the Realism of Human Bodies in Text-to-Image Generation
- URL: http://arxiv.org/abs/2412.04086v2
- Date: Fri, 06 Dec 2024 09:00:39 GMT
- Title: BodyMetric: Evaluating the Realism of Human Bodies in Text-to-Image Generation
- Authors: Nefeli Andreou, Varsha Vivek, Ying Wang, Alex Vorobiov, Tiffany Deng, Raja Bala, Larry Davis, Betty Mohler Tesch,
- Abstract summary: BodyMetric is a learnable metric that predicts body realism in images.
We demonstrate BodyMetric through applications that were previously infeasible at scale.
- Score: 9.85749440360125
- Abstract: Accurately generating images of human bodies from text remains a challenging problem for state-of-the-art text-to-image models. Commonly observed body-related artifacts include extra or missing limbs, unrealistic poses, blurred body parts, etc. Currently, evaluation of such artifacts relies heavily on time-consuming human judgments, limiting the ability to benchmark models at scale. We address this by proposing BodyMetric, a learnable metric that predicts body realism in images. BodyMetric is trained on realism labels and multi-modal signals including 3D body representations inferred from the input image, and textual descriptions. To facilitate this approach, we design an annotation pipeline to collect expert ratings on human body realism, leading to a new dataset for this task, namely BodyRealism. Ablation studies support our architectural choices for BodyMetric and the importance of leveraging a 3D human body prior in capturing body-related artifacts in 2D images. In comparison to concurrent metrics which evaluate general user preference in images, BodyMetric specifically reflects body-related artifacts. We demonstrate the utility of BodyMetric through applications that were previously infeasible at scale. In particular, we use BodyMetric to benchmark the ability of text-to-image models to produce realistic human bodies. We also demonstrate the effectiveness of BodyMetric in ranking generated images based on the predicted realism scores.
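The two applications named in the abstract, ranking generated images and benchmarking models by predicted realism, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Scorer` type, the function names, and the toy scores are all hypothetical, and the sketch assumes only that some scoring function returns a higher value for a more realistic body.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer: maps an image identifier to a realism score.
# Assumption: higher means more realistic. A real metric would take pixels.
Scorer = Callable[[str], float]


def rank_images(images: List[str], score: Scorer) -> List[Tuple[str, float]]:
    """Return (image, score) pairs sorted from most to least realistic."""
    scored = [(img, score(img)) for img in images]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def benchmark_models(outputs: Dict[str, List[str]], score: Scorer) -> Dict[str, float]:
    """Average the predicted realism score over each model's generated images."""
    return {model: mean(score(img) for img in imgs) for model, imgs in outputs.items()}


# Toy stand-in for a learned metric: a fixed lookup table of made-up scores.
toy_scores = {"a.png": 0.25, "b.png": 0.75, "c.png": 0.875}
toy_scorer: Scorer = toy_scores.__getitem__

ranking = rank_images(["a.png", "b.png", "c.png"], toy_scorer)
per_model = benchmark_models(
    {"model_x": ["a.png", "b.png"], "model_y": ["c.png"]}, toy_scorer
)
print(ranking)
print(per_model)
```

Averaging per model is only one possible aggregation; the abstract does not say how scores are combined for benchmarking.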
Related papers
- DiffBody: Diffusion-based Pose and Shape Editing of Human Images [1.7188280334580193]
We propose a one-shot approach that enables large edits with identity preservation.
To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and change the body's pose and shape.
We further enhance the realism by fine-tuning text embeddings via self-supervised learning.
arXiv Detail & Related papers (2024-01-05T13:36:19Z)
- Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing [54.29207348918216]
Cloth2Body needs to address new and emerging challenges raised by the partial observation of the input and the high diversity of the output.
We propose an end-to-end framework that can accurately estimate 3D body mesh parameterized by pose and shape from a 2D clothing image.
As shown by experimental results, the proposed framework achieves state-of-the-art performance and can effectively recover natural and diverse 3D body meshes from 2D images.
arXiv Detail & Related papers (2023-09-28T06:18:38Z)
- Procedural Humans for Computer Vision [1.9550079119934403]
We build a parametric model of the face and body, including articulated hands, to generate realistic images of humans based on this body model.
We show that this can be extended to include the full body by building on the pipeline of Wood et al. to generate synthetic images of humans in their entirety.
arXiv Detail & Related papers (2023-01-03T15:44:48Z)
- Structure-Aware Flow Generation for Human Body Reshaping [15.365236395118982]
We develop an end-to-end flow generation architecture to achieve unprecedentedly controllable performance under arbitrary poses and garments.
For a comprehensive evaluation, we construct the first large-scale body reshaping dataset, namely BR-5K.
Our approach significantly outperforms existing state-of-the-art methods in terms of visual performance, controllability, and efficiency.
arXiv Detail & Related papers (2022-03-09T12:22:38Z)
- Automatic Estimation of Anthropometric Human Body Measurements [0.0]
This paper presents research in deep learning and neural networks that tackles the challenge of estimating body measurements from various types of visual input data.
Also, we deal with the lack of real human data annotated with ground truth body measurements required for training and evaluation, by generating a synthetic dataset of various human body shapes.
arXiv Detail & Related papers (2021-12-22T16:13:59Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency [55.94908688207493]
We propose a self-supervised framework named SPICE that closes the image quality gap with supervised methods.
The key insight enabling self-supervision is to exploit 3D information about the human body in several ways.
SPICE achieves state-of-the-art performance on the DeepFashion dataset.
arXiv Detail & Related papers (2021-10-11T17:48:50Z)
- Detailed Avatar Recovery from Single Image [50.82102098057822]
This paper presents a novel framework to recover a detailed avatar from a single image.
We use deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation framework.
Our method can restore detailed human body shapes with complete textures beyond skinned models.
arXiv Detail & Related papers (2021-08-06T03:51:26Z)
- 3D Human Body Reshaping with Anthropometric Modeling [59.51820187982793]
Reshaping accurate and realistic 3D human bodies from anthropometric parameters poses a fundamental challenge for person identification, online shopping and virtual reality.
Existing approaches for creating such 3D shapes often suffer from complex measurement by range cameras or high-end scanners.
This paper proposes a novel feature-selection-based local mapping technique, which enables automatic anthropometric parameter modeling for each body facet.
arXiv Detail & Related papers (2021-04-05T04:09:39Z)
- Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis [58.05389586712485]
We tackle human image synthesis, including human motion imitation, appearance transfer, and novel view synthesis.
In this paper, we propose a 3D body mesh recovery module to disentangle the pose and shape.
We also build a new dataset, namely iPER dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis.
arXiv Detail & Related papers (2020-11-18T02:57:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.