Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D
Shape, Pose, and Appearance Consistency
- URL: http://arxiv.org/abs/2110.05458v1
- Date: Mon, 11 Oct 2021 17:48:50 GMT
- Title: Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D
Shape, Pose, and Appearance Consistency
- Authors: Soubhik Sanyal and Alex Vorobiov and Timo Bolkart and Matthew Loper
and Betty Mohler and Larry Davis and Javier Romero and Michael J. Black
- Abstract summary: We propose a self-supervised framework named SPICE that closes the image quality gap with supervised methods.
The key insight enabling self-supervision is to exploit 3D information about the human body in several ways.
SPICE achieves state-of-the-art performance on the DeepFashion dataset.
- Score: 55.94908688207493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthesizing images of a person in novel poses from a single image is a
highly ambiguous task. Most existing approaches require paired training images,
i.e., images of the same person with the same clothing in different poses.
However, obtaining sufficiently large datasets with paired data is challenging
and costly. Previous methods that forgo paired supervision lack realism. We
propose a self-supervised framework named SPICE (Self-supervised Person Image
CrEation) that closes the image quality gap with supervised methods. The key
insight enabling self-supervision is to exploit 3D information about the human
body in several ways. First, the 3D body shape must remain unchanged when
reposing. Second, representing body pose in 3D enables reasoning about
self-occlusions. Third, 3D body parts that are visible before and after reposing
should have similar appearance features. Once trained, SPICE takes an image of
a person and generates a new image of that person in a new target pose. SPICE
achieves state-of-the-art performance on the DeepFashion dataset, improving the
FID score from 29.9 to 7.8 compared with previous unsupervised methods, and
with performance similar to the state-of-the-art supervised method (6.4). SPICE
also generates temporally coherent videos given an input image and a sequence
of poses, despite being trained on static images only.
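The cyclic self-supervision idea named in the title can be illustrated with a toy sketch: repose an image to a target pose, repose the result back to the source pose, and penalize the difference from the original. This is not the authors' implementation. The names `repose`, `dim_repose`, and `cycle_loss` are hypothetical, an "image" is assumed to be a flat list of floats, and "reposing" is modelled as a cyclic shift. The 3D shape, pose, and appearance consistency terms described in the abstract are not modelled here.

```python
from typing import Callable, List

Image = List[float]  # toy stand-in for an image: a flat list of pixel values
Generator = Callable[[Image, int, int], Image]


def repose(image: Image, src_pose: int, tgt_pose: int) -> Image:
    """Hypothetical ideal generator: reposing is modelled as a cyclic shift."""
    if not image:
        return []
    k = (tgt_pose - src_pose) % len(image)
    if k == 0:
        return list(image)
    return image[-k:] + image[:-k]


def dim_repose(image: Image, src_pose: int, tgt_pose: int) -> Image:
    """Hypothetical lossy generator: shifts correctly but dims every pixel."""
    return [0.9 * p for p in repose(image, src_pose, tgt_pose)]


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(generator: Generator, image: Image,
               src_pose: int, tgt_pose: int) -> float:
    """Repose source -> target -> source and compare with the original."""
    reposed = generator(image, src_pose, tgt_pose)
    restored = generator(reposed, tgt_pose, src_pose)
    return l1(image, restored)


if __name__ == "__main__":
    img = [1.0, 2.0, 3.0, 4.0]
    print(cycle_loss(repose, img, 0, 1))      # ideal generator: no cycle error
    print(cycle_loss(dim_repose, img, 0, 1))  # lossy generator: positive error
```

In this toy setting the loss needs no paired image of the person in the target pose; only the source image and two poses are used, which is the property that makes a cycle usable without paired supervision.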
Related papers
- PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation [38.958695275774616]
We introduce a new transformer-based model, trained in a retrieval fashion, which can take as input any combination of the aforementioned modalities.
We showcase the potential of such an embroidered pose representation for (1) SMPL regression from an image with an optional text cue and (2) fine-grained instruction generation.
arXiv Detail & Related papers (2024-09-10T14:09:39Z)
- Synthesizing Moving People with 3D Control [88.68284137105654]
We present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence.
For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image.
Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses.
arXiv Detail & Related papers (2024-01-19T18:59:11Z)
- Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation [72.50214227616728]
Several methods have been proposed to learn image representations in a self-supervised fashion so as to disentangle appearance information from pose information.
We study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments.
We design an adversarial strategy focusing on generating natural appearance changes of the subject, and against which we could expect a disentangled network to be robust.
arXiv Detail & Related papers (2023-09-20T22:22:21Z)
- Self-supervised 3D Human Pose Estimation from a Single Image [1.0878040851638]
We propose a new self-supervised method for predicting 3D human body pose from a single image.
The prediction network is trained from a dataset of unlabelled images depicting people in typical poses and a set of unpaired 2D poses.
arXiv Detail & Related papers (2023-04-05T10:26:21Z)
- Single-view 3D Body and Cloth Reconstruction under Complex Poses [37.86174829271747]
We extend existing implicit function-based models to deal with images of humans with arbitrary poses and self-occluded limbs.
We learn an implicit function that maps the input image to a 3D body shape with a low level of detail.
We then learn a displacement map, conditioned on the smoothed surface, which encodes the high-frequency details of the clothes and body.
arXiv Detail & Related papers (2022-05-09T07:34:06Z)
- Neural 3D Clothes Retargeting from a Single Image [91.5030622330039]
We present a method of clothes retargeting: generating the potential poses and deformations of a given 3D clothing template model to fit onto a person in a single RGB image.
The problem is fundamentally ill-posed as attaining the ground truth data is impossible, i.e., images of people wearing the different 3D clothing template model in the exact same pose.
We propose a semi-supervised learning framework that validates the physical plausibility of 3D deformation by matching with the prescribed body-to-cloth contact points and clothing to fit onto the unlabeled silhouette.
arXiv Detail & Related papers (2021-01-29T20:50:34Z)
- Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement [63.853412753242615]
Learning a good 3D human pose representation is important for human pose related tasks.
We propose a novel Siamese denoising autoencoder to learn a 3D pose representation.
Our approach achieves state-of-the-art performance on two inherently different tasks.
arXiv Detail & Related papers (2020-07-14T14:25:22Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.