Towards Purely Unsupervised Disentanglement of Appearance and Shape for
Person Images Generation
- URL: http://arxiv.org/abs/2007.13098v2
- Date: Thu, 30 Jul 2020 00:49:59 GMT
- Title: Towards Purely Unsupervised Disentanglement of Appearance and Shape for
Person Images Generation
- Authors: Hongtao Yang, Tong Zhang, Wenbing Huang, Xuming He, Fatih Porikli
- Abstract summary: We formulate an encoder-decoder-like network to extract shape and appearance features from input images at the same time.
We train the parameters by three losses: feature adversarial loss, color consistency loss and reconstruction loss.
Experimental results on DeepFashion and Market1501 demonstrate that the proposed method achieves clean disentanglement.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been considerable research interest in disentangling the
appearance and shape of human images. Most existing endeavours pursue this goal
either by using training images with annotations or by regulating the training
process with external cues such as human skeletons, body segmentation or cloth
patches. In this paper, we aim to address this challenge in a more unsupervised
manner---we require neither annotations nor any external task-specific cues. To
this end, we formulate an encoder-decoder-like network to extract both the
shape and appearance features from input images at the same time, and train the
parameters with three losses: a feature adversarial loss, a color consistency
loss and a reconstruction loss. The feature adversarial loss enforces little to
no mutual information between the extracted shape and appearance features,
while the color consistency loss encourages the invariance of a person's
appearance across different shapes. More importantly, our unsupervised
(unsupervised learning has many interpretations across tasks; to be clear, in
this paper we refer to unsupervised learning as learning without task-specific
human annotations, paired data or any form of weak supervision) framework uses
the learned shape features as masks applied to the input itself in order to
obtain clean appearance features. Without relying on a fixed input human
skeleton, our network better preserves the conditional human posture while
requiring less supervision. Experimental results on DeepFashion and Market1501
demonstrate that the proposed method achieves clean disentanglement and is able
to synthesize novel images of quality comparable to state-of-the-art
weakly-supervised or even supervised methods.
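The three losses described in the abstract can be illustrated with a minimal NumPy sketch. This is only a toy illustration of the training objective's structure, assuming an L1 reconstruction term, an MSE color-consistency term, and a chance-level (0.5) target for the adversarial discriminator; the paper's exact formulations may differ, and the soft-mask step is likewise only hinted at here.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_loss(x, x_hat):
    # L1 pixel distance between the input image and the decoded image
    return float(np.mean(np.abs(x - x_hat)))

def color_consistency_loss(app_a, app_b):
    # appearance codes of the same person under two different shapes
    # should agree; penalize their squared difference
    return float(np.mean((app_a - app_b) ** 2))

def feature_adversarial_loss(disc_scores):
    # a discriminator tries to recover appearance from the shape code;
    # the encoder is trained so its predictions stay at chance (0.5),
    # i.e. the shape code carries no appearance information
    return float(np.mean((disc_scores - 0.5) ** 2))

# toy arrays standing in for network inputs and outputs
x = rng.random((3, 64, 64))                        # input image (C, H, W)
shape_mask = rng.random((1, 64, 64))               # learned shape features as a soft mask
masked_x = shape_mask * x                          # appearance is encoded from the masked input
x_hat = x + 0.01 * rng.standard_normal(x.shape)    # imperfect reconstruction
app_a = rng.random(128)                            # appearance code, shape A
app_b = app_a + 0.05 * rng.standard_normal(128)    # same person, shape B
scores = rng.random(16)                            # discriminator outputs in [0, 1]

total = (reconstruction_loss(x, x_hat)
         + color_consistency_loss(app_a, app_b)
         + feature_adversarial_loss(scores))
print(total)
```

Each term is zero exactly when its goal is met (perfect reconstruction, identical appearance codes, chance-level discriminator), so the combined objective decreases as disentanglement improves.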
Related papers
- Understanding Pose and Appearance Disentanglement in 3D Human Pose
Estimation [72.50214227616728]
Several methods have been proposed to learn image representations in a self-supervised fashion so as to disentangle appearance information from pose information.
We study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments.
We design an adversarial strategy that focuses on generating natural appearance changes of the subject, to which a disentangled network should be robust.
arXiv Detail & Related papers (2023-09-20T22:22:21Z) - Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images
with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images.
We introduce customized solutions by fully exploiting the aforementioned free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z) - Self-Supervised Learning for Place Representation Generalization across
Appearance Changes [11.030196234282675]
We investigate learning features that are robust to appearance modifications while sensitive to geometric transformations in a self-supervised manner.
Our results reveal that jointly learning appearance-robust and geometry-sensitive image descriptors leads to competitive visual place recognition results.
arXiv Detail & Related papers (2023-03-04T10:14:47Z) - Occluded Person Re-Identification via Relational Adaptive Feature
Correction Learning [8.015703163954639]
Occluded person re-identification (Re-ID) in images captured by multiple cameras is challenging because the target person is occluded by pedestrians or objects.
Most existing methods utilize the off-the-shelf pose or parsing networks as pseudo labels, which are prone to error.
We propose a novel Occlusion Correction Network (OCNet) that corrects features through relational-weight learning and obtains diverse and representative features without using external networks.
arXiv Detail & Related papers (2022-12-09T07:48:47Z) - Fully Unsupervised Person Re-identification via Selective Contrastive
Learning [58.5284246878277]
Person re-identification (ReID) aims at searching for the same person across images captured by various cameras.
We propose a novel selective contrastive learning framework for unsupervised feature learning.
Experimental results demonstrate the superiority of our method in unsupervised person ReID compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-10-15T09:09:23Z) - Unsupervised Deep Metric Learning with Transformed Attention Consistency
and Contrastive Clustering Loss [28.17607283348278]
Existing approaches for unsupervised metric learning focus on exploring self-supervision information within the input image itself.
We observe that, when analyzing images, human eyes often compare images against each other instead of examining images individually.
We develop a new approach to unsupervised deep metric learning where the network is learned based on self-supervision information across images.
arXiv Detail & Related papers (2020-08-10T19:33:47Z) - Unsupervised Landmark Learning from Unpaired Data [117.81440795184587]
Recent attempts for unsupervised landmark learning leverage synthesized image pairs that are similar in appearance but different in poses.
We propose a cross-image cycle consistency framework which applies the swapping-reconstruction strategy twice to obtain the final supervision.
Our proposed framework is shown to outperform strong baselines by a large margin.
arXiv Detail & Related papers (2020-06-29T13:57:20Z) - Face Identity Disentanglement via Latent Space Mapping [47.27253184341152]
We present a method that learns how to represent data in a disentangled way, with minimal supervision.
Our key insight is to decouple the processes of disentanglement and synthesis, by employing a leading pre-trained unconditional image generator, such as StyleGAN.
We show that our method successfully disentangles identity from other facial attributes, surpassing existing methods.
arXiv Detail & Related papers (2020-05-15T18:24:49Z)