T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency
and Manifold Mix-Up
- URL: http://arxiv.org/abs/2208.12752v3
- Date: Sun, 2 Jul 2023 10:23:51 GMT
- Title: T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency
and Manifold Mix-Up
- Authors: Deyin Liu, Lin Yuanbo Wu, Bo Li, Zongyuan Ge
- Abstract summary: We present an end-to-end approach to generate high-resolution person images conditioned on texts only.
We develop an effective generative model to produce person images with two novel mechanisms.
- Score: 16.165889084870116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present an end-to-end approach to generate high-resolution
person images conditioned on texts only. State-of-the-art text-to-image
generation models are mainly designed for center-object generation, e.g.,
flowers and birds. Unlike center-placed objects with similar shapes and
orientation, person image generation is a more challenging task, for which we
observe the following: 1) the generated images for the same person should
exhibit identity-consistent visual details, e.g., identity-related
textures/clothes/shoes across the images, and 2) those images should be
discriminative so as to be robust against inter-person variations caused by
visual ambiguities. To address these challenges, we develop an effective
generative model to produce person images with two novel mechanisms. In
particular, our first mechanism (called T-Person-GAN-ID) is to integrate the
one-stream generator with an identity-preserving network such that the
representations of generated data are regularized in their feature space to
ensure the identity-consistency. The second mechanism (called
T-Person-GAN-ID-MM) is based on the manifold mix-up to produce mixed images via
the linear interpolation across generated images from different manifold
identities, and we further enforce such interpolated images to be linearly
classified in the feature space. This amounts to learning a linear
classification boundary that can perfectly separate images from two identities.
Our proposed method is empirically validated to achieve a remarkable
improvement in text-to-person image generation. Our architecture is
orthogonal to StackGAN++ and focuses on person image generation; together they
enrich the spectrum of GANs for the image generation task. Code is available
at https://github.com/linwu-github/Person-Image-Generation.git.
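To make the two mechanisms more concrete, below is a minimal PyTorch-style sketch of how the identity-consistency and manifold mix-up regularizers could look. All names (feat_net, id_classifier) and the exact loss formulation are illustrative assumptions rather than the authors' released implementation; see the repository above for the official code.

```python
# Minimal sketch of the two regularizers described in the abstract.
# feat_net and id_classifier are hypothetical placeholder modules; the paper's
# actual losses and weighting may differ.
import torch
import torch.nn.functional as F


def identity_consistency_loss(feat_net, id_classifier, fake_images, id_labels):
    """Mechanism 1 (T-Person-GAN-ID, sketched): regularize features of
    generated images toward their identity class so that images of the same
    person share identity-consistent details."""
    feats = feat_net(fake_images)          # (B, D) feature embeddings
    logits = id_classifier(feats)          # (B, num_identities)
    return F.cross_entropy(logits, id_labels)


def manifold_mixup_loss(feat_net, id_classifier,
                        fake_a, fake_b, label_a, label_b, alpha=0.2):
    """Mechanism 2 (T-Person-GAN-ID-MM, sketched): linearly interpolate
    generated images of two different identities and require the mixed sample
    to be classified with the same mixing coefficient, i.e. to stay linearly
    separable in the feature space."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * fake_a + (1.0 - lam) * fake_b   # image-level interpolation
    logits = id_classifier(feat_net(mixed))
    # soft target: lam of identity A, (1 - lam) of identity B
    return (lam * F.cross_entropy(logits, label_a)
            + (1.0 - lam) * F.cross_entropy(logits, label_b))
```

In this reading, the mix-up term acts as a regularizer on the identity classifier: if interpolated samples can be separated with a linear decision boundary, features of different identities are kept well apart, which is what the abstract refers to as perfect separation between two identities.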
Related papers
- Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - Generative Unlearning for Any Identity [6.872154067622779]
In certain domains related to privacy issues, advanced generative models along with strong inversion methods can lead to potential misuses.
We propose an essential yet under-explored task called generative identity unlearning, which steers the model not to generate an image of a specific identity.
We propose a novel framework, Generative Unlearning for Any Identity (GUIDE), which prevents the reconstruction of a specific identity by unlearning the generator with only a single image.
arXiv Detail & Related papers (2024-05-16T08:00:55Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for
Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - Improving Generation and Evaluation of Visual Stories via Semantic
Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z) - DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition [85.94331736287765]
We formulate HFR as a dual generation problem, and tackle it via a novel Dual Variational Generation (DVG-Face) framework.
We integrate abundant identity information of large-scale visible data into the joint distribution.
Massive new diverse paired heterogeneous images with the same identity can be generated from noises.
arXiv Detail & Related papers (2020-09-20T09:48:24Z) - XingGAN for Person Image Generation [149.54517767056382]
We propose a novel Generative Adversarial Network (XingGAN) for person image generation tasks.
XingGAN consists of two generation branches that model the person's appearance and shape information.
We show that the proposed XingGAN advances the state-of-the-art performance in terms of objective quantitative scores and subjective visual realness.
arXiv Detail & Related papers (2020-07-17T23:40:22Z) - Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z) - Fine-grained Image-to-Image Transformation towards Visual Recognition [102.51124181873101]
We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image.
We adopt a model based on generative adversarial networks to disentangle the identity related and unrelated factors of an image.
Experiments on the CompCars and Multi-PIE datasets demonstrate that our model preserves the identity of the generated images much better than the state-of-the-art image-to-image transformation models.
arXiv Detail & Related papers (2020-01-12T05:26:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.