Conditional Image Generation and Manipulation for User-Specified Content
- URL: http://arxiv.org/abs/2005.04909v1
- Date: Mon, 11 May 2020 08:05:00 GMT
- Title: Conditional Image Generation and Manipulation for User-Specified Content
- Authors: David Stap, Maurits Bleeker, Sarah Ibrahimi, Maartje ter Hoeve
- Abstract summary: We propose a single pipeline for text-to-image generation and manipulation.
In the first part of our pipeline we introduce textStyleGAN, a model that is conditioned on text.
In the second part of our pipeline we make use of the pre-trained weights of textStyleGAN to perform semantic facial image manipulation.
- Score: 6.6081578501076494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Generative Adversarial Networks (GANs) have improved
steadily towards generating increasingly impressive real-world images. It is
useful to steer the image generation process for purposes such as content
creation. This can be done by conditioning the model on additional information.
However, when conditioning on additional information, there still exists a
large set of images that agree with a particular conditioning. This makes it
unlikely that the generated image is exactly as envisioned by a user, which is
problematic for practical content creation scenarios such as generating facial
composites or stock photos. To solve this problem, we propose a single pipeline
for text-to-image generation and manipulation. In the first part of our
pipeline we introduce textStyleGAN, a model that is conditioned on text. In the
second part of our pipeline we make use of the pre-trained weights of
textStyleGAN to perform semantic facial image manipulation. The approach works
by finding semantic directions in latent space. We show that this method can be
used to manipulate facial images for a wide range of attributes. Finally, we
introduce the CelebTD-HQ dataset, an extension to CelebA-HQ, consisting of
faces and corresponding textual descriptions.
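The abstract describes manipulation by finding semantic directions in latent space. Below is a minimal sketch of one common way such a direction can be estimated and applied, using the difference of mean latent codes for attribute-positive and attribute-negative samples; the paper may use a different estimation procedure, and the labelled latent arrays and the `generator.synthesize` call are hypothetical placeholders, not the authors' API.

```python
import numpy as np

def find_semantic_direction(latents, labels):
    """Estimate a semantic direction as the normalized difference between the
    mean latent codes of attribute-positive and attribute-negative samples.

    latents: array of shape (n_samples, latent_dim)
    labels:  boolean array of shape (n_samples,), True if the attribute is present
    """
    latents = np.asarray(latents)
    labels = np.asarray(labels, dtype=bool)
    direction = latents[labels].mean(axis=0) - latents[~labels].mean(axis=0)
    return direction / np.linalg.norm(direction)

def manipulate(latent, direction, strength=1.0):
    """Move a single latent code along the semantic direction; larger |strength|
    gives a stronger edit, a negative strength reverses the attribute."""
    return latent + strength * direction

# Hypothetical usage with a pre-trained generator and labelled latent codes:
# smile_dir = find_semantic_direction(W, smiling_labels)
# image = generator.synthesize(manipulate(w, smile_dir, strength=2.0))
```

In practice the direction can also be fit with a linear classifier on labelled latent codes; the mean-difference estimate above is simply the cheapest variant of the same idea.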
Related papers
- Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance [17.251982243534144]
LAR-Gen is a novel approach that enables seamless, text- and subject-guided inpainting of masked scene images.
Our approach adopts a coarse-to-fine manner to ensure subject identity preservation and local semantic coherence.
Experiments and varied application scenarios demonstrate the superiority of LAR-Gen in terms of both identity preservation and text semantic consistency.
arXiv Detail & Related papers (2024-03-28T16:07:55Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for
Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image
Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z) - Zero-shot spatial layout conditioning for text-to-image diffusion models [52.24744018240424]
Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling.
We consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content.
We propose ZestGuide, a zero-shot segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models.
arXiv Detail & Related papers (2023-06-23T19:24:48Z) - eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert
Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models, each specialized for a different stage of the synthesis process.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z) - CIGLI: Conditional Image Generation from Language & Image [5.159265382427163]
We propose a new task called CIGLI: Conditional Image Generation from Language and Image.
Instead of generating an image based on text as in text-image generation, this task requires the generation of an image from a textual description and an image prompt.
arXiv Detail & Related papers (2021-08-20T00:58:42Z) - Semantic Text-to-Face GAN -ST^2FG [0.7919810878571298]
We present a novel approach to generate facial images from semantic text descriptions.
For security and criminal identification, the ability to provide a GAN-based system that works like a sketch artist would be incredibly useful.
arXiv Detail & Related papers (2021-07-22T15:42:25Z) - Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects.
The inputs of the task are multimodal including (1) a reference image and (2) an instruction in natural language that describes desired modifications to the image.
We show that the proposed model performs favorably against recent strong baselines on three public datasets.
arXiv Detail & Related papers (2020-08-11T07:07:10Z) - Semantic Image Manipulation Using Scene Graphs [105.03614132953285]
We introduce a semantic scene graph network that does not require direct supervision for constellation changes or image edits.
This makes it possible to train the system from existing real-world datasets with no additional annotation effort.
arXiv Detail & Related papers (2020-04-07T20:02:49Z) - StyleGAN2 Distillation for Feed-forward Image Manipulation [5.5080625617632]
StyleGAN2 is a state-of-the-art network in generating realistic images.
We propose a way to distill a particular image manipulation of StyleGAN2 into an image-to-image network trained in a paired fashion.
arXiv Detail & Related papers (2020-03-07T14:02:06Z)