Controlling generative models with continuous factors of variations
- URL: http://arxiv.org/abs/2001.10238v1
- Date: Tue, 28 Jan 2020 10:04:04 GMT
- Title: Controlling generative models with continuous factors of variations
- Authors: Antoine Plumerault, Hervé Le Borgne, Céline Hudelot
- Abstract summary: We introduce a new method to find meaningful directions in the latent space of any generative model.
Our method does not require human annotations and is well suited to finding directions that encode simple transformations of the generated image.
- Score: 1.7188280334580197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent deep generative models can produce photo-realistic images as
well as visual and textual content embeddings that are useful for a wide range
of computer vision and natural language processing tasks. Their usefulness is
nevertheless often limited by the lack of control over the generative process
and by a poor understanding of the learned representation. To overcome these
issues, recent work has shown the value of studying the semantics of the latent
space of generative models. In this paper, we advance the interpretability of
the latent space of generative models by introducing a new method to find
meaningful directions in the latent space of any generative model, along which
one can move to precisely control specific properties of the generated image,
such as the position or scale of an object in the image. Our method does not
require human annotations and is particularly well suited to finding directions
that encode simple transformations of the generated image, such as translation,
zoom or color variations. We demonstrate the effectiveness of our method
qualitatively and quantitatively, both for GANs and variational auto-encoders.
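To make the core idea concrete, the following sketch shows how a direction in latent space can be used to control a factor of variation. It is a minimal illustration under stated assumptions, not the paper's implementation: `toy_generator`, the latent dimension, and the direction `d` are hypothetical placeholders; in practice the direction would be estimated without human annotations and the generator would be a pre-trained GAN or VAE decoder.

```python
import numpy as np

# Minimal sketch: editing a generated image by moving the latent code z
# along a direction d. The generator is a hypothetical stand-in so the
# example stays self-contained and runnable.

def toy_generator(z: np.ndarray) -> np.ndarray:
    """Hypothetical generator mapping a latent code to a 64x64 'image'."""
    rng = np.random.default_rng(0)  # fixed weights for reproducibility
    weights = rng.standard_normal((64 * 64, z.size))
    return (weights @ z).reshape(64, 64)

def move_along_direction(z: np.ndarray, d: np.ndarray, alpha: float) -> np.ndarray:
    """Shift the latent code by alpha along the normalized direction d."""
    d = d / np.linalg.norm(d)
    return z + alpha * d

latent_dim = 128
rng = np.random.default_rng(42)
z = rng.standard_normal(latent_dim)  # latent code of the image to edit
d = rng.standard_normal(latent_dim)  # placeholder for a learned direction
                                     # (e.g. one encoding horizontal translation)

# Sweeping alpha yields a sequence of images in which the controlled factor
# (position, scale, color, ...) varies while the rest of the content is kept.
edited = [toy_generator(move_along_direction(z, d, alpha))
          for alpha in np.linspace(-3.0, 3.0, 7)]
print(len(edited), edited[0].shape)
```

With a real pre-trained generator, the same sweep over alpha is how one would inspect qualitatively, and measure quantitatively, how a property such as object position responds to movement along the recovered direction.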
Related papers
- Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z)
- ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes [64.57705752579207]
We evaluate the resilience of vision-based models against diverse object-to-background context variations.
We harness the generative capabilities of text-to-image, image-to-text, and image-to-segment models to automatically generate object-to-background changes.
arXiv Detail & Related papers (2024-03-07T17:48:48Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
- StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators [63.85888518950824]
We present a text-driven method that allows shifting a generative model to new domains.
We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains.
arXiv Detail & Related papers (2021-08-02T14:46:46Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
- Learning a Deep Reinforcement Learning Policy Over the Latent Space of a Pre-trained GAN for Semantic Age Manipulation [4.306143768014157]
We learn a conditional policy for semantic manipulation along specific attributes under defined identity bounds.
Results show that our learned policy samples high-fidelity images with the required age alterations.
arXiv Detail & Related papers (2020-11-02T13:15:18Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.