StyleAutoEncoder for manipulating image attributes using pre-trained StyleGAN
- URL: http://arxiv.org/abs/2412.20164v1
- Date: Sat, 28 Dec 2024 14:30:48 GMT
- Title: StyleAutoEncoder for manipulating image attributes using pre-trained StyleGAN
- Authors: Andrzej Bedychaj, Jacek Tabor, Marek Śmieja
- Abstract summary: StyleAutoEncoder is a plugin for pre-trained generative models.
It allows for manipulating the requested attributes of images.
It is at least as effective in manipulating image attributes as the state-of-the-art algorithms.
- Score: 8.71029643563855
- License:
- Abstract: Deep conditional generative models are excellent tools for creating high-quality images and editing their attributes. However, training modern generative models from scratch is very expensive and requires large computational resources. In this paper, we introduce StyleAutoEncoder (StyleAE), a lightweight AutoEncoder module, which works as a plugin for pre-trained generative models and allows for manipulating the requested attributes of images. The proposed method offers a cost-effective solution for training deep generative models with limited computational resources, making it a promising technique for a wide range of applications. We evaluate StyleAutoEncoder by combining it with StyleGAN, which is currently one of the top generative models. Our experiments demonstrate that StyleAutoEncoder is at least as effective in manipulating image attributes as the state-of-the-art algorithms based on invertible normalizing flows. However, it is simpler, faster, and gives more freedom in designing neural architectures.
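The abstract describes StyleAE only at a high level, so the following is a minimal, hypothetical sketch of the general idea: a small autoencoder trained on the W-space latents of a frozen, pre-trained StyleGAN, with a few bottleneck coordinates tied to attribute labels so that an attribute can be edited by overwriting those coordinates and decoding back to W. All module names, layer sizes, and loss weights below are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a StyleAE-like plugin (not the authors' code): a small
# autoencoder over the W latent space of a frozen, pre-trained StyleGAN. The
# first few coordinates of the bottleneck are tied to attribute labels, so an
# attribute is edited by overwriting its coordinate and decoding back to W.
import torch
import torch.nn as nn


class LatentAutoEncoder(nn.Module):
    def __init__(self, w_dim=512, code_dim=512, n_attrs=2):
        super().__init__()
        self.n_attrs = n_attrs
        self.encoder = nn.Sequential(
            nn.Linear(w_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, w_dim),
        )

    def forward(self, w):
        z = self.encoder(w)
        return self.decoder(z), z


def training_step(model, opt, w, attrs, attr_weight=1.0):
    """One step on a batch of StyleGAN w latents and their attribute labels."""
    w_rec, z = model(w)
    rec_loss = torch.mean((w_rec - w) ** 2)                      # reconstruct the latent
    attr_loss = torch.mean((z[:, :model.n_attrs] - attrs) ** 2)  # tie leading coords to attributes
    loss = rec_loss + attr_weight * attr_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()


@torch.no_grad()
def edit_attribute(model, synthesis, w, attr_idx, value):
    """Overwrite one attribute coordinate, decode back to W, and render with
    the frozen StyleGAN synthesis network (placeholder call)."""
    z = model.encoder(w)
    z[:, attr_idx] = value
    return synthesis(model.decoder(z))
```

Only the small autoencoder is trained here; the StyleGAN generator stays frozen, which is what would make such a plugin cheap compared with training or fine-tuning the generative model itself.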
Related papers
- JetFormer: An Autoregressive Generative Model of Raw Images and Text [62.2573739835562]
We propose an autoregressive decoder-only transformer - JetFormer - which is trained to directly maximize the likelihood of raw data.
We leverage a normalizing flow model to obtain a soft-token image representation that is jointly trained with an autoregressive multimodal transformer.
JetFormer achieves text-to-image generation quality competitive with recent VQ-VAE- and VAE-based baselines.
arXiv Detail & Related papers (2024-11-29T14:14:59Z)
- Ada-adapter:Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder [57.574544285878794]
Ada-Adapter is a novel framework for few-shot style personalization of diffusion models.
Our method enables efficient zero-shot style transfer utilizing a single reference image.
We demonstrate the effectiveness of our approach on various artistic styles, including flat art, 3D rendering, and logo design.
arXiv Detail & Related papers (2024-07-08T02:00:17Z)
- Class-Conditional self-reward mechanism for improved Text-to-Image models [1.8434042562191815]
We build upon the concept of self-rewarding models and introduce its vision equivalent for Text-to-Image generative AI models.
This approach works by fine-tuning a diffusion model on a self-generated, self-judged dataset.
Evaluations show it to be at least 60% better than existing commercial and research text-to-image models.
arXiv Detail & Related papers (2024-05-22T09:28:43Z)
- Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first introduce a wavelet-tree representation to compactly encode shapes, formulating a subband coefficient filtering scheme.
We then derive a subband adaptive training strategy so that the model effectively learns to generate coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z)
- Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack [75.00066365801993]
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.
These pre-trained models often face challenges when it comes to generating highly aesthetic images.
We propose quality-tuning to guide a pre-trained model to exclusively generate highly visually appealing images.
arXiv Detail & Related papers (2023-09-27T17:30:19Z)
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework, consisting of a Masked Quantization VAE (MQ-VAE) and a Stackformer, which masks redundant image regions to avoid modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
- Implementing and Experimenting with Diffusion Models for Text-to-Image Generation [0.0]
Two models, DALL-E 2 and Imagen, have demonstrated that highly photorealistic images could be generated from a simple textual description of an image.
Text-to-image models require exceptionally large amounts of computational resources to train, as well as huge datasets collected from the internet.
This thesis contributes by reviewing the different approaches and techniques used by these models, and then by proposing our own implementation of a text-to-image model.
arXiv Detail & Related papers (2022-09-22T12:03:33Z)
- Feature-Style Encoder for Style-Based GAN Inversion [1.9116784879310027]
We propose a novel architecture for GAN inversion, which we call Feature-Style encoder.
Our model achieves accurate inversion of real images from the latent space of a pre-trained style-based GAN model.
Thanks to its encoder structure, the model allows fast and accurate image editing.
arXiv Detail & Related papers (2022-02-04T15:19:34Z)
- AE-StyleGAN: Improved Training of Style-Based Auto-Encoders [21.51697087024866]
StyleGANs have shown impressive results on data generation and manipulation in recent years.
In this paper, we focus on style-based generators and ask a scientific question: does forcing such a generator to reconstruct real data lead to a more disentangled latent space and make the inversion process from image to latent space easier?
We describe a new methodology to train a style-based autoencoder where the encoder and generator are optimized end-to-end, as sketched after this entry.
arXiv Detail & Related papers (2021-10-17T04:25:51Z)
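Since the AE-StyleGAN entry above only states that the encoder and generator are optimized end-to-end, here is a rough, hypothetical sketch of what such a joint objective can look like: an adversarial term on generated samples plus an image-reconstruction term through the encoder. The networks E, G, D, the non-saturating loss, and the weighting are assumptions rather than the paper's actual recipe.

```python
# Rough sketch only (not the AE-StyleGAN reference code): encoder E and
# style-based generator G updated together, combining a standard adversarial
# loss with an image-reconstruction loss; D is the discriminator.
import torch
import torch.nn.functional as F


def encoder_generator_step(E, G, D, opt_eg, real, z, rec_weight=1.0):
    """Update E and G jointly: fool D on random samples and reconstruct real images."""
    fake = G(z)                                # sample from a random latent
    rec = G(E(real))                           # encode a real image, decode it back
    adv_loss = F.softplus(-D(fake)).mean()     # non-saturating generator loss
    rec_loss = F.l1_loss(rec, real)            # pixel-space reconstruction
    loss = adv_loss + rec_weight * rec_loss
    opt_eg.zero_grad(); loss.backward(); opt_eg.step()
    return loss.item()


def discriminator_step(D, G, opt_d, real, z):
    """Standard discriminator update on real vs. generated images."""
    with torch.no_grad():
        fake = G(z)
    loss = F.softplus(D(fake)).mean() + F.softplus(-D(real)).mean()
    opt_d.zero_grad(); loss.backward(); opt_d.step()
    return loss.item()
```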
- Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image (see the sketch after this entry).
Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
arXiv Detail & Related papers (2020-07-01T17:59:57Z)
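As a companion to the entry above, here is a minimal, hypothetical sketch of the swapping step: an encoder splits each image into two codes, the decoder reconstructs an image from its own codes, and a hybrid decoded from a swapped pair is pushed toward realism by a discriminator. The code split, the loss form, and the omission of the paper's patch co-occurrence discriminator are all simplifications.

```python
# Minimal sketch of the swapping idea (not the paper's implementation): E splits
# an image into two codes, G decodes a pair of codes, and D judges realism of a
# hybrid decoded from swapped codes. The original method additionally uses a
# patch co-occurrence discriminator, omitted here for brevity.
import torch
import torch.nn.functional as F


def swapping_step(E, G, D, opt, img_a, img_b):
    """Reconstruct image A from its own codes and make the decode of
    (first code of A, second code of B) look realistic."""
    code1_a, code2_a = E(img_a)
    _, code2_b = E(img_b)

    rec_a = G(code1_a, code2_a)                # self-reconstruction
    hybrid = G(code1_a, code2_b)               # swapped combination

    rec_loss = F.l1_loss(rec_a, img_a)
    adv_loss = F.softplus(-D(hybrid)).mean()   # realism of the swapped image
    loss = rec_loss + adv_loss

    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```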