Spatial Latent Representations in Generative Adversarial Networks for
Image Generation
- URL: http://arxiv.org/abs/2303.14552v1
- Date: Sat, 25 Mar 2023 20:01:11 GMT
- Title: Spatial Latent Representations in Generative Adversarial Networks for
Image Generation
- Authors: Maciej Sypetkowski
- Abstract summary: We define a family of spatial latent spaces for StyleGAN2.
We show that our spaces are effective for image manipulation and encode semantic information well.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the majority of GAN architectures, the latent space is defined as a set of
vectors of given dimensionality. Such representations are not easily
interpretable and do not capture spatial information of image content directly.
In this work, we define a family of spatial latent spaces for StyleGAN2,
capable of capturing more details and representing images that are
out-of-sample in terms of the number and arrangement of object parts, such as
an image of multiple faces or a face with more than two eyes. We propose a
method for encoding images into our spaces, together with an attribute model
capable of performing attribute editing in these spaces. We show that our
spaces are effective for image manipulation and encode semantic information
well. Our approach can be applied to pre-trained generator models, and attribute
editing can be performed using pre-generated direction vectors, making the barrier to
entry for experimentation and use extremely low. We propose a regularization
method for optimizing latent representations that equalizes the distributions of
parts of the latent space, bringing encoded representations much closer to generated ones.
We use it when encoding images into our spatial spaces and obtain a significant
improvement in quality while preserving semantics and the ability to use our attribute
model for editing. In total, our methods improve encoding quality by as much as 30%
in terms of LPIPS score compared to standard methods, while preserving semantics.
Additionally, we propose a StyleGAN2 training
procedure on our spatial latent spaces, together with a custom spatial latent
representation distribution to make spatially closer elements in the
representation more dependent on each other than on farther elements. This
approach improves the FID score by 29% on SpaceNet and can generate
consistent images of arbitrary size on spatially homogeneous datasets, such as
satellite imagery.
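As a concrete illustration of the direction-vector editing described above, the sketch below applies a pre-generated attribute direction to a spatial grid of style codes. This is a minimal sketch, not the paper's implementation; the grid shape, the region mask, and the `synthesis` call are illustrative assumptions.

```python
# Minimal sketch (assumed shapes and names): attribute editing in a spatial
# latent space using a pre-generated direction vector.
import torch

H, W, C = 16, 16, 512                        # assumed spatial grid of style codes
spatial_w = torch.randn(1, C, H, W)          # encoded spatial latent for one image

# A pre-generated attribute direction (e.g. "smile") in the 512-dim style space.
direction = torch.randn(C)
direction = direction / direction.norm()

alpha = 3.0                                  # edit strength
mask = torch.zeros(1, 1, H, W)
mask[..., 4:12, 4:12] = 1.0                  # restrict the edit to a region

# Broadcasting the direction over the grid edits every (masked) location.
edited_w = spatial_w + alpha * mask * direction.view(1, C, 1, 1)

# A pre-trained StyleGAN2 synthesis network (not included here) would then
# render the result, e.g. `img = synthesis(edited_w)`.
print(edited_w.shape)                        # torch.Size([1, 512, 16, 16])
```

The custom latent distribution used for training, in which spatially close entries are more dependent than distant ones, can be approximated in a similarly simple way; the sketch below is an assumed stand-in (smoothing i.i.d. Gaussian noise with a small depthwise blur), not the paper's exact distribution.

```python
# Minimal sketch: spatially correlated latent sampling via depthwise blur.
import torch
import torch.nn.functional as F

H, W, C = 16, 16, 512
white = torch.randn(1, C, H, W)              # independent Gaussian entries

# 3x3 Gaussian-like kernel applied depthwise: neighbouring positions share noise.
k = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
k = (k / k.sum()).reshape(1, 1, 3, 3).repeat(C, 1, 1, 1)
correlated = F.conv2d(white, k, padding=1, groups=C)

# Rescale so entries keep roughly unit variance.
correlated = correlated / correlated.std()
```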
Related papers
- Getting it Right: Improving Spatial Consistency in Text-to-Image Models [103.52640413616436]
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.
We create SPRIGHT, the first spatially focused, large-scale dataset, by re-captioning 6 million images from 4 widely used vision datasets.
We find that training on images containing a larger number of objects leads to substantial improvements in spatial consistency; fine-tuning on 500 such images yields state-of-the-art results on T2I-CompBench with a spatial score of 0.2133.
arXiv Detail & Related papers (2024-04-01T15:55:25Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z)
- Zero-shot spatial layout conditioning for text-to-image diffusion models [52.24744018240424]
Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling.
We consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content.
We propose ZestGuide, a zero-shot segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models.
arXiv Detail & Related papers (2023-06-23T19:24:48Z)
- Binary Latent Diffusion [36.70550531181131]
We show that a binary latent space can be explored for compact yet expressive image representations.
We present both conditional and unconditional image generation experiments with multiple datasets.
The proposed framework can be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
arXiv Detail & Related papers (2023-04-10T19:03:28Z)
- LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation [10.623430999818925]
We present a technique for segmenting real and AI-generated images using latent diffusion models (LDMs) trained on internet-scale datasets.
We show up to 6% improvement over standard baselines for text-to-image segmentation on natural images.
For AI-generated imagery, we show close to 20% improvement compared to state-of-the-art techniques.
arXiv Detail & Related papers (2023-03-22T06:55:01Z)
- High-fidelity GAN Inversion with Padding Space [38.9258619444968]
Inverting a Generative Adversarial Network (GAN) facilitates a wide range of image editing tasks using pre-trained generators.
Existing methods typically employ the latent space of GANs as the inversion space yet observe the insufficient recovery of spatial details.
We propose to involve the padding space of the generator to complement the latent space with spatial information.
arXiv Detail & Related papers (2022-03-21T16:32:12Z)
- Low-Rank Subspaces in GANs [101.48350547067628]
This work introduces low-rank subspaces that enable more precise control of GAN generation.
LowRankGAN is able to find a low-dimensional representation of the attribute manifold.
Experiments on state-of-the-art GAN models (including StyleGAN2 and BigGAN) trained on various datasets demonstrate the effectiveness of our LowRankGAN.
arXiv Detail & Related papers (2021-06-08T16:16:32Z)
- Subspace Representation Learning for Few-shot Image Classification [105.7788602565317]
We propose a subspace representation learning framework to tackle few-shot image classification tasks.
It exploits a subspace in the local CNN feature space to represent an image and measures the similarity between two images according to a weighted subspace distance (WSD).
arXiv Detail & Related papers (2021-05-02T02:29:32Z)
- IntroVAC: Introspective Variational Classifiers for Learning Interpretable Latent Subspaces [6.574517227976925]
IntroVAC learns interpretable latent subspaces by exploiting information from an additional label.
We show that IntroVAC is able to learn meaningful directions in the latent space enabling fine manipulation of image attributes.
arXiv Detail & Related papers (2020-08-03T10:21:41Z)