DiffuseGAE: Controllable and High-fidelity Image Manipulation from
Disentangled Representation
- URL: http://arxiv.org/abs/2307.05899v1
- Date: Wed, 12 Jul 2023 04:11:08 GMT
- Title: DiffuseGAE: Controllable and High-fidelity Image Manipulation from
Disentangled Representation
- Authors: Yipeng Leng, Qiangjuan Huang, Zhiyuan Wang, Yangyang Liu, Haoyu Zhang
- Abstract summary: Diffusion probabilistic models (DPMs) have shown remarkable results on various image synthesis tasks.
DPMs lack a low-dimensional, interpretable, and well-decoupled latent code.
Diffusion autoencoders (Diff-AE) were proposed to explore the potential of DPMs for representation learning via autoencoding.
- Score: 14.725538019917625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion probabilistic models (DPMs) have shown remarkable results on
various image synthesis tasks such as text-to-image generation and image
inpainting. However, compared to other generative methods like VAEs and GANs,
DPMs lack a low-dimensional, interpretable, and well-decoupled latent code.
Recently, diffusion autoencoders (Diff-AE) were proposed to explore the
potential of DPMs for representation learning via autoencoding. Diff-AE
provides an accessible latent space that exhibits remarkable interpretability,
allowing us to manipulate image attributes based on latent codes from the
space. However, previous works are not generic, as they operate on only a few
attributes. To further explore the latent space of Diff-AE and achieve
a generic editing pipeline, we propose a module called Group-supervised
AutoEncoder (dubbed GAE) for Diff-AE to achieve better disentanglement of the
latent code. The proposed GAE is trained via an attribute-swap strategy to
acquire the latent codes for example-based multi-attribute image manipulation.
We empirically demonstrate that our method enables
multi-attribute manipulation and achieves convincing sample quality and
attribute alignment, while significantly reducing computational requirements
compared to pixel-based approaches for representational decoupling. Code will
be released soon.
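The attribute-swap idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `swap_attribute`, the 16-dimensional latent code, and the mapping of attributes to contiguous slices are all hypothetical assumptions made only for illustration. The sketch assumes a latent code partitioned into per-attribute groups and exchanges one group between two codes.

```python
import numpy as np


def swap_attribute(z_a, z_b, groups, name):
    """Return copies of z_a and z_b with the slice for attribute `name` exchanged.

    `groups` maps an attribute name to a (start, stop) slice of the latent
    code. This layout is an illustrative assumption, not the paper's design.
    """
    start, stop = groups[name]
    z_a_new, z_b_new = z_a.copy(), z_b.copy()
    z_a_new[start:stop] = z_b[start:stop]
    z_b_new[start:stop] = z_a[start:stop]
    return z_a_new, z_b_new


# Hypothetical partition of a 16-dimensional latent code into attribute groups.
groups = {"hair": (0, 4), "smile": (4, 8), "rest": (8, 16)}

rng = np.random.default_rng(0)
z_a = rng.normal(size=16)  # stand-in for the latent code of image A
z_b = rng.normal(size=16)  # stand-in for the latent code of image B

z_a_swapped, z_b_swapped = swap_attribute(z_a, z_b, groups, "smile")
```

In a training setup of this kind, the swapped codes would be decoded and compared against images that actually carry the swapped attribute; that supervision step is omitted here because the abstract does not specify it.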
Related papers
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation [36.20575570779196]
We exploit the fine-grained-to-abstract and low-level-to-high-level feature hierarchy for the latent space of diffusion models.
The hierarchical latent space of HDAE inherently encodes different abstract levels of semantics and provides more comprehensive semantic representations.
We demonstrate the effectiveness of our proposed approach with extensive experiments and applications on image reconstruction, style mixing, controllable, detail-preserving and disentangled image manipulation.
arXiv Detail & Related papers (2023-04-24T05:35:59Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models [83.75414370493289]
Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of generating high-quality image samples.
Diffusion autoencoders (Diff-AE) have been proposed to explore DPMs for representation learning via autoencoding.
We propose Pre-trained DPM AutoEncoding (PDAE) to adapt existing pre-trained DPMs to the decoders for image reconstruction.
arXiv Detail & Related papers (2022-12-26T02:37:38Z)
- Everything is There in Latent Space: Attribute Editing and Attribute Style Manipulation by StyleGAN Latent Space Exploration [39.18239951479647]
We present Few-shot Latent-based Attribute Manipulation and Editing (FLAME).
FLAME is a framework to perform highly controlled image editing by latent space manipulation.
We generate diverse attribute styles in a disentangled manner.
arXiv Detail & Related papers (2022-07-20T12:40:32Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident pieces of prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs) approach.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Diffusion Autoencoders: Toward a Meaningful and Decodable Representation [1.471992435706872]
Diffusion probabilistic models (DPMs) have achieved remarkable quality in image generation that rivals GANs'.
Unlike GANs, DPMs use a set of latent variables that lack semantic meaning and cannot serve as a useful representation for other tasks.
This paper explores the possibility of using DPMs for representation learning and seeks to extract a meaningful and decodable representation of an input image via autoencoding.
arXiv Detail & Related papers (2021-11-30T18:24:04Z)
- Semi-Supervised Domain Adaptation with Prototypical Alignment and Consistency Learning [86.6929930921905]
This paper studies how much it can help address domain shifts if we further have a few target samples labeled.
To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks.
Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
arXiv Detail & Related papers (2021-04-19T08:46:08Z)