Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation
- URL: http://arxiv.org/abs/2304.11829v2
- Date: Tue, 25 Apr 2023 17:11:34 GMT
- Title: Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation
- Authors: Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao,
Xihui Liu
- Abstract summary: We exploit the fine-grained-to-abstract and low-level-to-high-level feature hierarchy for the latent space of diffusion models.
The hierarchical latent space of HDAE inherently encodes different abstract levels of semantics and provides more comprehensive semantic representations.
We demonstrate the effectiveness of our proposed approach with extensive experiments and applications on image reconstruction, style mixing, controllable interpolation, and detail-preserving, disentangled image manipulation.
- Score: 36.20575570779196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have attained impressive visual quality for image synthesis.
However, how to interpret and manipulate the latent space of diffusion models
has not been extensively explored. Prior work on diffusion autoencoders
encodes semantic representations into a single semantic latent code, which
fails to reflect the rich detail information and the intrinsic feature
hierarchy. To mitigate these limitations, we propose Hierarchical Diffusion
Autoencoders (HDAE), which exploit the fine-grained-to-abstract and
low-level-to-high-level feature hierarchy in the latent space of diffusion
models. The hierarchical
latent space of HDAE inherently encodes different abstract levels of semantics
and provides more comprehensive semantic representations. In addition, we
propose a truncated-feature-based approach for disentangled image manipulation.
We demonstrate the effectiveness of our proposed approach with extensive
experiments and applications on image reconstruction, style mixing,
controllable interpolation, detail-preserving and disentangled image
manipulation, and multi-modal semantic image synthesis.
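To make the abstract's two ingredients concrete (a per-level latent hierarchy, and edits truncated to the more abstract levels), here is a minimal, hypothetical PyTorch sketch. It is not the authors' code; every module, function, and hyperparameter name below is our own assumption:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Pools a CNN feature pyramid into one latent per abstraction level."""
    def __init__(self, levels=4, width=64, latent_dim=128):
        super().__init__()
        chans = [3] + [width * 2 ** i for i in range(levels)]
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                nn.SiLU(),
            )
            for i in range(levels)
        )
        self.heads = nn.ModuleList(
            nn.Linear(chans[i + 1], latent_dim) for i in range(levels)
        )

    def forward(self, x):
        latents = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            latents.append(head(x.mean(dim=(2, 3))))  # global-average pool
        return latents  # index 0 = fine-grained, index -1 = most abstract

def truncated_edit(latents, direction, level, scale=1.0):
    """Shift only latents at or above `level`; finer levels stay intact,
    so low-level details are preserved and the edit stays disentangled."""
    return [z + scale * direction if i >= level else z
            for i, z in enumerate(latents)]

enc = HierarchicalEncoder()
zs = enc(torch.randn(1, 3, 256, 256))
zs_edited = truncated_edit(zs, direction=torch.randn(128), level=2)
```

The sketch only illustrates the structure of the latent hierarchy; in the paper these latents condition a diffusion decoder rather than standing alone.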
Related papers
- Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including adaptation from images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z)
- Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach [44.03561901593423]
This paper introduces a content-adaptive diffusion model for scalable image compression.
The proposed method encodes fine textures through a diffusion process, enhancing perceptual quality.
Experiments demonstrate the effectiveness of the proposed framework in both image reconstruction and downstream machine vision tasks.
arXiv Detail & Related papers (2024-10-08T15:48:34Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
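The "learnable queries progressively refined into a coarse-grained prompt" idea can be pictured with a small cross-attention module. This is a hedged sketch under our own naming (QueryRefiner and all hyperparameters are hypothetical), not CFLD's released code:

```python
import torch
import torch.nn as nn

class QueryRefiner(nn.Module):
    """Learnable queries attend to image tokens over several layers,
    progressively distilling a coarse-grained prompt."""
    def __init__(self, n_queries=16, dim=256, depth=3, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim) * 0.02)
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(depth)
        )
        self.norm = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))

    def forward(self, img_tokens):  # img_tokens: (B, N, dim)
        q = self.queries.unsqueeze(0).expand(img_tokens.size(0), -1, -1)
        for attn, norm in zip(self.attn, self.norm):
            out, _ = attn(q, img_tokens, img_tokens)  # cross-attention
            q = norm(q + out)                         # residual refinement
        return q  # (B, n_queries, dim): coarse-grained prompt tokens

prompt = QueryRefiner()(torch.randn(2, 196, 256))
```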
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
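A rough sketch of the conditioning idea, feeding a frozen SSL embedding into a denoiser alongside the timestep, might look as follows. The architecture and all names below are illustrative assumptions, not the paper's model:

```python
import torch
import torch.nn as nn

class SSLConditionedDenoiser(nn.Module):
    """Tiny denoiser that mixes a projected SSL embedding and the timestep
    into every pixel before predicting the noise."""
    def __init__(self, ssl_dim=768, cond_dim=64):
        super().__init__()
        self.cond_proj = nn.Sequential(nn.Linear(ssl_dim, cond_dim), nn.SiLU())
        self.time_proj = nn.Sequential(nn.Linear(1, cond_dim), nn.SiLU())
        self.net = nn.Sequential(
            nn.Conv2d(3 + cond_dim, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x_t, t, ssl_emb):
        # Combine conditioning signals, then broadcast over spatial dims.
        c = self.cond_proj(ssl_emb) + self.time_proj(t[:, None].float())
        c = c[:, :, None, None].expand(-1, -1, x_t.size(2), x_t.size(3))
        return self.net(torch.cat([x_t, c], dim=1))  # predicted noise

eps_hat = SSLConditionedDenoiser()(
    torch.randn(2, 3, 64, 64),        # noisy images x_t
    torch.randint(0, 1000, (2,)),     # timesteps
    torch.randn(2, 768))              # frozen SSL embeddings
```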
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Semantic-Conditional Diffusion Networks for Image Captioning [116.86677915812508]
We propose a new diffusion-model-based paradigm tailored for image captioning, namely Semantic-Conditional Diffusion Networks (SCD-Net).
In SCD-Net, multiple Diffusion Transformer structures are stacked to progressively strengthen the output sentence with better visual-language alignment and linguistic coherence.
Experiments on the COCO dataset demonstrate the promising potential of using diffusion models in the challenging image captioning task.
arXiv Detail & Related papers (2022-12-06T16:08:16Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs) paradigm.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- High-Resolution Image Synthesis with Latent Diffusion Models [14.786952412297808]
Training diffusion models in the latent space of pretrained autoencoders allows, for the first time, reaching a near-optimal point between complexity reduction and detail preservation.
Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks.
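The underlying recipe, running the standard DDPM forward process on a compact autoencoder latent instead of on pixels, can be sketched as a toy example; the pretrained autoencoder is replaced here by random latents, and shapes are assumptions:

```python
import torch

# Linear beta schedule from the original DDPM paper.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def forward_diffuse(z0, t):
    """q(z_t | z_0) applied to autoencoder latents instead of pixels."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(z0)
    return a.sqrt() * z0 + (1.0 - a).sqrt() * noise, noise

z0 = torch.randn(4, 4, 32, 32)   # stand-in for encoder(image), 8x downsampled
t = torch.randint(0, T, (4,))
zt, eps = forward_diffuse(z0, t)
# A denoiser is trained to predict `eps` from (zt, t); at sampling time the
# final clean latent is mapped back to pixels by the autoencoder's decoder.
```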
arXiv Detail & Related papers (2021-12-20T18:55:25Z)