Fine-grained Appearance Transfer with Diffusion Models
- URL: http://arxiv.org/abs/2311.16513v1
- Date: Mon, 27 Nov 2023 04:00:04 GMT
- Title: Fine-grained Appearance Transfer with Diffusion Models
- Authors: Yuteng Ye, Guanwen Li, Hang Zhou, Cai Jiale, Junqing Yu, Yawei Luo,
Zikai Song, Qilong Xing, Youjia Zhang, Wei Yang
- Abstract summary: Image-to-image translation (I2I), and in particular appearance transfer, seeks to alter the visual appearance between images while maintaining structural coherence.
This paper proposes a framework designed to surmount the challenges of fine-grained transfer by integrating semantic matching, appearance transfer, and latent deviation.
- Score: 23.29713777525402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-image translation (I2I), and particularly its subfield of appearance
transfer, which seeks to alter the visual appearance between images while
maintaining structural coherence, presents formidable challenges. Despite
significant advancements brought by diffusion models, achieving fine-grained
transfer remains complex, particularly in terms of retaining detailed
structural elements and ensuring information fidelity. This paper proposes an
innovative framework designed to surmount these challenges by integrating
various aspects of semantic matching, appearance transfer, and latent
deviation. A pivotal aspect of our approach is the strategic use of the
predicted $x_0$ space by diffusion models within the latent space of diffusion
processes. This is identified as a crucial element for the precise and natural
transfer of fine-grained details. Our framework exploits this space to
accomplish semantic alignment between source and target images, facilitating
mask-wise appearance transfer for improved feature acquisition. A significant
advancement of our method is the seamless integration of these features into
the latent space, enabling more nuanced latent deviations without necessitating
extensive model retraining or fine-tuning. The effectiveness of our approach is
demonstrated through extensive experiments, which showcase its ability to
adeptly handle fine-grained appearance transfers across a wide range of
categories and domains. We provide our code at
https://github.com/babahui/Fine-grained-Appearance-Transfer
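The "predicted $x_0$ space" highlighted in the abstract is the denoised estimate that a diffusion model can form at every timestep from the current noisy latent and its predicted noise. The sketch below is a minimal, generic illustration of that computation and of a simple mask-wise blend in this space; the names `unet`, `alphas_cumprod`, `cond`, and `mask` are placeholders, and the linked repository remains the authoritative reference for the authors' actual method.

```python
import torch

@torch.no_grad()
def predicted_x0(unet, x_t, t, alphas_cumprod, cond):
    # Standard DDPM/DDIM identity:
    #   x0_hat = (x_t - sqrt(1 - a_bar_t) * eps_theta) / sqrt(a_bar_t),
    # where eps_theta is the noise predicted by the model at timestep t.
    a_bar = alphas_cumprod[t]
    eps = unet(x_t, t, cond)
    return (x_t - (1.0 - a_bar).sqrt() * eps) / a_bar.sqrt()

def mask_wise_blend(x0_target, x0_source, mask):
    # Illustrative mask-wise appearance transfer in the predicted-x0 space:
    # copy appearance from a semantically aligned source estimate into the
    # masked region of the target estimate (mask is a soft tensor in [0, 1]).
    return mask * x0_source + (1.0 - mask) * x0_target
```

In the general training-free pattern this sketch follows, the blended estimate is mapped back to a noisy latent at the current timestep and sampling continues, so no retraining or fine-tuning of the diffusion model is needed.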
Related papers
- Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas [33.334956022229846]
We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting.
Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space.
Our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence.
arXiv Detail & Related papers (2024-08-28T09:22:32Z)
- Training-free Composite Scene Generation for Layout-to-Image Synthesis [29.186425845897947]
This paper introduces a novel training-free approach designed to overcome adversarial semantic intersections during the diffusion conditioning phase.
We propose two innovative constraints: 1) an inter-token constraint that resolves token conflicts to ensure accurate concept synthesis; and 2) a self-attention constraint that improves pixel-to-pixel relationships.
Our evaluations confirm the effectiveness of leveraging layout information for guiding the diffusion process, generating content-rich images with enhanced fidelity and complexity.
arXiv Detail & Related papers (2024-07-18T15:48:07Z)
- InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture [0.0]
InsertDiffusion is a training-free diffusion architecture that efficiently embeds objects into images.
Our approach utilizes off-the-shelf generative models and eliminates the need for fine-tuning.
By decomposing the generation task into independent steps, InsertDiffusion offers a scalable solution.
arXiv Detail & Related papers (2024-07-15T10:15:58Z)
- DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation [84.0586749616249]
This paper presents DiffFAE, a one-stage and highly-efficient diffusion-based framework tailored for high-fidelity Facial Appearance Editing.
For high-fidelity transfer of query attributes, we adopt Space-sensitive Physical Customization (SPC), which ensures fidelity and generalization ability.
To preserve source attributes, we introduce Region-responsive Semantic Composition (RSC).
This module is guided to learn decoupled source-related features, thereby better preserving identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background.
arXiv Detail & Related papers (2024-03-26T12:53:10Z)
- Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement [58.9768112704998]
Disentangled representation learning strives to extract the intrinsic factors within observed data.
We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias.
This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
arXiv Detail & Related papers (2024-02-15T05:07:54Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL).
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain (see the DDIM-inversion sketch after this list).
arXiv Detail & Related papers (2023-05-30T04:09:47Z)
- Diamond in the rough: Improving image realism by traversing the GAN latent space [0.0]
We present an unsupervised method to find a direction in the latent space that aligns with improved photo-realism.
Our approach leaves the network unchanged while enhancing the fidelity of the generated image.
We use a simple generator inversion to find the direction in the latent space that results in the smallest change in the image space.
arXiv Detail & Related papers (2021-04-12T14:45:29Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
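The RIVAL entry above refers to the source image's inversion chain, i.e., the sequence of progressively noised latents obtained by running the sampler backwards from a real image. The snippet below is a generic DDIM-inversion sketch of how such a chain is typically produced; the names `unet`, `alphas_cumprod`, `timesteps`, and `cond` are placeholders, and this is not RIVAL's actual pipeline.

```python
import torch

@torch.no_grad()
def ddim_inversion_chain(unet, latent, alphas_cumprod, timesteps, cond):
    # Deterministic DDIM inversion: walk a clean latent toward higher noise
    # levels, recording each intermediate latent. `timesteps` is an increasing
    # sequence of integer timesteps and `unet` a pretrained noise predictor.
    chain = [latent]
    x = latent
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        eps = unet(x, t_cur, cond)                            # predicted noise at the current level
        x0 = (x - (1.0 - a_cur).sqrt() * eps) / a_cur.sqrt()  # predicted clean latent
        x = a_next.sqrt() * x0 + (1.0 - a_next).sqrt() * eps  # re-noise to the next level
        chain.append(x)
    return chain  # the "inversion chain" that variation methods can align against
```

Aligning a generation trajectory with such a chain, for example by reusing its latent statistics or attention features, is the general idea behind inversion-based variation and editing methods.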