Painterly Image Harmonization using Diffusion Model
- URL: http://arxiv.org/abs/2308.02228v1
- Date: Fri, 4 Aug 2023 09:51:57 GMT
- Title: Painterly Image Harmonization using Diffusion Model
- Authors: Lingxiao Lu, Jiangtong Li, Junyan Cao, Li Niu, and Liqing Zhang
- Abstract summary: We propose a novel Painterly Harmonization stable Diffusion model (PHDiffusion)
It includes a lightweight adaptive encoder and a Dual Encoder Fusion (DEF) module.
Specifically, the adaptive encoder and the DEF module first stylize foreground features within each encoder.
Then, the stylized foreground features from both encoders are combined to guide the harmonization process.
- Score: 17.732783922599857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Painterly image harmonization aims to insert photographic objects into
paintings and obtain artistically coherent composite images. Previous methods
for this task mainly rely on inference optimization or generative adversarial
networks, but they are either very time-consuming or struggle with fine control
of the foreground objects (e.g., texture and content details). To address these
issues, we propose a novel Painterly Harmonization stable Diffusion model
(PHDiffusion), which includes a lightweight adaptive encoder and a Dual Encoder
Fusion (DEF) module. Specifically, the adaptive encoder and the DEF module
first stylize foreground features within each encoder. Then, the stylized
foreground features from both encoders are combined to guide the harmonization
process. During training, besides the noise loss in the diffusion model, we
additionally employ content loss and two style losses, i.e., AdaIN style loss
and contrastive style loss, aiming to balance the trade-off between style
migration and content preservation. Compared with the state-of-the-art models
from related fields, our PHDiffusion can stylize the foreground more
sufficiently and simultaneously retain finer content. Our code and model are
available at https://github.com/bcmi/PHDiffusion-Painterly-Image-Harmonization.
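The four training terms in the abstract map naturally onto code. Below is a minimal PyTorch sketch of how they might be combined; the loss weights, feature shapes, and the exact contrastive formulation are illustrative assumptions, not the authors' released implementation (see their repository above for the real one).

```python
# Minimal sketch of a PHDiffusion-style training objective. Weight values,
# feature shapes, and the contrastive formulation are assumptions.
import torch
import torch.nn.functional as F

def mean_std(feat: torch.Tensor, eps: float = 1e-5):
    """Channel-wise mean/std over the spatial dims of an (N, C, H, W) map."""
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = (feat.var(dim=(2, 3), keepdim=True) + eps).sqrt()
    return mean, std

def adain_style_loss(out_feat, style_feat):
    """Match first/second-order feature statistics to the painting's."""
    mo, so = mean_std(out_feat)
    ms, ss = mean_std(style_feat)
    return F.mse_loss(mo, ms) + F.mse_loss(so, ss)

def contrastive_style_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style term: pull the harmonized foreground's style embedding
    (N, D) toward its painting (N, D), away from other paintings (N, K, D)."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    pos = (a * p).sum(-1, keepdim=True) / tau            # (N, 1)
    neg = torch.einsum("nd,nkd->nk", a, n) / tau         # (N, K)
    logits = torch.cat([pos, neg], dim=1)
    target = torch.zeros(a.size(0), dtype=torch.long, device=a.device)
    return F.cross_entropy(logits, target)               # positive is class 0

def total_loss(noise_pred, noise, out_feat, content_feat, style_feat,
               z_anchor, z_pos, z_negs, w=(1.0, 1.0, 1.0, 0.1)):
    l_noise = F.mse_loss(noise_pred, noise)              # standard diffusion loss
    l_content = F.mse_loss(out_feat, content_feat)       # content preservation
    l_style = adain_style_loss(out_feat, style_feat)     # style migration
    l_contrast = contrastive_style_loss(z_anchor, z_pos, z_negs)
    return w[0]*l_noise + w[1]*l_content + w[2]*l_style + w[3]*l_contrast
```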
Related papers
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
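The summary does not spell out the merging rule. Below is a hedged sketch of one plausible reading, in the spirit of similarity-based token merging: redundant tokens are folded into their nearest kept neighbors so that subsequent attention runs over fewer tokens. The keep ratio and the averaging rule are assumptions, not ZePo's exact strategy.

```python
# Hedged sketch of merging redundant feature tokens before attention.
import torch
import torch.nn.functional as F

def merge_redundant_tokens(x: torch.Tensor, keep_ratio: float = 0.5):
    """x: (T, D) feature tokens. Keeps the least redundant tokens and
    averages every dropped token into its most similar kept token."""
    t, _ = x.shape
    k = max(1, int(t * keep_ratio))
    xn = F.normalize(x, dim=-1)
    sim = xn @ xn.t()
    sim.fill_diagonal_(-1.0)                      # ignore self-similarity
    order = sim.max(dim=-1).values.argsort()      # most redundant last
    keep, drop = order[:k], order[k:]
    merged, counts = x[keep].clone(), torch.ones(k, device=x.device)
    for i in drop.tolist():                       # fold each dropped token in
        j = (xn[i] @ xn[keep].t()).argmax().item()
        merged[j] += x[i]
        counts[j] += 1.0
    return merged / counts.unsqueeze(-1)
```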
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- DiffHarmony: Latent Diffusion Model Meets Image Harmonization [11.500358677234939]
Diffusion models have driven rapid progress in image-to-image translation tasks.
Training latent diffusion models from scratch, however, is computationally intensive.
In this paper, we adapt a pre-trained latent diffusion model to the image harmonization task to generate harmonious but potentially blurry initial images.
arXiv Detail & Related papers (2024-04-09T09:05:23Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
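The "learnable queries refined into a coarse-grained prompt" pattern can be illustrated with a short PyTorch sketch; the layer sizes, depth, and residual structure below are assumptions for illustration, not the paper's exact decoder.

```python
# Hedged sketch: learnable queries progressively refined by cross-attending
# to person-image features, yielding a coarse-grained prompt.
import torch
import torch.nn as nn

class QueryRefiner(nn.Module):
    def __init__(self, num_queries=16, dim=256, heads=8, depth=2):
        super().__init__()
        # A fixed set of learnable queries acts as the coarse-grained prompt.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(depth)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))

    def forward(self, img_feats: torch.Tensor) -> torch.Tensor:
        """img_feats: (N, T, dim) features from a frozen image encoder.
        Returns (N, num_queries, dim) refined queries usable as a prompt."""
        q = self.queries.unsqueeze(0).expand(img_feats.size(0), -1, -1)
        for attn, norm in zip(self.blocks, self.norms):
            out, _ = attn(q, img_feats, img_feats)  # queries attend to image
            q = norm(q + out)                       # progressive refinement
        return q
```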
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing [28.593023489682654]
We present DiffMorpher, the first approach enabling smooth and natural image morphing using diffusion models.
Our key idea is to capture the semantics of the two images by fitting two LoRAs to them respectively, and interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition.
In addition, we propose an attention interpolation and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images.
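A minimal sketch of the interpolation step the summary describes: blend the two LoRA weight sets linearly and the two latent noises spherically, then denoise with the blended weights. The slerp choice and the state-dict blending are common practice and assumptions here, not necessarily the paper's exact recipe.

```python
# Hedged sketch of DiffMorpher-style interpolation for one morph frame.
import torch

def lerp_state_dicts(lora_a: dict, lora_b: dict, alpha: float) -> dict:
    """Linear interpolation of two LoRA state dicts with matching keys."""
    return {k: (1 - alpha) * lora_a[k] + alpha * lora_b[k] for k in lora_a}

def slerp(z0: torch.Tensor, z1: torch.Tensor, alpha: float, eps=1e-7):
    """Spherical interpolation between two latent noise tensors."""
    a, b = z0.flatten(), z1.flatten()
    cos = (a @ b / (a.norm() * b.norm() + eps)).clamp(-1 + eps, 1 - eps)
    omega = torch.acos(cos)
    so = torch.sin(omega)
    out = (torch.sin((1 - alpha) * omega) / so) * a \
        + (torch.sin(alpha * omega) / so) * b
    return out.view_as(z0)

def morph_frame(unet, lora_a, lora_b, z_a, z_b, alpha, denoise):
    """One intermediate frame: load blended LoRA, denoise blended latent.
    Loading via the UNet state dict is an assumption; adapt to your LoRA
    library of choice."""
    unet.load_state_dict(lerp_state_dicts(lora_a, lora_b, alpha), strict=False)
    return denoise(unet, slerp(z_a, z_b, alpha))
```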
arXiv Detail & Related papers (2023-12-12T16:28:08Z)
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z)
- Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis [59.10787643285506]
Diffusion-based models have achieved state-of-the-art performance on text-to-image synthesis tasks.
One critical limitation of these models is the low fidelity of generated images with respect to the text description.
We propose a new text-to-image algorithm that adds explicit control over spatial-temporal cross-attention in diffusion models.
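The summary leaves the control mechanism abstract. One hedged reading, sketched below, is prompt-to-prompt-style reweighting: scale the cross-attention paid to chosen text tokens (the spatial part) during a window of denoising steps (the temporal part). The reweighting rule and the window are assumptions, not the paper's exact mechanism.

```python
# Hedged sketch of explicit spatial-temporal cross-attention control.
import torch

def controlled_cross_attention(q, k, v, boost_tokens, t,
                               t_window=(0, 25), gain=2.0):
    """q: (N, Tq, D) image queries; k, v: (N, Tk, D) text keys/values.
    Upweights attention to `boost_tokens` (list of token indices) while the
    current denoising step t lies inside t_window."""
    scale = q.size(-1) ** -0.5
    logits = torch.einsum("nqd,nkd->nqk", q, k) * scale
    attn = logits.softmax(dim=-1)            # spatial attention over tokens
    if t_window[0] <= t < t_window[1]:       # temporal gate on the control
        attn[..., boost_tokens] = attn[..., boost_tokens] * gain
        attn = attn / attn.sum(dim=-1, keepdim=True)   # renormalize
    return torch.einsum("nqk,nkd->nqd", attn, v)
```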
arXiv Detail & Related papers (2023-04-07T23:49:34Z)
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, lightweight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, outperforming diffusion-model-based image-editing algorithms.
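The mixing-weight optimization is concrete enough to sketch. Below, per-token weights between the two text embeddings are learned against two pluggable objectives; the per-token parameterization and the callable losses (e.g., CLIP-based scores) are assumptions for illustration.

```python
# Hedged sketch: optimize mixing weights between two text embeddings.
import torch

def optimize_mixing_weights(emb_src, emb_tgt, style_loss, content_loss,
                            steps=100, lr=0.05, beta=1.0):
    """emb_src/emb_tgt: (T, D) embeddings of the original and edited text.
    Learns per-token weights lam in [0, 1] so that
    mixed = (1 - lam) * emb_src + lam * emb_tgt balances both objectives."""
    logits = torch.zeros(emb_src.size(0), 1, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        lam = logits.sigmoid()                       # keep weights in [0, 1]
        mixed = (1 - lam) * emb_src + lam * emb_tgt
        loss = style_loss(mixed) + beta * content_loss(mixed)
        opt.zero_grad()
        loss.backward()
        opt.step()
    lam = logits.sigmoid().detach()
    return (1 - lam) * emb_src + lam * emb_tgt
```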
arXiv Detail & Related papers (2022-12-16T19:58:52Z)
- MagicMix: Semantic Mixing with Diffusion Models [85.43291162563652]
We explore a new task called semantic mixing, which aims to blend two different semantics to create a new concept.
We present MagicMix, a solution based on pre-trained text-conditioned diffusion models.
Our method does not require any spatial mask or re-training, yet is able to synthesize novel objects with high fidelity.
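A hedged sketch of the layout-then-content idea: condition early denoising steps on one concept to fix coarse layout, blend the two conditionings in a middle window, and finish with the new concept alone. The step boundaries and the embedding-blend rule are assumptions; the paper's actual schedule mixes semantics somewhat differently.

```python
# Hedged sketch of semantic mixing across denoising steps.
import torch

def semantic_mix(denoise_step, z_T, emb_layout, emb_content,
                 num_steps=50, k_min=20, k_max=40, nu=0.5):
    """denoise_step(z, t, emb) -> z_{t-1} is one conditioned denoising step.
    Early steps use the layout concept; steps in [k_min, k_max) blend the
    two conditionings with weight nu; late steps use the content concept."""
    z = z_T
    for t in reversed(range(num_steps)):
        step_idx = num_steps - 1 - t
        if step_idx < k_min:
            emb = emb_layout                               # fix coarse layout
        elif step_idx < k_max:
            emb = (1 - nu) * emb_layout + nu * emb_content  # mix semantics
        else:
            emb = emb_content                              # refine new concept
        z = denoise_step(z, t, emb)
    return z
```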
arXiv Detail & Related papers (2022-10-28T11:07:48Z)
- High-Fidelity Image Inpainting with GAN Inversion [23.49170140410603]
In this paper, we propose a novel GAN inversion model for image inpainting, dubbed InvertFill.
Within the encoder, the pre-modulation network leverages multi-scale structures to encode more discriminative semantics into style vectors.
To reconstruct faithful and photorealistic images, a simple yet effective Soft-update Mean Latent module is designed to capture more diverse in-domain patterns that synthesize high-fidelity textures for large corruptions.
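One plausible reading of the Soft-update Mean Latent module, sketched below, is an exponential moving average over latent codes seen during training, so inversion starts from a code reflecting diverse in-domain patterns rather than a fixed average. The EMA formulation and the noise-perturbed initialization are assumptions, not the paper's exact module.

```python
# Hedged sketch of a soft-updated mean latent for GAN inversion.
import torch

class SoftMeanLatent:
    def __init__(self, dim: int, momentum: float = 0.999):
        self.mean = torch.zeros(dim)   # running mean latent code
        self.momentum = momentum

    @torch.no_grad()
    def update(self, w_batch: torch.Tensor):
        """Softly pull the mean latent toward the current batch of codes."""
        m = self.momentum
        self.mean = m * self.mean + (1 - m) * w_batch.mean(dim=0)

    def init_code(self, batch_size: int, noise_scale: float = 0.1):
        """Initialize inversion near (not exactly at) the mean latent, so
        different corruptions can explore different in-domain patterns."""
        return self.mean + noise_scale * torch.randn(batch_size,
                                                     self.mean.numel())
```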
arXiv Detail & Related papers (2022-08-25T03:39:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.