Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer
- URL: http://arxiv.org/abs/2410.01366v1
- Date: Wed, 2 Oct 2024 09:28:21 GMT
- Title: Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer
- Authors: Kento Masui, Mayu Otani, Masahiro Nomura, Hideki Nakayama
- Abstract summary: Image style transfer is the task of transferring the visual attributes of a style image to another content image.
We propose a training-free style transfer algorithm, Style Tracking Reverse Diffusion Process (STRDP), for a pretrained Latent Diffusion Model (LDM).
Our algorithm applies the Adaptive Instance Normalization (AdaIN) function in a distinct manner during the reverse diffusion process of an LDM.
- Score: 24.46409405016844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently shown the ability to generate high-quality images. However, controlling their generation process still poses challenges. The image style transfer task, which transfers the visual attributes of a style image to another content image, is one of those challenges. A typical obstacle in this task is the requirement of additional training of a pre-trained model. We propose a training-free style transfer algorithm, Style Tracking Reverse Diffusion Process (STRDP), for a pretrained Latent Diffusion Model (LDM). Our algorithm applies the Adaptive Instance Normalization (AdaIN) function in a distinct manner during the reverse diffusion process of an LDM while tracking the encoding history of the style image. This algorithm enables style transfer in the latent space of an LDM for reduced computational cost, and provides compatibility with various LDM models. Through a series of experiments and a user study, we show that our method can quickly transfer the style of an image without additional training. The speed, compatibility, and training-free nature of our algorithm facilitate agile experimentation with combinations of styles and LDMs for extensive application.
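Read as pseudocode, the abstract suggests a reverse-diffusion loop in which AdaIN aligns the partially denoised content latent with the style image's latent at the matching noise level. The following is a minimal sketch under assumptions, not the authors' released code: `unet`, `scheduler`, and `style_history` are hypothetical stand-ins (a diffusers-style API is assumed), and `style_history[t]` stands for the style image's VAE latent re-noised to timestep t, our reading of the tracked "encoding history".
```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """AdaIN: shift and scale each channel of `content` to match the
    per-channel mean and std of `style`. Shapes: (B, C, H, W)."""
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_sd = content.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_sd = style.std(dim=(2, 3), keepdim=True) + eps
    return s_sd * (content - c_mu) / c_sd + s_mu

@torch.no_grad()
def strdp_like_sampling(z_t, unet, scheduler, style_history):
    """Hypothetical sketch of the loop described in the abstract:
    one reverse diffusion step, then AdaIN against the style latent
    noised to the same timestep (the tracked encoding history)."""
    for t in scheduler.timesteps:
        eps_pred = unet(z_t, t).sample                      # predict noise
        z_t = scheduler.step(eps_pred, t, z_t).prev_sample  # one reverse step
        z_t = adain(z_t, style_history[int(t)])             # track the style
    return z_t  # decode with the LDM's VAE to obtain the stylized image
```
Because everything happens in the LDM's latent space, the per-step AdaIN cost is small, which is consistent with the abstract's claim of reduced computational cost; the paper's exact placement of AdaIN may differ from this sketch.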
Related papers
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
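For context on that inversion step, the naive baseline is to integrate the learned flow ODE backward from the image toward noise; a minimal Euler sketch under assumed names and conventions follows (Stable Flow's improved inversion differs, per the summary above; `velocity(x, t)` is a hypothetical stand-in for the trained model, and t=1 is assumed to be the data end of the flow).
```python
import torch

@torch.no_grad()
def euler_flow_inversion(x: torch.Tensor, velocity, steps: int = 50) -> torch.Tensor:
    """Illustrative baseline only: invert a flow model by stepping the
    ODE dx/dt = velocity(x, t) backward from data (t=1) toward noise (t=0)."""
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), 1.0 - i * dt, device=x.device)
        x = x - velocity(x, t) * dt  # reverse Euler step
    return x  # approximate noise latent; regenerating from it should reproduce x
```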
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR achieves far superior performance compared with other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential to effectively accelerate advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- FISTNet: FusIon of STyle-path generative Networks for Facial Style Transfer [15.308837341075135]
StyleGAN methods tend to overfit, which introduces artifacts into the facial images.
We propose a FusIon of STyles (FIST) network for facial images that leverages pre-trained multipath style transfer networks.
arXiv Detail & Related papers (2023-07-18T07:20:31Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer [38.957512116073616]
We propose a zero-shot contrastive loss for diffusion models that doesn't require additional fine-tuning or auxiliary networks.
Our method can generate images with the same semantic content as the source image in a zero-shot manner.
arXiv Detail & Related papers (2023-03-15T13:47:02Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components: a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of the style distribution, and a generative network for style transfer.
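The input-dependent temperature idea can be illustrated with an InfoNCE-style sketch; this is our illustrative reading of the summary, not UCAST's actual formulation, and `temp_net` (a small network predicting a per-sample temperature) plus all shapes are assumptions.
```python
import torch
import torch.nn.functional as F

def adaptive_infonce(anchor, positive, negatives, temp_net, t_min: float = 1e-2):
    """Illustrative contrastive loss with an input-dependent temperature.
    anchor/positive: (B, D); negatives: (N, D); temp_net: (B, D) -> (B, 1)."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    n = F.normalize(negatives, dim=1)
    tau = F.softplus(temp_net(anchor)) + t_min     # per-sample temperature, (B, 1)
    pos = (a * p).sum(dim=1, keepdim=True)         # positive similarities, (B, 1)
    neg = a @ n.t()                                # negative similarities, (B, N)
    logits = torch.cat([pos, neg], dim=1) / tau    # sharper or softer per input
    labels = torch.zeros(a.size(0), dtype=torch.long, device=a.device)
    return F.cross_entropy(logits, labels)         # positive sits at index 0
```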
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- SinDDM: A Single Image Denoising Diffusion Model [28.51951207066209]
We introduce a framework for training a denoising diffusion model on a single image.
Our method, which we coin SinDDM, learns the internal statistics of the training image by using a multi-scale diffusion process.
It is applicable in a wide array of tasks, including style transfer and harmonization.
arXiv Detail & Related papers (2022-11-29T20:44:25Z)
- Controllable Style Transfer via Test-time Training of Implicit Neural Representation [34.880651923701066]
We propose a controllable style transfer framework based on Implicit Neural Representation (INR) that controls the stylized output pixel-wise via test-time training.
Once test-time trained, thanks to the flexibility of the INR-based model, our framework can precisely control the stylized images in a pixel-wise manner and freely adjust the image resolution without further optimization or training.
arXiv Detail & Related papers (2022-10-14T12:53:39Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.