Improving Diffusion-based Image Translation using Asymmetric Gradient
Guidance
- URL: http://arxiv.org/abs/2306.04396v1
- Date: Wed, 7 Jun 2023 12:56:56 GMT
- Title: Improving Diffusion-based Image Translation using Asymmetric Gradient
Guidance
- Authors: Gihyun Kwon, Jong Chul Ye
- Abstract summary: We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-dif models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
- Score: 51.188396199083336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have shown significant progress in image translation tasks
recently. However, due to their stochastic nature, there's often a trade-off
between style transformation and content preservation. Current strategies aim
to disentangle style and content, preserving the source image's structure while
successfully transitioning from a source to a target domain under text or
one-shot image conditions. Yet, these methods often require computationally
intense fine-tuning of diffusion models or additional neural networks. To
address these challenges, here we present an approach that guides the reverse
process of diffusion sampling by applying asymmetric gradient guidance. This
results in quicker and more stable image manipulation for both text-guided and
image-guided image translation. Our model's adaptability allows it to be
implemented with both image- and latent-diffusion models. Experiments show that
our method outperforms various state-of-the-art models in image translation
tasks.
Related papers
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z) - ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z) - Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL)
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z) - Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style
Transfer [38.957512116073616]
We propose a zero-shot contrastive loss for diffusion models that doesn't require additional fine-tuning or auxiliary networks.
Our method can generate images with the same semantic content as the source image in a zero-shot manner.
arXiv Detail & Related papers (2023-03-15T13:47:02Z) - Cap2Aug: Caption guided Image to Image data Augmentation [41.53127698828463]
Cap2Aug is an image-to-image diffusion model-based data augmentation strategy using image captions as text prompts.
We generate captions from the limited training images and using these captions edit the training images using an image-to-image stable diffusion model.
This strategy generates augmented versions of images similar to the training images yet provides semantic diversity across the samples.
arXiv Detail & Related papers (2022-12-11T04:37:43Z) - Semantic-Conditional Diffusion Networks for Image Captioning [116.86677915812508]
We propose a new diffusion model based paradigm tailored for image captioning, namely Semantic-Conditional Diffusion Networks (SCD-Net)
In SCD-Net, multiple Diffusion Transformer structures are stacked to progressively strengthen the output sentence with better visional-language alignment and linguistical coherence.
Experiments on COCO dataset demonstrate the promising potential of using diffusion models in the challenging image captioning task.
arXiv Detail & Related papers (2022-12-06T16:08:16Z) - Diffusion-based Image Translation using Disentangled Style and Content
Representation [51.188396199083336]
Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer.
It is often difficult to maintain the original content of the image during the reverse diffusion.
We present a novel diffusion-based unsupervised image translation method using disentangled style and content representation.
Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.
arXiv Detail & Related papers (2022-09-30T06:44:37Z) - MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image
Translation [29.03892463588357]
We present a novel method for exemplar-based image translation, called matching interleaved diffusion models (MIDMs)
We formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space.
To improve the reliability of the diffusion process, we design a confidence-aware process using cycle-consistency to consider only confident regions.
arXiv Detail & Related papers (2022-09-22T14:43:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.