Conditional Score Guidance for Text-Driven Image-to-Image Translation
- URL: http://arxiv.org/abs/2305.18007v3
- Date: Sat, 18 Nov 2023 07:29:49 GMT
- Title: Conditional Score Guidance for Text-Driven Image-to-Image Translation
- Authors: Hyunsoo Lee, Minsoo Kang, Bohyung Han
- Abstract summary: We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model.
Our method aims to generate a target image by selectively editing the regions of interest in a source image.
- Score: 52.73564644268749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel algorithm for text-driven image-to-image translation based
on a pretrained text-to-image diffusion model. Our method aims to generate a
target image by selectively editing the regions of interest in a source image,
defined by a modifying text, while preserving the remaining parts. In contrast
to existing techniques that solely rely on a target prompt, we introduce a new
score function that additionally considers both the source image and the source
text prompt, tailored to address specific translation tasks. To this end, we
derive the conditional score function in a principled manner, decomposing it
into the standard score and a guiding term for target image generation. For the
gradient computation of the guiding term, we assume a Gaussian distribution of
the posterior distribution and estimate its mean and variance to adjust the
gradient without additional training. In addition, to improve the quality of
the conditional score guidance, we incorporate a simple yet effective mixup
technique, which combines two cross-attention maps derived from the source and
target latents. This strategy is effective for promoting a desirable fusion of
the invariant parts in the source image and the edited regions aligned with the
target prompt, leading to high-fidelity target image generation. Through
comprehensive experiments, we demonstrate that our approach achieves
outstanding image-to-image translation performance on various tasks.
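The abstract's two key steps, the closed-form guiding term under a Gaussian posterior assumption and the cross-attention mixup, can be pictured with a short sketch. The code below is illustrative only: the names `eps_theta` (the pretrained text-conditional noise predictor), `alpha_bar_t` (cumulative noise-schedule coefficient), `edit_mask`, `lam`, and `guidance_weight` are hypothetical stand-ins and do not come from the paper's released implementation.

```python
# A minimal sketch under explicit assumptions; shapes and scales are illustrative.
import torch


def guided_score(eps_theta, x_t, x_src_t, t, tgt_emb, alpha_bar_t,
                 edit_mask, guidance_weight=1.0):
    """Conditional score = standard score + closed-form guiding term."""
    # Standard score from the pretrained model under the target prompt:
    #   score(x_t) = -eps_theta(x_t, t, y_tgt) / sqrt(1 - alpha_bar_t)
    eps_tgt = eps_theta(x_t, t, tgt_emb)
    standard_score = -eps_tgt / torch.sqrt(1.0 - alpha_bar_t)

    # Guiding term: treat the posterior over the source-consistent part of
    # x_t as a Gaussian centred at the noised source latent x_src_t with
    # variance (1 - alpha_bar_t), so its log-density gradient is available
    # in closed form and requires no additional training.
    posterior_var = 1.0 - alpha_bar_t
    guiding_term = -(x_t - x_src_t) / posterior_var

    # Guide only where the source content should be preserved, so the edit
    # follows the target prompt while the rest stays close to the source.
    preserve_mask = 1.0 - edit_mask
    return standard_score + guidance_weight * preserve_mask * guiding_term


def mixup_cross_attention(attn_src, attn_tgt, edit_mask, lam=0.5):
    """Blend source and target cross-attention maps (the mixup step)."""
    # Keep source attention over invariant regions; inside the edited
    # region, interpolate between source and target attention maps.
    edited = lam * attn_src + (1.0 - lam) * attn_tgt
    return (1.0 - edit_mask) * attn_src + edit_mask * edited


if __name__ == "__main__":
    # Toy usage with random tensors and a dummy noise predictor.
    x_t = torch.randn(1, 4, 64, 64)
    x_src_t = torch.randn(1, 4, 64, 64)
    edit_mask = torch.zeros(1, 1, 64, 64)
    edit_mask[..., 16:48, 16:48] = 1.0
    dummy_eps = lambda x, t, emb: torch.randn_like(x)
    score = guided_score(dummy_eps, x_t, x_src_t, t=500, tgt_emb=None,
                         alpha_bar_t=torch.tensor(0.3), edit_mask=edit_mask)
    print(score.shape)  # torch.Size([1, 4, 64, 64])
```

In this reading, the guiding term pulls the preserved regions of the noisy latent toward the correspondingly noised source latent while the standard score under the target prompt drives the edit, and the attention mixup keeps the source's attention structure where the image should not change.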
Related papers
- ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models [55.43801602995778]
We present ImPoster, a novel algorithm for generating a target image of a 'source' subject performing a 'driving' action.
Our approach is completely unsupervised and does not require any access to additional annotations like keypoints or pose.
arXiv Detail & Related papers (2024-09-24T01:25:19Z)
- Diffusion based Zero-shot Medical Image-to-Image Translation for Cross Modality Segmentation [18.895926089773177]
Cross-modality image segmentation aims to segment the target modalities using a method designed in the source modality.
Deep generative models can translate the target modality images into the source modality, thus enabling cross-modality segmentation.
arXiv Detail & Related papers (2024-04-01T13:23:04Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- Masked and Adaptive Transformer for Exemplar Based Image Translation [16.93344592811513]
Cross-domain semantic matching is challenging.
We propose a masked and adaptive transformer (MAT) for learning accurate cross-domain correspondence.
We devise a novel contrastive style learning method to acquire quality-discriminative style representations.
arXiv Detail & Related papers (2023-03-30T03:21:14Z)
- Diffusion-based Image Translation using Disentangled Style and Content Representation [51.188396199083336]
Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer.
It is often difficult to maintain the original content of the image during the reverse diffusion.
We present a novel diffusion-based unsupervised image translation method using disentangled style and content representation.
Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.
arXiv Detail & Related papers (2022-09-30T06:44:37Z)
- Blended Diffusion for Text-driven Editing of Natural Images [18.664733153082146]
We introduce the first solution for performing local (region-based) edits in generic natural images.
We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a DDPM to generate natural-looking results.
To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent.
arXiv Detail & Related papers (2021-11-29T18:58:49Z)
- Global and Local Alignment Networks for Unpaired Image-to-Image Translation [170.08142745705575]
The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style.
Because existing methods pay little attention to content changes, semantic information from source images degrades during translation.
We introduce a novel approach, Global and Local Alignment Networks (GLA-Net)
Our method effectively generates sharper and more realistic images than existing approaches.
arXiv Detail & Related papers (2021-11-19T18:01:54Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
- GAIT: Gradient Adjusted Unsupervised Image-to-Image Translation [5.076419064097734]
An adversarial loss is utilized to match the distributions of the translated and target image sets.
This may create artifacts if two domains have different marginal distributions, for example, in uniform areas.
We propose an unsupervised image-to-image translation (IIT) method that preserves uniform regions after translation.
arXiv Detail & Related papers (2020-09-02T08:04:00Z)
- Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation [43.09068177612067]
Unsupervised domain adaptation alleviates the need for pixel-wise annotation in semantic segmentation.
One of the most common strategies is to translate images from the source domain to the target domain and then align their marginal distributions in the feature space using adversarial learning.
Here, we present an innovative framework, designed to mitigate the image translation bias and align cross-domain features with the same category.
arXiv Detail & Related papers (2020-03-10T10:06:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.