Conditional Score Guidance for Text-Driven Image-to-Image Translation
- URL: http://arxiv.org/abs/2305.18007v3
- Date: Sat, 18 Nov 2023 07:29:49 GMT
- Title: Conditional Score Guidance for Text-Driven Image-to-Image Translation
- Authors: Hyunsoo Lee, Minsoo Kang, Bohyung Han
- Abstract summary: We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model.
Our method aims to generate a target image by selectively editing the regions of interest in a source image.
- Score: 52.73564644268749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel algorithm for text-driven image-to-image translation based
on a pretrained text-to-image diffusion model. Our method aims to generate a
target image by selectively editing the regions of interest in a source image,
defined by a modifying text, while preserving the remaining parts. In contrast
to existing techniques that solely rely on a target prompt, we introduce a new
score function that additionally considers both the source image and the source
text prompt, tailored to address specific translation tasks. To this end, we
derive the conditional score function in a principled manner, decomposing it
into the standard score and a guiding term for target image generation. For the
gradient computation of the guiding term, we assume a Gaussian distribution of
the posterior distribution and estimate its mean and variance to adjust the
gradient without additional training. In addition, to improve the quality of
the conditional score guidance, we incorporate a simple yet effective mixup
technique, which combines two cross-attention maps derived from the source and
target latents. This strategy is effective for promoting a desirable fusion of
the invariant parts in the source image and the edited regions aligned with the
target prompt, leading to high-fidelity target image generation. Through
comprehensive experiments, we demonstrate that our approach achieves
outstanding image-to-image translation performance on various tasks.
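The abstract's two key steps, the closed-form guiding term under a Gaussian posterior assumption and the cross-attention mixup, can be pictured with a short sketch. The code below is illustrative only: the names `eps_theta` (the pretrained text-conditional noise predictor), `alpha_bar_t` (cumulative noise-schedule coefficient), `edit_mask`, `lam`, and `guidance_weight` are hypothetical stand-ins and do not come from the paper's released implementation.

```python
# A minimal sketch under explicit assumptions; shapes and scales are illustrative.
import torch


def guided_score(eps_theta, x_t, x_src_t, t, tgt_emb, alpha_bar_t,
                 edit_mask, guidance_weight=1.0):
    """Conditional score = standard score + closed-form guiding term."""
    # Standard score from the pretrained model under the target prompt:
    #   score(x_t) = -eps_theta(x_t, t, y_tgt) / sqrt(1 - alpha_bar_t)
    eps_tgt = eps_theta(x_t, t, tgt_emb)
    standard_score = -eps_tgt / torch.sqrt(1.0 - alpha_bar_t)

    # Guiding term: treat the posterior over the source-consistent part of
    # x_t as a Gaussian centred at the noised source latent x_src_t with
    # variance (1 - alpha_bar_t), so its log-density gradient is available
    # in closed form and requires no additional training.
    posterior_var = 1.0 - alpha_bar_t
    guiding_term = -(x_t - x_src_t) / posterior_var

    # Guide only where the source content should be preserved, so the edit
    # follows the target prompt while the rest stays close to the source.
    preserve_mask = 1.0 - edit_mask
    return standard_score + guidance_weight * preserve_mask * guiding_term


def mixup_cross_attention(attn_src, attn_tgt, edit_mask, lam=0.5):
    """Blend source and target cross-attention maps (the mixup step)."""
    # Keep source attention over invariant regions; inside the edited
    # region, interpolate between source and target attention maps.
    edited = lam * attn_src + (1.0 - lam) * attn_tgt
    return (1.0 - edit_mask) * attn_src + edit_mask * edited


if __name__ == "__main__":
    # Toy usage with random tensors and a dummy noise predictor.
    x_t = torch.randn(1, 4, 64, 64)
    x_src_t = torch.randn(1, 4, 64, 64)
    edit_mask = torch.zeros(1, 1, 64, 64)
    edit_mask[..., 16:48, 16:48] = 1.0
    dummy_eps = lambda x, t, emb: torch.randn_like(x)
    score = guided_score(dummy_eps, x_t, x_src_t, t=500, tgt_emb=None,
                         alpha_bar_t=torch.tensor(0.3), edit_mask=edit_mask)
    print(score.shape)  # torch.Size([1, 4, 64, 64])
```

In this reading, the guiding term pulls the preserved regions of the noisy latent toward the correspondingly noised source latent while the standard score under the target prompt drives the edit, and the attention mixup keeps the source's attention structure where the image should not change.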
Related papers
- ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models [55.43801602995778]
We present ImPoster, a novel algorithm for generating a target image of a 'source' subject performing a 'driving' action.
Our approach is completely unsupervised and does not require any access to additional annotations like keypoints or pose.
arXiv Detail & Related papers (2024-09-24T01:25:19Z)
- Diffusion based Zero-shot Medical Image-to-Image Translation for Cross Modality Segmentation [18.895926089773177]
Cross-modality image segmentation aims to segment the target modalities using a method designed in the source modality.
Deep generative models can translate the target modality images into the source modality, thus enabling cross-modality segmentation.
arXiv Detail & Related papers (2024-04-01T13:23:04Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- Masked and Adaptive Transformer for Exemplar Based Image Translation [16.93344592811513]
Cross-domain semantic matching is challenging.
We propose a masked and adaptive transformer (MAT) for learning accurate cross-domain correspondence.
We devise a novel contrastive style learning method to acquire quality-discriminative style representations.
arXiv Detail & Related papers (2023-03-30T03:21:14Z)
- Diffusion-based Image Translation using Disentangled Style and Content Representation [51.188396199083336]
Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer.
It is often difficult to maintain the original content of the image during the reverse diffusion.
We present a novel diffusion-based unsupervised image translation method using disentangled style and content representation.
Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.
arXiv Detail & Related papers (2022-09-30T06:44:37Z)
- Blended Diffusion for Text-driven Editing of Natural Images [18.664733153082146]
We introduce the first solution for performing local (region-based) edits in generic natural images.
We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a DDPM to generate natural-looking results.
To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent.
arXiv Detail & Related papers (2021-11-29T18:58:49Z)
- Global and Local Alignment Networks for Unpaired Image-to-Image Translation [170.08142745705575]
The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style.
Because existing methods pay little attention to content changes, semantic information from source images degrades during translation.
We introduce a novel approach, Global and Local Alignment Networks (GLA-Net)
Our method effectively generates sharper and more realistic images than existing approaches.
arXiv Detail & Related papers (2021-11-19T18:01:54Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
- GAIT: Gradient Adjusted Unsupervised Image-to-Image Translation [5.076419064097734]
An adversarial loss is utilized to match the distributions of the translated and target image sets.
This may create artifacts if two domains have different marginal distributions, for example, in uniform areas.
We propose an unsupervised image-to-image translation (IIT) method that preserves uniform regions after translation.
arXiv Detail & Related papers (2020-09-02T08:04:00Z)
- Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation [43.09068177612067]
Unsupervised domain adaptation alleviates the need for pixel-wise annotation in semantic segmentation.
One of the most common strategies is to translate images from the source domain to the target domain and then align their marginal distributions in the feature space using adversarial learning.
Here, we present an innovative framework, designed to mitigate the image translation bias and align cross-domain features with the same category.
arXiv Detail & Related papers (2020-03-10T10:06:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.