SHUNIT: Style Harmonization for Unpaired Image-to-Image Translation
- URL: http://arxiv.org/abs/2301.04685v1
- Date: Wed, 11 Jan 2023 19:24:03 GMT
- Title: SHUNIT: Style Harmonization for Unpaired Image-to-Image Translation
- Authors: Seokbeom Song, Suhyeon Lee, Hongje Seong, Kyoungwon Min, Euntai Kim
- Abstract summary: We present Style Harmonization for unpaired I2I translation (SHUNIT).
Our SHUNIT generates a new style by harmonizing the target domain style retrieved from a class memory and an original source image style.
We validate our method with extensive experiments and achieve state-of-the-art performance on the latest benchmark sets.
- Score: 14.485088590863327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel solution for unpaired image-to-image (I2I) translation. To
translate complex images with a wide range of objects to a different domain,
recent approaches often use object annotations to perform per-class
source-to-target style mapping. However, an important cue remains unexploited in
I2I translation. An object in each class consists of multiple components, and
these sub-object components have different characteristics. For example, a car
in the CAR class consists of a body, tires, windows, and head and tail lamps,
and each of these should be handled separately for realistic I2I translation.
The simplest solution would be to use detailed sub-object component annotations
rather than object-level annotations alone, but obtaining such fine-grained
annotations is not feasible. The key idea of this paper is to bypass
the sub-object component annotations by leveraging the original style of the
input image because the original style will include the information about the
characteristics of the sub-object components. Specifically, for each pixel, we
use not only the per-class style gap between the source and target domains but
also the pixel's original style to determine the target style of a pixel. To
this end, we present Style Harmonization for unpaired I2I translation (SHUNIT).
Our SHUNIT generates a new style by harmonizing the target domain style
retrieved from a class memory and an original source image style. Instead of
direct source-to-target style mapping, we aim to harmonize the source and target
styles. We validate our method with extensive experiments and achieve
state-of-the-art performance on the latest benchmark sets. The source code is
available online: https://github.com/bluejangbaljang/SHUNIT.
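The per-pixel harmonization idea described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration assembled for this summary, not the released SHUNIT code; the class memory layout, the sigmoid-gated blending weight, and all tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class StyleHarmonization(nn.Module):
    """Minimal sketch: blend a per-class target-domain style retrieved from a
    class memory with the per-pixel style of the source image."""

    def __init__(self, num_classes: int, style_dim: int):
        super().__init__()
        # class memory: one learnable target-domain style vector per class
        self.class_memory = nn.Parameter(torch.randn(num_classes, style_dim))
        # predicts a per-pixel blending weight from the source style
        self.alpha_head = nn.Conv2d(style_dim, 1, kernel_size=1)

    def forward(self, source_style, seg):
        # source_style: (B, C, H, W) per-pixel style of the source image
        # seg:          (B, H, W)    integer class label of each pixel
        target_style = self.class_memory[seg]              # (B, H, W, C)
        target_style = target_style.permute(0, 3, 1, 2)    # (B, C, H, W)
        alpha = torch.sigmoid(self.alpha_head(source_style))  # (B, 1, H, W)
        # harmonized per-pixel style: convex mix of target and source styles
        return alpha * target_style + (1.0 - alpha) * source_style

# usage: a 4-class scene at 32x32 resolution
harmonizer = StyleHarmonization(num_classes=4, style_dim=64)
src_style = torch.randn(2, 64, 32, 32)
seg = torch.randint(0, 4, (2, 32, 32))
print(harmonizer(src_style, seg).shape)  # torch.Size([2, 64, 32, 32])
```

Keeping the blend per-pixel is what lets sub-object components such as windows and tires receive different target styles without any sub-object annotation.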
Related papers
- DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations [64.43387739794531]
Current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles.
We introduce DEADiff to address this issue using two strategies.
DEADiff attains the best visual stylization results and an optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image.
arXiv Detail & Related papers (2024-03-11T17:35:23Z)
- Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object [9.759321877363258]
"Soulstyler" allows users to guide the stylization of specific objects in an image through simple textual descriptions.
We introduce a large language model to parse the text and identify stylization goals and specific styles.
We also introduce a novel localized text-image block matching loss that ensures that style transfer is performed only on specified target objects.
arXiv Detail & Related papers (2023-11-22T18:15:43Z)
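To make the "stylize only the specified object" constraint concrete, here is a small PyTorch sketch of a mask-restricted style loss. It is written for this summary, not Soulstyler's CLIP-based localized text-image block matching loss; the statistics-matching formulation, function name, and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_stylization_losses(stylized, content, style_target, mask):
    """Match channel-wise style statistics only inside the target-object mask
    and keep pixels outside the mask close to the original content.
    stylized, content, style_target: (B, C, H, W); mask: (B, 1, H, W) in {0, 1}."""
    eps = 1e-6
    area = mask.sum(dim=(2, 3), keepdim=True) + eps
    mu_s = (stylized * mask).sum(dim=(2, 3), keepdim=True) / area
    mu_t = (style_target * mask).sum(dim=(2, 3), keepdim=True) / area
    var_s = (((stylized - mu_s) ** 2) * mask).sum(dim=(2, 3), keepdim=True) / area
    var_t = (((style_target - mu_t) ** 2) * mask).sum(dim=(2, 3), keepdim=True) / area
    style_loss = F.l1_loss(mu_s, mu_t) + F.l1_loss(var_s.sqrt(), var_t.sqrt())
    # outside the object mask, the output should stay close to the content input
    preserve_loss = F.l1_loss(stylized * (1 - mask), content * (1 - mask))
    return style_loss, preserve_loss

# usage with random tensors standing in for real images and an object mask
x_out = torch.rand(1, 3, 64, 64)
x_in = torch.rand(1, 3, 64, 64)
x_ref = torch.rand(1, 3, 64, 64)
obj_mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
print(masked_stylization_losses(x_out, x_in, x_ref, obj_mask))
```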
- Sem-CS: Semantic CLIPStyler for Text-Based Image Style Transfer [4.588028371034406]
We propose Semantic CLIPStyler (Sem-CS) that performs semantic style transfer.
Sem-CS first segments the content image into salient and non-salient objects and then transfers artistic style based on a given style text description.
Our empirical results, including DISTS, NIMA and user study scores, show that our proposed framework yields superior qualitative and quantitative performance.
arXiv Detail & Related papers (2023-07-12T05:59:42Z)
- DSI2I: Dense Style for Unpaired Image-to-Image Translation [70.93865212275412]
Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar.
We propose to represent style as a dense feature map, allowing for a finer-grained transfer to the source image without requiring any external semantic information.
Our results show that the translations produced by our approach are more diverse, preserve the source content better, and are closer to the exemplars when compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-12-26T18:45:25Z)
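The "dense style" idea, a per-pixel style map instead of a single global style vector, can be illustrated with a spatially varying modulation layer. This is a generic sketch for this summary, not the DSI2I architecture; the layer names, the instance-norm-based modulation, and the shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseStyleModulation(nn.Module):
    """Contrast a global style code with a dense (per-pixel) style map applied
    as a spatially varying scale/shift on normalized content features."""

    def __init__(self, content_dim: int, style_dim: int):
        super().__init__()
        # maps the style feature at each location to per-channel scale and shift
        self.to_gamma = nn.Conv2d(style_dim, content_dim, kernel_size=1)
        self.to_beta = nn.Conv2d(style_dim, content_dim, kernel_size=1)

    def forward(self, content_feat, style_feat, dense: bool = True):
        # content_feat: (B, Cc, H, W), style_feat: (B, Cs, Hs, Ws)
        if not dense:
            # baseline: collapse the style map to a single global vector
            style_feat = F.adaptive_avg_pool2d(style_feat, 1)       # (B, Cs, 1, 1)
        # resize the style map to the content resolution before modulation
        style_feat = F.interpolate(style_feat, size=content_feat.shape[-2:],
                                   mode="bilinear", align_corners=False)
        normalized = F.instance_norm(content_feat)
        gamma = self.to_gamma(style_feat)  # spatially varying scale if dense=True
        beta = self.to_beta(style_feat)    # spatially varying shift if dense=True
        return gamma * normalized + beta

mod = DenseStyleModulation(content_dim=128, style_dim=64)
c = torch.randn(1, 128, 32, 32)
s = torch.randn(1, 64, 16, 16)
out_dense = mod(c, s, dense=True)    # per-pixel style, finer-grained transfer
out_global = mod(c, s, dense=False)  # one style vector for the whole image
```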
- Learning Object-Language Alignments for Open-Vocabulary Object Detection [83.09560814244524]
We propose a novel open-vocabulary object detection framework that learns directly from image-text pair data.
It enables us to train an open-vocabulary object detector on image-text pairs in a much simpler and more effective way.
arXiv Detail & Related papers (2022-11-27T14:47:31Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
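A common way to learn style representations contrastively is an InfoNCE loss over style codes, where a stylized image and its style reference form a positive pair and other styles in the batch serve as negatives. The sketch below is that generic formulation, not CAST's exact objective; names and dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

def style_contrastive_loss(anchor_codes, positive_codes, temperature: float = 0.07):
    """InfoNCE over style codes: row i of `positive_codes` is the positive for
    row i of `anchor_codes`; all other rows act as negatives. Both: (N, D)."""
    a = F.normalize(anchor_codes, dim=1)
    p = F.normalize(positive_codes, dim=1)
    logits = a @ p.t() / temperature               # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# usage: 8 stylized images paired with the style codes of their references
anchors = torch.randn(8, 256)
positives = torch.randn(8, 256)
print(style_contrastive_loss(anchors, positives).item())
```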
- Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z)
- Image-to-Image Translation with Low Resolution Conditioning [0.28675177318965034]
This work aims at transferring fine details from a high resolution (HR) source image to fit a coarse, low resolution (LR) image representation of the target.
This differs from previous methods that focus on translating a given image style into a target content.
Our approach relies on training the generative model to produce HR target images that both 1) share distinctive information of the associated source image; 2) correctly match the LR target image when downscaled.
arXiv Detail & Related papers (2021-07-23T14:22:12Z)
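The two constraints in this summary translate naturally into two reconstruction-style terms: a downscale-consistency loss against the LR target and a fidelity term toward the HR source. The actual approach trains a generative model; the toy version below uses plain L1 terms, and all names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def lr_conditioning_losses(generated_hr, source_hr, target_lr):
    """Toy version of the two constraints: (1) the generated HR image should stay
    close to the HR source to retain its fine details, and (2) it should match
    the LR target once downscaled. generated_hr, source_hr: (B, 3, H, W);
    target_lr: (B, 3, h, w) with h < H and w < W."""
    downscaled = F.interpolate(generated_hr, size=target_lr.shape[-2:],
                               mode="bilinear", align_corners=False)
    lr_consistency = F.l1_loss(downscaled, target_lr)      # constraint (2)
    source_fidelity = F.l1_loss(generated_hr, source_hr)   # crude stand-in for (1)
    return lr_consistency, source_fidelity

# usage: 256x256 generated/source images conditioned on a 32x32 target
gen = torch.rand(1, 3, 256, 256)
src = torch.rand(1, 3, 256, 256)
lr_target = torch.rand(1, 3, 32, 32)
print(lr_conditioning_losses(gen, src, lr_target))
```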
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
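A simplified way to exploit pixel-level similarity across images of the same class is to pool an "object vector" from the most activated pixels of each image and pull the two vectors together. The sketch below is that simplification, written for this summary rather than taken from the paper; the threshold, names, and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_image_consistency(feat_a, feat_b, cam_a, cam_b, thresh: float = 0.6):
    """Pool a mean object embedding from the most activated (likely object)
    pixels of each image and return the cosine distance between the two.
    feat_*: (C, H, W) pixel embeddings; cam_*: (H, W) activation maps in [0, 1]."""
    def object_vector(feat, cam):
        mask = (cam >= thresh).float()                    # rough object region
        denom = mask.sum().clamp(min=1.0)
        return (feat * mask).sum(dim=(1, 2)) / denom      # (C,) mean embedding

    v_a = F.normalize(object_vector(feat_a, cam_a), dim=0)
    v_b = F.normalize(object_vector(feat_b, cam_b), dim=0)
    return 1.0 - (v_a * v_b).sum()                        # cosine distance

fa, fb = torch.randn(64, 28, 28), torch.randn(64, 28, 28)
ca, cb = torch.rand(28, 28), torch.rand(28, 28)
print(cross_image_consistency(fa, fb, ca, cb))
```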