StyleFlow For Content-Fixed Image to Image Translation
- URL: http://arxiv.org/abs/2207.01909v1
- Date: Tue, 5 Jul 2022 09:40:03 GMT
- Title: StyleFlow For Content-Fixed Image to Image Translation
- Authors: Weichen Fan, Jinghuan Chen, Jiabin Ma, Jun Hou, Shuai Yi
- Abstract summary: StyleFlow is a new I2I translation model that consists of normalizing flows and a novel Style-Aware Normalization (SAN) module.
Our model supports both image-guided translation and multi-modal synthesis.
- Score: 15.441136520005578
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image-to-image (I2I) translation is a challenging topic in computer vision.
We divide this problem into three tasks: strongly constrained translation,
normally constrained translation, and weakly constrained translation. The
constraint here indicates the extent to which the content or semantic
information in the original image is preserved. Although previous approaches
have achieved good performance in weakly constrained tasks, they failed to
fully preserve the content in both strongly and normally constrained tasks,
such as photo-realism synthesis, style transfer, and colorization. To
achieve content-preserving transfer in strongly constrained and normally
constrained tasks, we propose StyleFlow, a new I2I translation model that
consists of normalizing flows and a novel Style-Aware Normalization (SAN)
module. With the invertible network structure, StyleFlow first projects input
images into deep feature space in the forward pass, while the backward pass
utilizes the SAN module to perform content-fixed feature transformation and
then projects back to image space. Our model supports both image-guided
translation and multi-modal synthesis. We evaluate our model on several I2I
translation benchmarks, and the results show that the proposed model has
advantages over previous methods in both strongly constrained and normally
constrained tasks.
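To make the forward/backward mechanism described in the abstract concrete, below is a minimal PyTorch sketch assuming an additive-coupling flow and an AdaIN-like stand-in for the Style-Aware Normalization (SAN) module; the class and parameter names (AdditiveCoupling, StyleAwareNorm, StyleFlowSketch, n_blocks) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code; names are illustrative): an invertible
# additive-coupling "flow" whose forward pass maps an image to a deep feature
# space and whose inverse pass maps features back to image space, with an
# AdaIN-like stand-in for the SAN module applied to the features in between.
import torch
import torch.nn as nn


class AdditiveCoupling(nn.Module):
    """y1 = x1, y2 = x2 + t(x1): exactly invertible, so no content is lost."""

    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.t = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
            nn.Conv2d(half, half, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.t(x1)], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.t(y1)], dim=1)


class StyleAwareNorm(nn.Module):
    """AdaIN-like placeholder for SAN: re-normalize content features with style statistics."""

    def forward(self, content_feat, style_feat, eps=1e-5):
        c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
        c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
        s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
        s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
        return (content_feat - c_mean) / c_std * s_std + s_mean


class StyleFlowSketch(nn.Module):
    # channels must be even; real flow models add squeeze layers so that
    # 3-channel RGB images can be handled, omitted here for brevity.
    def __init__(self, channels=4, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(AdditiveCoupling(channels) for _ in range(n_blocks))
        self.san = StyleAwareNorm()

    def encode(self, x):  # forward pass: image -> deep features
        for blk in self.blocks:
            x = blk(x)
        return x

    def decode(self, z):  # backward pass: deep features -> image
        for blk in reversed(self.blocks):
            z = blk.inverse(z)
        return z

    def translate(self, content_img, style_img):
        z_content = self.encode(content_img)
        z_style = self.encode(style_img)
        return self.decode(self.san(z_content, z_style))


if __name__ == "__main__":
    model = StyleFlowSketch(channels=4)
    content, style = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
    print(model.translate(content, style).shape)  # torch.Size([1, 4, 64, 64])
```

Because each coupling block is exactly invertible, encode followed by decode reproduces the input up to floating-point error; only the normalization step between the two passes changes the features, which is the structural property the abstract relies on for content preservation.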
Related papers
- Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing text in the source language into an image containing the translation in the target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of their parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z) - AnyTrans: Translate AnyText in the Image with Large Scale Models [88.5887934499388]
This paper introduces AnyTrans, an all-encompassing framework for the task of Translating AnyText in the Image (TATI).
Our framework incorporates contextual cues from both textual and visual elements during translation.
We have meticulously compiled a test dataset called MTIT6, which consists of multilingual text image translation data from six language pairs.
arXiv Detail & Related papers (2024-06-17T11:37:48Z) - Hierarchy Flow For High-Fidelity Image-to-Image Translation [38.87847690777645]
We propose a novel flow-based model to achieve better content preservation during translation.
Our approach achieves state-of-the-art performance, with convincing advantages in both strong- and normal-fidelity tasks.
arXiv Detail & Related papers (2023-08-14T03:11:17Z) - Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z) - Harnessing the Conditioning Sensorium for Improved Image Translation [2.9631016562930546]
Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image.
We propose a new approach to learn disentangled 'content' and 'style' representations from scratch.
We define 'content' based on conditioning information extracted by off-the-shelf pre-trained models.
We then train our style extractor and image decoder with an easy to optimize set of reconstruction objectives.
arXiv Detail & Related papers (2021-10-13T02:07:43Z) - Unbalanced Feature Transport for Exemplar-based Image Translation [51.54421432912801]
This paper presents a general image translation framework that incorporates optimal transport for feature alignment between conditional inputs and style exemplars in image translation.
We show that our method achieves superior image translation qualitatively and quantitatively as compared with the state-of-the-art.
arXiv Detail & Related papers (2021-06-19T12:07:48Z) - Smoothing the Disentangled Latent Style Space for Unsupervised
Image-to-Image Translation [56.55178339375146]
Image-to-Image (I2I) multi-domain translation models are usually also evaluated using the quality of their semantic results.
We propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space.
arXiv Detail & Related papers (2021-06-16T17:58:21Z) - Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
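For comparison with the flow-based approach above, the last entry (Unpaired Image-to-Image Translation via Latent Energy Transport) operates in the latent space of a pretrained autoencoder. A generic, hedged sketch of that latent-space idea, with purely illustrative names and hyperparameters (EnergyNet, langevin_translate, step_size, n_steps) rather than the authors' implementation, could look like this:

```python
# Generic sketch of latent-space energy-based translation (illustrative only):
# a small energy network scores latent codes of a pretrained autoencoder, and a
# few Langevin steps move a source latent toward low-energy (target-domain)
# regions before it is decoded with the frozen pretrained decoder.
import torch
import torch.nn as nn


class EnergyNet(nn.Module):
    """Scalar energy over flattened latent codes."""

    def __init__(self, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.SiLU(),
            nn.Linear(256, 1),
        )

    def forward(self, z):
        return self.net(z).squeeze(-1)


def langevin_translate(energy, z_src, n_steps=20, step_size=0.01, noise_scale=0.005):
    """Refine source latents by noisy gradient descent on the energy."""
    z = z_src.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        e = energy(z).sum()
        (grad,) = torch.autograd.grad(e, z)
        with torch.no_grad():
            z = z - step_size * grad + noise_scale * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()


if __name__ == "__main__":
    latent_dim = 64
    energy = EnergyNet(latent_dim)
    z_src = torch.randn(8, latent_dim)         # latents from a pretrained encoder
    z_trg = langevin_translate(energy, z_src)  # decode with the pretrained decoder
    print(z_trg.shape)  # torch.Size([8, 64])
```

In practice the energy network would be trained on target-domain latents, and the refined codes would be decoded by the frozen autoencoder, which is what keeps the source content largely intact while the style shifts.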