StyleFlow For Content-Fixed Image to Image Translation
- URL: http://arxiv.org/abs/2207.01909v1
- Date: Tue, 5 Jul 2022 09:40:03 GMT
- Title: StyleFlow For Content-Fixed Image to Image Translation
- Authors: Weichen Fan, Jinghuan Chen, Jiabin Ma, Jun Hou, Shuai Yi
- Abstract summary: StyleFlow is a new I2I translation model that consists of normalizing flows and a novel Style-Aware Normalization (SAN) module.
Our model supports both image-guided translation and multi-modal synthesis.
- Score: 15.441136520005578
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image-to-image (I2I) translation is a challenging topic in computer vision.
We divide this problem into three tasks: strongly constrained translation,
normally constrained translation, and weakly constrained translation. The
constraint here indicates the extent to which the content or semantic
information in the original image is preserved. Although previous approaches
have achieved good performance in weakly constrained tasks, they fail to
fully preserve the content in both strongly and normally constrained tasks,
such as photo-realism synthesis, style transfer, and colorization. To
achieve content-preserving transfer in strongly constrained and normally
constrained tasks, we propose StyleFlow, a new I2I translation model that
consists of normalizing flows and a novel Style-Aware Normalization (SAN)
module. With its invertible network structure, StyleFlow first projects input
images into a deep feature space in the forward pass, while the backward pass
uses the SAN module to perform a content-fixed feature transformation and
then projects the features back to image space. Our model supports both image-guided
translation and multi-modal synthesis. We evaluate our model in several I2I
translation benchmarks, and the results show that the proposed model has
advantages over previous methods in both strongly constrained and normally
constrained tasks.
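As a concrete illustration of this forward/backward design, here is a minimal sketch that pairs an invertible affine-coupling layer (a standard normalizing-flow building block) with an AdaIN-style normalization standing in for the SAN module, whose exact formulation the abstract does not specify. All names, shapes, and the AdaIN choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Invertible coupling layer: half of the channels are rescaled and
    shifted using parameters predicted from the other half."""
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels // 2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),  # predicts scale and shift
        )

    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        log_s, t = self.net(a).chunk(2, dim=1)
        s = torch.sigmoid(log_s + 2.0)  # bounded positive scale for stability
        return torch.cat([a, (b + t) * s], dim=1)

    def inverse(self, y):
        a, b = y.chunk(2, dim=1)
        log_s, t = self.net(a).chunk(2, dim=1)
        s = torch.sigmoid(log_s + 2.0)
        return torch.cat([a, b / s - t], dim=1)

def adain(content_feat, style_feat, eps=1e-5):
    """AdaIN-style stand-in for SAN: shift the channel-wise statistics of
    the content features to match those of the style features."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return (content_feat - c_mean) / c_std * s_std + s_mean

# Toy pipeline with 6-channel "images" (real flows squeeze RGB inputs into
# more channels so that the channel split is possible):
flow = AffineCoupling(channels=6)
content, style = torch.randn(1, 6, 32, 32), torch.randn(1, 6, 32, 32)

z_content = flow(content)            # forward pass: image -> deep feature space
z_style = flow(style)
z_mixed = adain(z_content, z_style)  # content-fixed feature transformation
output = flow.inverse(z_mixed)       # backward pass: feature -> image space
```

Because the decoder is the exact inverse of the encoder, whatever the feature transformation leaves untouched is reconstructed exactly, which is the sense in which the translation is content-fixed.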
Related papers
- Unpaired Image-to-Image Translation with Content Preserving Perspective: A Review [1.1243043117244755]
Image-to-image translation (I2I) transforms an image from a source domain to a target domain while preserving source content.
The degree of preservation of the content of the source images in the translation process can be different according to the problem and the intended application.
We divide the different tasks in the field of image-to-image translation into three categories: Fully Content-Preserving, Partially Content-Preserving, and Non-Content-Preserving.
arXiv Detail & Related papers (2025-02-11T20:09:29Z)
- Ensuring Consistency for In-Image Translation [47.1986912570945]
The in-image machine translation task involves translating text embedded within images, with the translated results presented in image format.
We argue that two types of consistency must be upheld in this task: translation consistency and image generation consistency.
We introduce a novel two-stage framework named HCIIT, which performs text-image translation with a multimodal multilingual large language model in the first stage and image backfilling with a diffusion model in the second stage (a stub-level sketch of such a pipeline follows this entry).
arXiv Detail & Related papers (2024-12-24T03:50:03Z)
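Since the summary gives only the stage decomposition, the skeleton below shows one way such a two-stage pipeline could be wired together; every function is a hypothetical stub standing in for components the abstract does not detail (the multimodal LLM and the diffusion inpainter), and text regions are assumed to come from an upstream OCR step.

```python
from dataclasses import dataclass

@dataclass
class TextRegion:
    box: tuple   # (x0, y0, x1, y1) bounding box in pixels
    text: str    # recognized source-language text

def mllm_translate(image, regions, target_lang):
    """Stage 1 (stub): a multimodal multilingual LLM translates each region,
    conditioning on the image for context."""
    return [r.text for r in regions]  # placeholder: echo the source text

def diffusion_backfill(image, regions, translations):
    """Stage 2 (stub): a diffusion model erases the source text and renders
    the translations back into the image."""
    return image  # placeholder: return the image unchanged

def in_image_translate(image, regions, target_lang="en"):
    translations = mllm_translate(image, regions, target_lang)
    return diffusion_backfill(image, regions, translations)
```

Splitting the task this way lets each consistency target be enforced where it matters: translation consistency in stage one, image-generation consistency in stage two.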
- Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing texts in the source language into an image containing translations in the target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z)
- AnyTrans: Translate AnyText in the Image with Large Scale Models [88.5887934499388]
This paper introduces AnyTrans, an all-encompassing framework for the task Translate AnyText in the Image (TATI).
Our framework incorporates contextual cues from both textual and visual elements during translation.
We have meticulously compiled a test dataset called MTIT6, which consists of multilingual text image translation data from six language pairs.
arXiv Detail & Related papers (2024-06-17T11:37:48Z)
- Hierarchy Flow For High-Fidelity Image-to-Image Translation [38.87847690777645]
We propose a novel flow-based model to achieve better content preservation during translation.
Our approach achieves state-of-the-art performance, with convincing advantages in both strong- and normal-fidelity tasks.
arXiv Detail & Related papers (2023-08-14T03:11:17Z)
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- Unbalanced Feature Transport for Exemplar-based Image Translation [51.54421432912801]
This paper presents a general image translation framework that incorporates optimal transport for feature alignment between conditional inputs and style exemplars.
We show that our method achieves superior image translation qualitatively and quantitatively as compared with the state-of-the-art (a toy sketch of the transport step follows this entry).
arXiv Detail & Related papers (2021-06-19T12:07:48Z)
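For intuition, here is a minimal balanced-Sinkhorn sketch of entropic-regularized optimal transport between two feature sets; the paper's unbalanced formulation relaxes the hard marginal constraints that this toy version enforces, and all names and parameters are illustrative.

```python
import torch

def sinkhorn_plan(feat_a, feat_b, epsilon=0.05, iters=100):
    """Entropic-regularized OT plan between feature sets of shape (n, d)
    and (m, d). Balanced version: both marginals are fixed to uniform."""
    cost = torch.cdist(feat_a, feat_b) ** 2  # pairwise squared distances
    cost = cost / cost.max()                 # rescale for numerical stability
    K = torch.exp(-cost / epsilon)           # Gibbs kernel
    n, m = cost.shape
    mu, nu = torch.full((n,), 1.0 / n), torch.full((m,), 1.0 / m)
    u = torch.ones(n)
    for _ in range(iters):                   # Sinkhorn scaling iterations
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]       # transport plan of shape (n, m)

# Align exemplar (style) features with conditional-input features:
cond = torch.randn(100, 64)                  # hypothetical feature rows
exemplar = torch.randn(120, 64)
plan = sinkhorn_plan(cond, exemplar)
# Re-express each conditional feature as a transport-weighted average of
# exemplar features (rows of the plan renormalized to sum to one).
warped = (plan / plan.sum(dim=1, keepdim=True)) @ exemplar
```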
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source content while translating images to the discriminative style of a target domain.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation (a toy sketch of the latent Langevin update follows this entry).
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
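As a rough illustration of the latent-energy idea, the sketch below runs Langevin dynamics on an energy function defined over the latents of an autoencoder. The encoder, decoder, and energy network here are untrained stand-ins; in the paper's setting the autoencoder is pretrained and frozen and the EBM is learned for the target domain.

```python
import torch
import torch.nn as nn

latent_dim = 128

# Hypothetical stand-ins for a pretrained autoencoder and a learned EBM:
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64),
                        nn.Unflatten(1, (3, 64, 64)))
energy = nn.Sequential(nn.Linear(latent_dim, 256), nn.SiLU(),
                       nn.Linear(256, 1))

def translate(x, steps=60, step_size=0.01):
    """Translate by descending the target-domain energy in latent space.
    Only the latent code moves and the decoder is shared, which is why
    source content is largely preserved."""
    z = encoder(x).detach()
    for _ in range(steps):
        z = z.clone().requires_grad_(True)
        grad = torch.autograd.grad(energy(z).sum(), z)[0]
        # Langevin update: half a gradient step plus Gaussian noise.
        z = (z.detach() - 0.5 * step_size * grad
             + (step_size ** 0.5) * torch.randn_like(z))
    return decoder(z)

x_source = torch.randn(4, 3, 64, 64)  # toy batch of source-domain images
x_target = translate(x_source)
```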
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.