Pretraining is All You Need for Image-to-Image Translation
- URL: http://arxiv.org/abs/2205.12952v1
- Date: Wed, 25 May 2022 17:58:26 GMT
- Title: Pretraining is All You Need for Image-to-Image Translation
- Authors: Tengfei Wang, Ting Zhang, Bo Zhang, Hao Ouyang, Dong Chen, Qifeng
Chen, Fang Wen
- Abstract summary: We propose to use pretraining to boost general image-to-image translation.
We show that the proposed pretraining-based image-to-image translation (PITI) is capable of synthesizing images of unprecedented realism and faithfulness.
- Score: 59.43151345732397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to use pretraining to boost general image-to-image translation.
Prior image-to-image translation methods usually need dedicated architectural
design and train individual translation models from scratch, and they struggle to
generate complex scenes with high quality, especially when paired training data
are scarce. In this paper, we regard each image-to-image translation
problem as a downstream task and introduce a simple and generic framework that
adapts a pretrained diffusion model to accommodate various kinds of
image-to-image translation. We also propose adversarial training to enhance
texture synthesis during diffusion model training, in conjunction with
normalized guidance sampling to improve generation quality. We present
extensive empirical comparison across various tasks on challenging benchmarks
such as ADE20K, COCO-Stuff, and DIODE, showing that the proposed pretraining-based
image-to-image translation (PITI) is capable of synthesizing images of
unprecedented realism and faithfulness.
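The abstract describes normalized guidance sampling only at a high level. As a rough, hedged illustration, the Python sketch below rescales a standard classifier-free-guided noise prediction so it keeps the norm of the conditional prediction; the function name, tensor shapes, and the exact normalization rule are assumptions for illustration, not the paper's specification.

```python
import torch

def normalized_cfg(eps_cond: torch.Tensor,
                   eps_uncond: torch.Tensor,
                   guidance_scale: float) -> torch.Tensor:
    """Hypothetical norm-rescaled classifier-free guidance (a sketch).

    Assumes eps_cond / eps_uncond are the denoiser's noise predictions
    with and without the conditioning input, shaped [B, C, H, W].
    """
    # standard classifier-free guidance combination
    eps_guided = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # rescale each sample so the guided prediction keeps the norm of the
    # conditional prediction, counteracting the norm inflation that large
    # guidance scales tend to cause
    cond_norm = eps_cond.flatten(1).norm(dim=1)
    guided_norm = eps_guided.flatten(1).norm(dim=1).clamp_min(1e-8)
    scale = (cond_norm / guided_norm).view(-1, 1, 1, 1)
    return eps_guided * scale
```

At sampling time, the returned prediction would replace the raw guided noise estimate in the usual DDPM/DDIM update step.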
Related papers
- Ensuring Consistency for In-Image Translation [47.1986912570945]
The in-image machine translation task involves translating text embedded within images, with the translated results presented in image format.
We propose the need to uphold two types of consistency in this task: translation consistency and image generation consistency.
We introduce a novel two-stage framework named HCIIT: text-image translation with a multimodal multilingual large language model in the first stage, followed by image backfilling with a diffusion model in the second stage.
arXiv Detail & Related papers (2024-12-24T03:50:03Z)
- Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation [12.365230063278625]
We propose a new approach for flexible image translation by learning a layout-aware image condition together with a text condition.
Our method co-encodes images and text into a new domain during the training phase.
Experimental comparisons with state-of-the-art methods demonstrate that our model performs best in both style image translation and semantic image translation.
arXiv Detail & Related papers (2023-02-05T02:47:13Z)
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [53.170767750244366]
Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models.
arXiv Detail & Related papers (2022-05-23T17:42:53Z)
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- Deep Translation Prior: Test-time Training for Photorealistic Style Transfer [36.82737412912885]
Recent techniques to solve photorealistic style transfer within deep convolutional neural networks (CNNs) generally require intensive training from large-scale datasets.
We propose a novel framework, dubbed Deep Translation Prior (DTP), to accomplish photorealistic style transfer through test-time training on a given input image pair with untrained networks.
arXiv Detail & Related papers (2021-12-12T04:54:27Z)
- LAFITE: Towards Language-Free Training for Text-to-Image Generation [83.2935513540494]
We propose the first method to train text-to-image generation models without any text data.
Our method leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model.
We obtain state-of-the-art results in the standard text-to-image generation tasks.
arXiv Detail & Related papers (2021-11-27T01:54:45Z)
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z)