DiffI2I: Efficient Diffusion Model for Image-to-Image Translation
- URL: http://arxiv.org/abs/2308.13767v1
- Date: Sat, 26 Aug 2023 05:18:23 GMT
- Title: DiffI2I: Efficient Diffusion Model for Image-to-Image Translation
- Authors: Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng
Tian, Wenming Yang, Radu Timofte, Luc Van Gool
- Abstract summary: The Diffusion Model (DM) has emerged as the SOTA approach for image synthesis.
However, existing DMs do not perform well on some image-to-image translation (I2I) tasks.
DiffI2I comprises three key components: a compact I2I prior extraction network (CPEN), a dynamic I2I transformer (DI2Iformer) and a denoising network.
- Score: 108.82579440308267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Diffusion Model (DM) has emerged as the SOTA approach for image
synthesis. However, the existing DM cannot perform well on some image-to-image
translation (I2I) tasks. Different from image synthesis, some I2I tasks, such
as super-resolution, require generating results in accordance with GT images.
Traditional DMs for image synthesis require extensive iterations and large
denoising models to estimate entire images, which gives them strong generative
ability but also leads to artifacts and inefficiency for I2I. To tackle this
challenge, we propose a simple, efficient, and powerful DM framework for I2I,
called DiffI2I. Specifically, DiffI2I comprises three key components: a compact
I2I prior extraction network (CPEN), a dynamic I2I transformer (DI2Iformer),
and a denoising network. We train DiffI2I in two stages: pretraining and DM
training. For pretraining, GT and input images are fed into CPEN$_{S1}$ to
capture a compact I2I prior representation (IPR) guiding DI2Iformer. In the
second stage, the DM is trained to only use the input images to estimate the
same IPR as CPEN$_{S1}$. Compared to traditional DMs, the compact IPR enables
DiffI2I to obtain more accurate outcomes and employ a lighter denoising network
and fewer iterations. Through extensive experiments on various I2I tasks, we
demonstrate that DiffI2I achieves SOTA performance while significantly reducing
computational burdens.
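The idea of running the diffusion process on a compact IPR rather than on a whole image can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the IPR is a hand-written four-element vector standing in for the output of CPEN$_{S1}$, the forward process is the standard DDPM closed form with a linear beta schedule, and an L1 distance stands in for the stage-two objective. The helper names (make_alpha_bars, diffuse, recover_ipr, l1_loss) are hypothetical.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=0.1, beta_end=0.99):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse(ipr, alpha_bar, rng):
    """Standard DDPM forward step: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    noise = [rng.gauss(0.0, 1.0) for _ in ipr]
    z_t = [math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * e
           for z, e in zip(ipr, noise)]
    return z_t, noise


def recover_ipr(z_t, noise, alpha_bar):
    """Invert the forward step, given the noise a denoiser would have to predict."""
    return [(z - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for z, e in zip(z_t, noise)]


def l1_loss(estimate, target):
    """Mean absolute error between an estimated IPR and the stage-one IPR."""
    return sum(abs(a - b) for a, b in zip(estimate, target)) / len(target)


rng = random.Random(0)
ipr_s1 = [0.5, -1.0, 0.25, 2.0]           # hypothetical stand-in for the CPEN_S1 output
alpha_bars = make_alpha_bars(num_steps=4)  # only a few steps, since the vector is tiny
z_T, eps = diffuse(ipr_s1, alpha_bars[-1], rng)

# With the true noise (an idealized denoiser), the stage-one IPR is recovered.
ipr_hat = recover_ipr(z_T, eps, alpha_bars[-1])
print(l1_loss(ipr_hat, ipr_s1))
```

In this toy setting the denoiser only has to handle a four-element vector, which is the intuition behind using a lighter denoising network and fewer iterations. A real stage-two model would predict the noise from the input image alone instead of being handed it.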
Related papers
- MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval [20.612534837883892]
Composed Image Retrieval (CIR) is a challenging vision-language task, utilizing bi-modal (image+text) queries to retrieve target images.
In this paper, we propose a two-stage framework to tackle both discrepancies.
MoTaDual achieves the state-of-the-art performance across four widely used ZS-CIR benchmarks, while maintaining low training time and computational cost.
arXiv Detail & Related papers (2024-10-31T08:49:05Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- STEREOFOG -- Computational DeFogging via Image-to-Image Translation on a real-world Dataset [0.8702432681310401]
Image-to-Image translation (I2I) is a subtype of Machine Learning (ML) that has tremendous potential in applications.
We introduce STEREOFOG, a dataset comprised of $10,067$ paired fogged and clear images.
We apply and optimize the pix2pix I2I ML framework to this dataset.
arXiv Detail & Related papers (2023-12-04T21:07:13Z)
- CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation [57.836686457542385]
Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation.
This paper introduces CycleNet, a novel but simple method that incorporates cycle consistency into DMs to regularize image manipulation.
arXiv Detail & Related papers (2023-10-19T21:32:21Z)
- SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds [88.06788636008051]
Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers.
These models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run.
We present a generic approach that unlocks running text-to-image diffusion models on mobile devices in less than $2$ seconds.
arXiv Detail & Related papers (2023-06-01T17:59:25Z)
- E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation [40.62692548291319]
Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language.
Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues.
We propose an end-to-end TIMT model fully making use of the knowledge from existing OCR and MT datasets.
arXiv Detail & Related papers (2023-05-09T04:25:52Z)
- UVCGAN v2: An Improved Cycle-Consistent GAN for Unpaired Image-to-Image Translation [10.689788782893096]
An unpaired image-to-image (I2I) translation technique seeks to find a mapping between two domains of data in a fully unsupervised manner.
DMs hold the state-of-the-art status on the I2I translation benchmarks in terms of Fréchet Inception Distance (FID).
This work improves a recent UVCGAN model and equips it with modern advancements in model architectures and training procedures.
arXiv Detail & Related papers (2023-03-28T19:46:34Z)
- DiffIR: Efficient Diffusion Model for Image Restoration [108.82579440308267]
Diffusion model (DM) has achieved SOTA performance by modeling the image synthesis process into a sequential application of a denoising network.
Traditional DMs running massive iterations on a large model to estimate whole images or feature maps is inefficient for image restoration.
We propose DiffIR, which consists of a compact IR prior extraction network (CPEN), dynamic IR transformer (DIRformer), and denoising network.
arXiv Detail & Related papers (2023-03-16T16:47:14Z)
- DDet: Dual-path Dynamic Enhancement Network for Real-World Image Super-Resolution [69.2432352477966]
Real image super-resolution (Real-SR) focuses on the relationship between real-world high-resolution (HR) and low-resolution (LR) images.
In this article, we propose a Dual-path Dynamic Enhancement Network (DDet) for Real-SR.
Unlike conventional methods which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pairs.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.