Rethinking the Paradigm of Content Constraints in Unpaired
Image-to-Image Translation
- URL: http://arxiv.org/abs/2211.10867v3
- Date: Sat, 6 Jan 2024 05:42:28 GMT
- Title: Rethinking the Paradigm of Content Constraints in Unpaired
Image-to-Image Translation
- Authors: Xiuding Cai, Yaoyao Zhu, Dong Miao, Linjie Fu, Yu Yao
- Abstract summary: We propose EnCo, a simple but efficient way to maintain the content by constraining the representational similarity in the latent space of patch-level features.
For the similarity function, we use a simple MSE loss instead of contrastive loss, which is currently widely used in I2I tasks.
In addition, we rethink the role played by discriminators in sampling patches and propose a discriminative attention-guided (DAG) patch sampling strategy to replace random sampling.
- Score: 9.900050049833986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In an unpaired setting, lacking sufficient content constraints for
image-to-image translation (I2I) tasks, GAN-based approaches are usually prone
to model collapse. Current solutions can be divided into two categories,
reconstruction-based and Siamese network-based. The former requires that the
transformed or transforming image can be perfectly converted back to the
original image, which is sometimes too strict and limits the generative
performance. The latter involves feeding the original and generated images into
a feature extractor and then matching their outputs. This is not efficient
enough, and a universal feature extractor is not easily available. In this
paper, we propose EnCo, a simple but efficient way to maintain the content by
constraining the representational similarity in the latent space of patch-level
features from the same stage of the \textbf{En}coder and de\textbf{Co}der of
the generator. For the similarity function, we use a simple MSE loss instead of
contrastive loss, which is currently widely used in I2I tasks. Benefiting
from this design, EnCo training is extremely efficient, and the encoder
features contribute more positively to decoding, leading to more satisfying
generations. In addition, we rethink the role played by
discriminators in sampling patches and propose a discriminative
attention-guided (DAG) patch sampling strategy to replace random sampling. DAG
is parameter-free and only requires negligible computational overhead, while
significantly improving the performance of the model. Extensive experiments on
multiple datasets demonstrate the effectiveness and advantages of EnCo, and we
achieve state-of-the-art results on multiple benchmarks compared with previous
methods. Our code is available at https://github.com/XiudingCai/EnCo-pytorch.
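As a minimal sketch of the two ideas in the abstract, the patch-level MSE content constraint between same-stage encoder/decoder features could be combined with DAG-style patch sampling as below. This is an illustrative assumption, not the authors' actual implementation (see the linked repository for that); the function name `enco_content_loss`, its arguments, and the feature shapes are all hypothetical:

```python
import torch
import torch.nn.functional as F


def enco_content_loss(enc_feats, dec_feats, num_patches=256, attn=None):
    """Hypothetical EnCo-style content loss.

    enc_feats / dec_feats: lists of (B, C, H, W) feature maps taken from
    matching stages of the generator's encoder and decoder.
    attn: optional (B, 1, Hd, Wd) discriminator attention map; when given,
    patches are sampled DAG-style at the most-attended locations instead
    of uniformly at random.
    """
    total = 0.0
    for fe, fd in zip(enc_feats, dec_feats):
        b, c, h, w = fe.shape
        # Flatten spatial locations into patch-level feature vectors.
        fe = fe.flatten(2).permute(0, 2, 1)  # (B, H*W, C)
        fd = fd.flatten(2).permute(0, 2, 1)  # (B, H*W, C)
        n = min(num_patches, h * w)
        if attn is not None:
            # DAG sampling: resize the attention map to this stage's
            # resolution and keep the top-n attended locations.
            a = F.interpolate(attn, size=(h, w), mode="bilinear",
                              align_corners=False).flatten(1)  # (B, H*W)
            idx = a.topk(n, dim=1).indices  # (B, n)
        else:
            # Fallback: uniform random patch locations.
            idx = torch.randint(h * w, (b, n), device=fe.device)
        idx = idx.unsqueeze(-1).expand(-1, -1, c)  # (B, n, C)
        # Simple MSE between sampled patch features; encoder side is
        # detached so it acts as the (fixed) content reference.
        total = total + F.mse_loss(torch.gather(fd, 1, idx),
                                   torch.gather(fe, 1, idx).detach())
    return total
```

Note the contrast with contrastive objectives such as PatchNCE: no negative patches or temperature are needed, only a direct MSE between matched patch features, which is what makes the training loop cheap.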
Related papers
- Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration [46.96362010335177]
In this paper, we propose HIT, a simple yet effective High-frequency Injected Transformer for image restoration.
Specifically, we design a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map, to provide reliable references for restoring high-quality images.
In addition, we introduce a spatial enhancement unit (SEU) to preserve essential spatial relationships that may be lost due to the computations carried out across channel dimensions in the BIM.
arXiv Detail & Related papers (2024-03-30T08:05:00Z) - µSplit: efficient image decomposition for microscopy data [50.794670705085835]
muSplit is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images.
We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory efficient incorporation of large image-context.
We apply muSplit to five decomposition tasks, one on a synthetic dataset, four others derived from real microscopy data.
arXiv Detail & Related papers (2022-11-23T11:26:24Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation on the optimal number of tokens one position should focus on.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - Asymmetric Learned Image Compression with Multi-Scale Residual Block,
Importance Map, and Post-Quantization Filtering [15.056672221375104]
Deep learning-based image compression has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC.
Many leading learned schemes cannot maintain a good trade-off between performance and complexity.
We propose an efficient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art.
arXiv Detail & Related papers (2022-06-21T09:34:29Z) - NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image
Generation [139.8037697822064]
We present a non-parametric structured latent variable model for image generation, called NP-DRAW.
It sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas.
arXiv Detail & Related papers (2021-06-25T05:17:55Z) - Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z) - The Power of Triply Complementary Priors for Image Compressive Sensing [89.14144796591685]
We propose a joint low-rank deep (LRD) image model, which contains a pair of triply complementary priors.
We then propose a novel hybrid plug-and-play framework based on the LRD model for image CS.
To make the optimization tractable, a simple yet effective algorithm is proposed to solve the resulting hybrid-framework-based image CS problem.
arXiv Detail & Related papers (2020-05-16T08:17:44Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.