Independent Encoder for Deep Hierarchical Unsupervised Image-to-Image
Translation
- URL: http://arxiv.org/abs/2107.02494v1
- Date: Tue, 6 Jul 2021 09:18:59 GMT
- Title: Independent Encoder for Deep Hierarchical Unsupervised Image-to-Image
Translation
- Authors: Kai Ye, Yinru Ye, Minqiang Yang, Bin Hu
- Abstract summary: The main challenges of image-to-image (I2I) translation are to make the translated image realistic and retain as much information from the source domain as possible.
We propose a novel architecture, termed IEGAN, which removes the encoder of each network and introduces an encoder that is independent of the other networks.
- Score: 2.4826445086983475
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The main challenges of image-to-image (I2I) translation are to make the
translated image realistic and retain as much information from the source
domain as possible. To address this issue, we propose a novel architecture,
termed IEGAN, which removes the encoder of each network and introduces an
encoder that is independent of the other networks. Compared with previous
models, ours has three advantages. First, it grasps image information more
directly and comprehensively, since the encoder no longer receives loss from
the generator and discriminator. Second, the independent encoder allows each
network to focus on its own goal, which makes the translated image more
realistic. Third, reducing the number of encoders yields a more unified image
representation. However, when the independent encoder applies two
down-sampling blocks, it is hard to extract semantic information. To tackle
this problem, we propose a deep and shallow information space containing
characteristic and semantic information, which guides the model to translate
high-quality images in tasks with significant shape or texture changes. We
compare IEGAN with previous models and conduct studies on semantic
information consistency and component ablation. These experiments show the
superiority and effectiveness of our architecture. Our code is published at:
https://github.com/Elvinky/IEGAN.
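As a rough illustration of the deep and shallow information space, the sketch below shows an independent encoder that exposes both a shallow (characteristic) feature map and a deeper (semantic) feature map after two down-sampling blocks. The pooling scheme and function names are assumptions for illustration, not the authors' exact design:

```python
import numpy as np

def avg_pool2x2(x):
    """2x2 average pooling: one down-sampling block (illustrative stand-in
    for a learned strided-convolution block)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0

def independent_encoder(image):
    """Return (shallow, deep) feature maps.

    shallow: after one down-sampling block -- retains characteristic
             (texture/detail) information.
    deep:    after two down-sampling blocks -- coarser, closer to
             semantic information.
    Both maps are exposed so the generator and discriminator can consume
    features without back-propagating their losses into the encoder.
    """
    shallow = avg_pool2x2(image)
    deep = avg_pool2x2(shallow)
    return shallow, deep

img = np.random.rand(64, 64)
shallow, deep = independent_encoder(img)
print(shallow.shape, deep.shape)  # (32, 32) (16, 16)
```

In a trained model the encoder would be optimized separately (e.g. with its own reconstruction objective), which is what keeps it "independent" of the generator and discriminator losses.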
Related papers
- Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images.
Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images.
ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
arXiv Detail & Related papers (2024-09-24T08:46:13Z)
- Compressed Image Captioning using CNN-based Encoder-Decoder Framework [0.0]
We develop an automatic image captioning architecture that combines the strengths of convolutional neural networks (CNNs) and encoder-decoder models.
We also do a performance comparison where we delved into the realm of pre-trained CNN models.
In our quest for optimization, we also explored the integration of frequency regularization techniques to compress the "AlexNet" and "EfficientNetB0" models.
arXiv Detail & Related papers (2024-04-28T03:47:48Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval [117.15862403330121]
We propose LoopITR, which combines dual encoders and cross encoders in the same network for joint learning.
Specifically, we let the dual encoder provide hard negatives to the cross encoder, and use the more discriminative cross encoder to distill its predictions back to the dual encoder.
arXiv Detail & Related papers (2022-03-10T16:41:12Z)
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs [43.33066765114446]
Image-to-image translation suffers from inferior performance when translations between classes require large shape changes.
We propose a novel deep hierarchical Image-to-Image Translation method, called DeepI2I.
We demonstrate that transfer learning significantly improves the performance of I2I systems, especially for small datasets.
arXiv Detail & Related papers (2020-11-11T16:03:03Z)
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output.
Compared with current state-of-the-art methods, our proposed DF-GAN is simpler but more efficient to synthesize realistic and text-matching images.
arXiv Detail & Related papers (2020-08-13T12:51:17Z)
- Generate High Resolution Images With Generative Variational Autoencoder [0.0]
We present a novel neural network to generate high resolution images.
We replace the decoder of VAE with a discriminator while using the encoder as it is.
We evaluate our network on three datasets: MNIST, LSUN, and CelebA.
arXiv Detail & Related papers (2020-08-12T20:15:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.