Enhanced Unsupervised Image-to-Image Translation Using Contrastive Learning and Histogram of Oriented Gradients
- URL: http://arxiv.org/abs/2409.16042v2
- Date: Thu, 26 Sep 2024 13:18:24 GMT
- Title: Enhanced Unsupervised Image-to-Image Translation Using Contrastive Learning and Histogram of Oriented Gradients
- Authors: Wanchen Zhao,
- Abstract summary: This paper proposes an enhanced unsupervised image-to-image translation method based on the Contrastive Unpaired Translation (CUT) model.
This novel approach ensures the preservation of the semantic structure of images, even without semantic labels.
The method was tested on translating synthetic game environments from GTA5 dataset to realistic urban scenes in cityscapes dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image-to-Image Translation is a vital area of computer vision that focuses on transforming images from one visual domain to another while preserving their core content and structure. However, this field faces two major challenges: first, the data from the two domains are often unpaired, making it difficult to train generative adversarial networks effectively; second, existing methods tend to produce artifacts or hallucinations during image generation, leading to a decline in image quality. To address these issues, this paper proposes an enhanced unsupervised image-to-image translation method based on the Contrastive Unpaired Translation (CUT) model, incorporating Histogram of Oriented Gradients (HOG) features. This novel approach ensures the preservation of the semantic structure of images, even without semantic labels, by minimizing the loss between the HOG features of input and generated images. The method was tested on translating synthetic game environments from GTA5 dataset to realistic urban scenes in cityscapes dataset, demonstrating significant improvements in reducing hallucinations and enhancing image quality.
Related papers
- DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting [2.656795553429629]
We propose a dual affine transformation generative adversarial network (DAFT-GAN) to maintain the semantic consistency for text-guided inpainting.
Our proposed model outperforms the existing GAN-based models in both qualitative and quantitative assessments.
arXiv Detail & Related papers (2024-08-09T09:28:42Z) - The Right Losses for the Right Gains: Improving the Semantic Consistency
of Deep Text-to-Image Generation with Distribution-Sensitive Losses [0.35898124827270983]
We propose a contrastive learning approach with a novel combination of two loss functions: fake-to-fake loss and fake-to-real loss.
We test this approach on two baseline models: SSAGAN and AttnGAN.
Results show that our approach improves the qualitative results on AttnGAN with style blocks on the CUB dataset.
arXiv Detail & Related papers (2023-12-18T00:05:28Z) - FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph
Parsing [66.70054075041487]
Existing scene graphs that convert image captions into scene graphs often suffer from two types of errors.
First, the generated scene graphs fail to capture the true semantics of the captions or the corresponding images, resulting in a lack of faithfulness.
Second, the generated scene graphs have high inconsistency, with the same semantics represented by different annotations.
arXiv Detail & Related papers (2023-05-27T15:38:31Z) - Wavelet-based Unsupervised Label-to-Image Translation [9.339522647331334]
We propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole image wavelet based discrimination.
We test our methodology on 3 challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
arXiv Detail & Related papers (2023-05-16T17:48:44Z) - Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UN Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z) - IR-GAN: Image Manipulation with Linguistic Instruction by Increment
Reasoning [110.7118381246156]
Increment Reasoning Generative Adversarial Network (IR-GAN) aims to reason consistency between visual increment in images and semantic increment in instructions.
First, we introduce the word-level and instruction-level instruction encoders to learn user's intention from history-correlated instructions as semantic increment.
Second, we embed the representation of semantic increment into that of source image for generating target image, where source image plays the role of referring auxiliary.
arXiv Detail & Related papers (2022-04-02T07:48:39Z) - USIS: Unsupervised Semantic Image Synthesis [9.613134538472801]
We propose a new Unsupervised paradigm for Semantic Image Synthesis (USIS)
USIS learns to output images with visually separable semantic classes using a self-supervised segmentation loss.
In order to match the color and texture distribution of real images without losing high-frequency information, we propose to use whole image wavelet-based discrimination.
arXiv Detail & Related papers (2021-09-29T20:48:41Z) - Dual Graph Convolutional Networks with Transformer and Curriculum
Learning for Image Captioning [26.496357517937614]
Existing image captioning methods just focus on understanding the relationship between objects or instances in a single image.
We propose Dual Graph Convolutional Networks (Dual-GCN) with transformer and curriculum learning for image captioning.
arXiv Detail & Related papers (2021-08-05T04:57:06Z) - Towards Unsupervised Deep Image Enhancement with Generative Adversarial
Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN)
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z) - Semantic Photo Manipulation with a Generative Image Prior [86.01714863596347]
GANs are able to synthesize images conditioned on inputs such as user sketch, text, or semantic labels.
It is hard for GANs to precisely reproduce an input image.
In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image.
Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image.
arXiv Detail & Related papers (2020-05-15T18:22:05Z) - Domain Adaptation for Image Dehazing [72.15994735131835]
Most existing methods train a dehazing model on synthetic hazy images, which are less able to generalize well to real hazy images due to domain shift.
We propose a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules.
Experimental results on both synthetic and real-world images demonstrate that our model performs favorably against the state-of-the-art dehazing algorithms.
arXiv Detail & Related papers (2020-05-10T13:54:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.