Learning Latent Representations for Image Translation using Frequency Distributed CycleGAN
- URL: http://arxiv.org/abs/2508.03415v1
- Date: Tue, 05 Aug 2025 12:59:37 GMT
- Title: Learning Latent Representations for Image Translation using Frequency Distributed CycleGAN
- Authors: Shivangi Nigam, Adarsh Prasad Behera, Shekhar Verma, P. Nagabhushan
- Abstract summary: Fd-CycleGAN is an image-to-image (I2I) translation framework that enhances latent representation learning to approximate real data distributions. We conduct experiments on diverse datasets -- Horse2Zebra, Monet2Photo, and a synthetically augmented Strike-off dataset. Our results suggest that frequency-guided latent learning significantly improves generalization in image translation tasks.
- Score: 7.610968152027164
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents Fd-CycleGAN, an image-to-image (I2I) translation framework that enhances latent representation learning to approximate real data distributions. Building upon the foundation of CycleGAN, our approach integrates Local Neighborhood Encoding (LNE) and frequency-aware supervision to capture fine-grained local pixel semantics while preserving structural coherence from the source domain. We employ distribution-based loss metrics, including KL/JS divergence and log-based similarity measures, to explicitly quantify the alignment between real and generated image distributions in both spatial and frequency domains. To validate the efficacy of Fd-CycleGAN, we conduct experiments on diverse datasets -- Horse2Zebra, Monet2Photo, and a synthetically augmented Strike-off dataset. Compared to baseline CycleGAN and other state-of-the-art methods, our approach demonstrates superior perceptual quality, faster convergence, and improved mode diversity, particularly in low-data regimes. By effectively capturing local and global distribution characteristics, Fd-CycleGAN achieves more visually coherent and semantically consistent translations. Our results suggest that frequency-guided latent learning significantly improves generalization in image translation tasks, with promising applications in document restoration, artistic style transfer, and medical image synthesis. We also provide comparative insights with diffusion-based generative models, highlighting the advantages of our lightweight adversarial approach in terms of training efficiency and qualitative output.
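The abstract does not include an implementation, but a minimal sketch of a frequency-aware distribution loss in this spirit is shown below, assuming the divergence is taken between normalized FFT magnitude spectra of real and generated batches; all function names and design choices here are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def spectral_distribution(images: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize the 2D FFT magnitude of a batch into a probability distribution."""
    # images: (B, C, H, W); magnitude of the centered 2D spectrum
    spectrum = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1)).abs()
    flat = spectrum.flatten(start_dim=1) + eps        # (B, C*H*W), strictly positive
    return flat / flat.sum(dim=1, keepdim=True)

def jsd_frequency_loss(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between real and generated frequency distributions."""
    p, q = spectral_distribution(real), spectral_distribution(fake)
    m = 0.5 * (p + q)
    # F.kl_div(input, target) computes KL(target || exp(input)) when input is log-probs
    kl_pm = F.kl_div(m.log(), p, reduction="batchmean")
    kl_qm = F.kl_div(m.log(), q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)

# Hypothetical use inside a CycleGAN-style step:
# loss = cycle_loss + adv_loss + lambda_freq * jsd_frequency_loss(real_B, fake_B)
```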
Related papers
- Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning [2.9603070411207644]
Diffusion Transformers (DiT) is a diffusion-based framework for image-to-image translation. DiT combines the denoising capabilities of diffusion models with the global modeling power of transformers. We validate our approach on two benchmark datasets: face2comics, which translates real human faces to comic-style illustrations, and edges2shoes, which translates edge maps to realistic shoe images.
arXiv Detail & Related papers (2025-05-21T20:37:33Z)
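As a rough illustration of the CLIP-based image conditioning described in the DiT entry above, the sketch below injects a frozen image embedding into transformer tokens via cross-attention; the dimensions, the single conditioning token, and all names are assumptions for illustration rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossAttnConditioning(nn.Module):
    """Toy cross-attention block: denoiser tokens attend to a CLIP-style image embedding."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) latent patch tokens of the diffusion transformer
        # cond:   (B, 1, dim) conditioning embedding, e.g. a projected CLIP image feature
        attended, _ = self.attn(self.norm(tokens), cond, cond)
        return tokens + attended  # residual injection of the condition

# Hypothetical usage with random stand-ins for tokens and CLIP features:
block = CrossAttnConditioning()
out = block(torch.randn(2, 64, 512), torch.randn(2, 1, 512))   # -> (2, 64, 512)
```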
- Robust Visual Representation Learning with Multi-modal Prior Knowledge for Image Classification Under Distribution Shift [29.954639194410586]
We propose Knowledge-Guided Visual representation learning (KGV) to improve generalization under distribution shift. It integrates knowledge from two distinct modalities: 1) a knowledge graph (KG) with hierarchical and association relationships; and 2) generated synthetic images of visual elements semantically represented in the KG. The results demonstrate that KGV consistently exhibits higher accuracy and data efficiency across all experiments.
arXiv Detail & Related papers (2024-10-21T13:06:38Z)
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting via Latent OpTimization) is an optimization approach grounded on a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
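The latent-space search described in the PILOT entry above can be sketched generically as gradient descent over a latent code under a background preservation term and a semantic term; the decoder G, the mask convention, and the loss weights below are assumptions, not PILOT's actual objective.

```python
import torch

def optimize_latent(G, z_init, image, mask, semantic_loss, steps=200, lr=0.05, lam=10.0):
    """Gradient-descent search over a latent code for inpainting.

    G: frozen generator/decoder mapping latents to images
    mask: 1 inside the hole, 0 on the background to preserve
    semantic_loss: callable scoring the inpainted region against a prompt (assumed interface)
    """
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        out = G(z)
        # keep the known background pixels close to the input image
        bg_loss = ((1 - mask) * (out - image)).abs().mean()
        loss = semantic_loss(out, mask) + lam * bg_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```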
- Spectral Normalization and Dual Contrastive Regularization for Image-to-Image Translation [9.029227024451506]
We propose a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization.
We conduct comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results show that our method achieves state-of-the-art performance in multiple tasks.
arXiv Detail & Related papers (2023-04-22T05:22:24Z)
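Of the two ingredients in the SN-DCR entry above, spectral normalization has a standard PyTorch utility, and the contrastive part can be approximated with a generic InfoNCE term; the pairing below is a hedged stand-in, not the paper's exact dual contrastive regularization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

# Spectral normalization constrains the Lipschitz constant of each layer,
# stabilizing adversarial training of the discriminator.
disc = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 1, 4, stride=2, padding=1)),
)

def info_nce(query, positive, negatives, tau: float = 0.07):
    """Generic InfoNCE: pull query toward its positive, push away from negatives.

    query, positive: (B, D) feature vectors; negatives: (K, D).
    """
    query, positive = F.normalize(query, dim=-1), F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (query * positive).sum(-1, keepdim=True) / tau            # (B, 1)
    neg = query @ negatives.t() / tau                               # (B, K)
    logits = torch.cat([pos, neg], dim=1)                           # positive is class 0
    return F.cross_entropy(logits, torch.zeros(len(query), dtype=torch.long, device=query.device))
```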
- fRegGAN with K-space Loss Regularization for Medical Image Translation [42.253647362909476]
Generative adversarial networks (GANs) have shown remarkable success in generating realistic images.
GANs tend to suffer from a frequency bias towards low frequencies, which can lead to the removal of important structures in the generated images.
We propose a novel frequency-aware image-to-image translation framework based on the supervised RegGAN approach, which we call fRegGAN.
arXiv Detail & Related papers (2023-03-28T12:49:10Z)
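A k-space regularizer of the kind named in the fRegGAN entry above could be an L1 penalty between the 2D FFT spectra of generated and target images; the sketch below is one plausible form of such a loss, not the paper's exact formulation.

```python
import torch

def kspace_l1_loss(fake: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 distance in k-space, penalizing mismatched frequency content.

    Because GANs tend to underfit high frequencies, comparing full complex
    spectra helps keep fine structures (edges, texture) in the generated images.
    """
    fake_k = torch.fft.fft2(fake)       # complex spectrum of the generated image
    target_k = torch.fft.fft2(target)   # complex spectrum of the ground truth
    return (fake_k - target_k).abs().mean()

# Hypothetical combined objective for a supervised translation GAN:
# total = adv_loss + lambda_img * l1_loss + lambda_k * kspace_l1_loss(fake, target)
```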
- Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network.
It exploits semantic features from pretrained classification networks and implicitly matches the probabilistic distribution of clear images in the semantic feature space.
By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
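One way to realize the distribution matching described in the D2SM entry above is to match batch feature statistics from a frozen pretrained classifier; the sketch below uses torchvision's VGG16 features and simple moment matching as a hedged approximation of the idea, not the paper's actual objective.

```python
import torch
from torchvision.models import vgg16, VGG16_Weights

# Frozen pretrained features act as a "semantic" embedding space.
features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in features.parameters():
    p.requires_grad_(False)

def semantic_statistics_loss(denoised: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """Match first- and second-order feature statistics of denoised vs. clean batches.

    Assumes inputs are already ImageNet-normalized, as VGG expects.
    """
    f_d = features(denoised).flatten(start_dim=2)   # (B, C, H*W)
    f_c = features(clean).flatten(start_dim=2)
    mean_term = (f_d.mean(dim=2) - f_c.mean(dim=2)).pow(2).mean()
    var_term = (f_d.var(dim=2) - f_c.var(dim=2)).pow(2).mean()
    return mean_term + var_term
```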
- Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches. We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation [56.55178339375146]
Image-to-Image (I2I) multi-domain translation models are usually also evaluated using the quality of their semantic results.
We propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space.
arXiv Detail & Related papers (2021-06-16T17:58:21Z)
- Similarity Reasoning and Filtration for Image-Text Matching [85.68854427456249]
We propose a novel Similarity Graph Reasoning and Attention filtration network for image-text matching.
A Similarity Graph Reasoning (SGR) module relying on a graph convolutional neural network is introduced to infer relation-aware similarities from both local and global alignments.
We demonstrate the superiority of the proposed method by achieving state-of-the-art performance on the Flickr30K and MSCOCO datasets.
arXiv Detail & Related papers (2021-01-05T06:29:35Z)
- Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization [16.54980086211836]
Multimodal image-to-image translation (I2IT) aims to learn a conditional distribution that explores multiple possible images in the target domain given an input image in the source domain.
Conditional generative adversarial networks (cGANs) are often adopted for modeling such a conditional distribution.
We propose a method that explicitly estimates and maximizes the mutual information between the latent code and the output image in cGANs.
arXiv Detail & Related papers (2020-08-08T14:09:23Z)
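A standard variational estimator for the mutual information objective in the entry above is the InfoGAN-style bound, in which an auxiliary network reconstructs the latent code from the generated image; the Gaussian posterior below illustrates that bound and is not necessarily the paper's exact estimator.

```python
import torch
import torch.nn as nn

class CodePosterior(nn.Module):
    """Auxiliary network Q(c | x): predicts a Gaussian over the latent code."""
    def __init__(self, code_dim: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(32, code_dim)
        self.log_var = nn.Linear(32, code_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu(h), self.log_var(h)

def mi_lower_bound_loss(q_net, fake_images, code):
    """Negative Gaussian log-likelihood of the true code under Q(c | G(z, c)).

    Minimizing this (for both G and Q) maximizes a variational lower bound
    on the mutual information I(c; G(z, c)), up to additive constants.
    """
    mu, log_var = q_net(fake_images)
    nll = 0.5 * (log_var + (code - mu).pow(2) / log_var.exp())
    return nll.sum(dim=1).mean()
```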
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
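The dense combinations of dilated convolutions mentioned in the inpainting entry above can be sketched as parallel branches with increasing dilation fused by a 1x1 convolution, which widens the receptive field without downsampling; the block below is an illustrative approximation, not the paper's exact module.

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Parallel dilated convolutions fused together to widen the receptive field."""
    def __init__(self, channels: int = 64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # each branch sees a different spatial scale; concatenation mixes them densely
        out = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return x + self.fuse(out)   # residual connection keeps optimization stable

block = DenseDilatedBlock()
y = block(torch.randn(1, 64, 32, 32))   # -> (1, 64, 32, 32)
```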