CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation
- URL: http://arxiv.org/abs/2508.06625v1
- Date: Fri, 08 Aug 2025 18:13:56 GMT
- Title: CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation
- Authors: Shilong Zou, Yuhang Huang, Renjiao Yi, Chenyang Zhu, Kai Xu
- Abstract summary: We introduce a diffusion-based cross-domain image translator in the absence of paired training data. We propose a novel joint learning framework that aligns the diffusion and the translation process. Our method enables global optimization of both processes, enhancing the optimality and achieving improved fidelity and structural consistency.
- Score: 13.495259208378524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a diffusion-based cross-domain image translator that requires no paired training data. Unlike GAN-based methods, our approach integrates diffusion models to learn the image translation process, allowing for broader coverage of the data distribution and improved cross-domain translation. However, incorporating the translation process within the diffusion process remains challenging because the two processes are not exactly aligned: the diffusion process is applied to the noisy signal, while the translation process is conducted on the clean signal. As a result, recent diffusion-based studies employ separate training or shallow integration to learn the two processes, which may trap the translation optimization in local minima and constrain the effectiveness of diffusion models. To address this problem, we propose a novel joint learning framework that aligns the diffusion and translation processes, thereby improving global optimality. Specifically, we propose to extract image components with diffusion models to represent the clean signal and to apply the translation process to these image components, enabling end-to-end joint learning. In addition, we introduce a time-dependent translation network to learn the complex translation mapping, resulting in effective translation learning and significant performance improvement. Benefiting from the joint learning design, our method enables global optimization of both processes, achieving improved fidelity and structural consistency. We have conducted extensive experiments on RGB$\leftrightarrow$RGB and diverse cross-modality translation tasks, including RGB$\leftrightarrow$Edge, RGB$\leftrightarrow$Semantics, and RGB$\leftrightarrow$Depth, demonstrating better generative performance than the state of the art.
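To make the noisy-versus-clean mismatch described in the abstract concrete, here is a minimal sketch of the standard DDPM forward step and its inversion, not the paper's image-component extraction. The linear beta schedule, the step count, and the function names `alpha_bar`, `add_noise`, and `estimate_clean` are all assumptions chosen for illustration. The point it illustrates is that a translator operating on clean signals needs some clean estimate derived from the noisy sample at each timestep.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t (assumed linear schedule)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(x0, eps, a_bar):
    """Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
            for x, e in zip(x0, eps)]


def estimate_clean(xt, eps_hat, a_bar):
    """Invert the forward step given a noise estimate eps_hat."""
    return [(x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar)
            for x, e in zip(xt, eps_hat)]


random.seed(0)
x0 = [0.5, -0.25, 1.0, 0.0]                  # toy "clean image" of four pixels
eps = [random.gauss(0.0, 1.0) for _ in x0]   # Gaussian noise
a_bar = alpha_bar(500)                       # mid-trajectory timestep
xt = add_noise(x0, eps, a_bar)               # what the diffusion model sees
x0_hat = estimate_clean(xt, eps, a_bar)      # what a clean-signal translator needs
```

In this toy case the true noise is passed as `eps_hat`, so `x0_hat` matches `x0` up to floating-point error. In practice a network predicts the noise, so the clean estimate is approximate and its quality varies with the timestep, which is one plausible reading of why the abstract proposes a time-dependent translation network.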
Related papers
- Plasticine: A Traceable Diffusion Model for Medical Image Translation [79.39689106440389]
We propose Plasticine, to the best of our knowledge, the first end-to-end image-to-image translation framework explicitly designed with traceability as a core objective. Our method combines intensity translation and spatial transformation within a denoising diffusion framework. This design enables the generation of synthetic images with interpretable intensity transitions and spatially coherent deformations, supporting pixel-wise traceability throughout the translation process.
arXiv Detail & Related papers (2025-12-20T18:01:57Z) - SAGA: Learning Signal-Aligned Distributions for Improved Text-to-Image Generation [9.212970624261272]
State-of-the-art text-to-image models produce visually impressive results but often struggle with precise alignment to text prompts. We propose a novel approach that learns a high-success-rate distribution conditioned on a target prompt. Our method explicitly models the signal component during the denoising process, offering fine-grained control that mitigates over-optimization.
arXiv Detail & Related papers (2025-08-19T14:31:15Z) - Learning Latent Representations for Image Translation using Frequency Distributed CycleGAN [7.610968152027164]
Fd-CycleGAN is an image-to-image (I2I) translation framework that enhances latent representation learning to approximate real data distributions. We conduct experiments on diverse datasets: Horse2Zebra, Monet2Photo, and a synthetically augmented Strike-off dataset. Our results suggest that frequency-guided latent learning significantly improves generalization in image translation tasks.
arXiv Detail & Related papers (2025-08-05T12:59:37Z) - Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning [2.9603070411207644]
Diffusion Transformers (DiT) is a diffusion-based framework for image-to-image translation. DiT combines the denoising capabilities of diffusion models with the global modeling power of transformers. We validate our approach on two benchmark datasets: face2comics, which translates real human faces to comic-style illustrations, and edges2shoes, which translates edge maps to realistic shoe images.
arXiv Detail & Related papers (2025-05-21T20:37:33Z) - A Diffusion Model Translator for Efficient Image-to-Image Translation [60.86381807306705]
We propose an efficient method that equips a diffusion model with a lightweight translator, dubbed a Diffusion Model Translator (DMT). We evaluate our approach on a range of I2I applications, including image stylization, image colorization, segmentation to image, and sketch to image, to validate its efficacy and general utility.
arXiv Detail & Related papers (2025-02-01T04:01:24Z) - DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-diffusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z) - MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation [29.03892463588357]
We present a novel method for exemplar-based image translation, called matching interleaved diffusion models (MIDMs).
We formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space.
To improve the reliability of the diffusion process, we design a confidence-aware process using cycle-consistency to consider only confident regions.
arXiv Detail & Related papers (2022-09-22T14:43:52Z) - BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models [50.39417112077254]
A novel image-to-image translation method based on the Brownian Bridge Diffusion Model (BBDM) is proposed.
To the best of our knowledge, it is the first work that proposes a Brownian Bridge diffusion process for image-to-image translation.
arXiv Detail & Related papers (2022-05-16T13:47:02Z) - Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation [56.55178339375146]
Image-to-Image (I2I) multi-domain translation models are usually also evaluated by the quality of their semantic results.
We propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space.
arXiv Detail & Related papers (2021-06-16T17:58:21Z)