OT-ALD: Aligning Latent Distributions with Optimal Transport for Accelerated Image-to-Image Translation
- URL: http://arxiv.org/abs/2511.11162v1
- Date: Fri, 14 Nov 2025 10:57:21 GMT
- Title: OT-ALD: Aligning Latent Distributions with Optimal Transport for Accelerated Image-to-Image Translation
- Authors: Zhanpeng Wang, Shuting Cao, Yuhang Lu, Yuhan Li, Na Lei, Zhongxuan Luo
- Abstract summary: The Dual Diffusion Implicit Bridge (DDIB) is an emerging image-to-image (I2I) translation method that preserves cycle consistency while achieving strong flexibility. We propose a novel I2I translation framework, OT-ALD, grounded in optimal transport theory. We show that OT-ALD improves sampling efficiency by 20.29% and reduces the FID score by 2.6 on average compared to the top-performing baseline models.
- Score: 23.752936213193376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Dual Diffusion Implicit Bridge (DDIB) is an emerging image-to-image (I2I) translation method that preserves cycle consistency while achieving strong flexibility. It links two independently trained diffusion models (DMs) in the source and target domains by first adding noise to a source image to obtain a latent code, then denoising it in the target domain to generate the translated image. However, this method faces two key challenges: (1) low translation efficiency, and (2) translation trajectory deviations caused by mismatched latent distributions. To address these issues, we propose a novel I2I translation framework, OT-ALD, grounded in optimal transport (OT) theory, which retains the strengths of the DDIB-based approach. Specifically, we compute an OT map from the latent distribution of the source domain to that of the target domain, and use the mapped distribution as the starting point for the reverse diffusion process in the target domain. Our error analysis confirms that OT-ALD eliminates latent distribution mismatches. Moreover, OT-ALD effectively balances faster image translation with improved image quality. Experiments on four translation tasks across three high-resolution datasets show that OT-ALD improves sampling efficiency by 20.29% and reduces the FID score by 2.6 on average compared to the top-performing baseline models.
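The alignment step described in the abstract can be illustrated with a toy sketch. For Gaussian distributions, the quadratic-cost OT map has a closed form, T(z) = mu_t + (sigma_t / sigma_s)(z - mu_s); the code below applies it to 1-D stand-ins for the two domains' latent codes. All names here (`fit_gaussian_ot_map`, the synthetic latents) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# 1-D stand-ins for latent codes obtained by noising/inverting images
# in the source and target domains.
source_latents = rng.normal(loc=1.0, scale=2.0, size=10_000)
target_latents = rng.normal(loc=-0.5, scale=0.8, size=10_000)

def fit_gaussian_ot_map(src, tgt):
    """Closed-form OT map between two 1-D Gaussians under quadratic cost:
    T(z) = mu_t + (sigma_t / sigma_s) * (z - mu_s)."""
    mu_s, sigma_s = src.mean(), src.std()
    mu_t, sigma_t = tgt.mean(), tgt.std()
    return lambda z: mu_t + (sigma_t / sigma_s) * (z - mu_s)

ot_map = fit_gaussian_ot_map(source_latents, target_latents)
aligned = ot_map(source_latents)
# `aligned` now matches the target latent moments and would seed the
# target domain's reverse diffusion process.
```

In practice the latents are high-dimensional and the OT map must be estimated numerically, but the principle — transport the source latents onto the target latent distribution before denoising — is the same.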
Related papers
- Learning Latent Representations for Image Translation using Frequency Distributed CycleGAN [7.610968152027164]
Fd-CycleGAN is an image-to-image (I2I) translation framework that enhances latent representation learning to approximate real data distributions. We conduct experiments on diverse datasets -- Horse2Zebra, Monet2Photo, and a synthetically augmented Strike-off dataset. Our results suggest that frequency-guided latent learning significantly improves generalization in image translation tasks.
arXiv Detail & Related papers (2025-08-05T12:59:37Z)
- Single-Step Bidirectional Unpaired Image Translation Using Implicit Bridge Consistency Distillation [55.45188329646137]
Implicit Bridge Consistency Distillation (IBCD) enables single-step bidirectional unpaired translation without using adversarial loss. IBCD achieves state-of-the-art performance on benchmark datasets in a single generation step.
arXiv Detail & Related papers (2025-03-19T09:48:04Z)
- A Diffusion Model Translator for Efficient Image-to-Image Translation [60.86381807306705]
We propose an efficient method that equips a diffusion model with a lightweight translator, dubbed a Diffusion Model Translator (DMT). We evaluate our approach on a range of I2I applications, including image stylization, image colorization, segmentation to image, and sketch to image, to validate its efficacy and general utility.
arXiv Detail & Related papers (2025-02-01T04:01:24Z)
- Rethinking Score Distillation as a Bridge Between Image Distributions [97.27476302077545]
We show that our method seeks to transport corrupted images (source) to the natural image distribution (target). Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real.
arXiv Detail & Related papers (2024-06-13T17:59:58Z)
- Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of Contrastive Unpaired Translation (CUT), dubbed Contrastive Denoising Score (CDS), for latent diffusion models (LDM).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
- BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models [50.39417112077254]
A novel image-to-image translation method based on the Brownian Bridge Diffusion Model (BBDM) is proposed.
To the best of our knowledge, it is the first work to propose a Brownian bridge diffusion process for image-to-image translation.
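The bridge idea can be sketched generically: a Brownian bridge pinned at a source image x0 (t = 0) and a target image y (t = T) has marginal N((1 - m) x0 + m y, s^2 m (1 - m)) with m = t/T. The helper below samples from that marginal; it uses the textbook bridge formula, not BBDM's exact noise schedule, and all names are illustrative.

```python
import numpy as np

def brownian_bridge_sample(x0, y, t, T, s=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and y (t=T).

    Marginal: N((1 - m) * x0 + m * y, s**2 * m * (1 - m)), with m = t / T.
    Generic bridge formula; BBDM's actual variance schedule may differ.
    """
    rng = rng or np.random.default_rng()
    m = t / T
    mean = (1.0 - m) * np.asarray(x0) + m * np.asarray(y)
    std = s * np.sqrt(m * (1.0 - m))
    return mean + std * rng.standard_normal(np.shape(x0))

# Variance vanishes at both ends, so the endpoints are reproduced exactly.
x0 = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
mid = brownian_bridge_sample(x0, y, t=5, T=10)  # noisy blend of x0 and y
```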
arXiv Detail & Related papers (2022-05-16T13:47:02Z)
- Dual Diffusion Implicit Bridges for Image-to-Image Translation [104.59371476415566]
Common image-to-image translation methods rely on joint training over data from both source and target domains.
We present Dual Diffusion Implicit Bridges (DDIBs), an image translation method based on diffusion models.
DDIBs allow translations between arbitrary pairs of source-target domains, given independently trained diffusion models on respective domains.
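DDIB's round trip can be mimicked with a toy model in which each domain's deterministic DDIM flow is an invertible affine map; translating source → latent → target and back then recovers the input, mirroring DDIB's cycle-consistency property. `ToyDM` and its methods are stand-ins of my own, not the paper's models.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ToyDM:
    """Stand-in for a diffusion model whose deterministic DDIM flow
    between images and latents is a simple invertible affine map."""
    mu: float
    sigma: float

    def invert(self, x):   # image -> latent (DDIM inversion stand-in)
        return (x - self.mu) / self.sigma

    def sample(self, z):   # latent -> image (DDIM sampling stand-in)
        return self.mu + self.sigma * z

def ddib_translate(x, src: ToyDM, tgt: ToyDM):
    """Encode with the source model, decode with the target model."""
    return tgt.sample(src.invert(x))

source_dm = ToyDM(mu=0.2, sigma=1.5)
target_dm = ToyDM(mu=-0.3, sigma=0.7)
x = np.array([1.0, 2.0, 3.0])
y = ddib_translate(x, source_dm, target_dm)
x_back = ddib_translate(y, target_dm, source_dm)  # cycle: recovers x
```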
arXiv Detail & Related papers (2022-03-16T04:10:45Z)
- Beyond Deterministic Translation for Unsupervised Domain Adaptation [19.358300726820943]
In this work we challenge the common approach of using a one-to-one mapping ("translation") between the source and target domains in unsupervised domain adaptation (UDA).
Instead, we rely on translation to capture inherent ambiguities between the source and target domains.
We report improvements over strong recent baselines, leading to state-of-the-art UDA results on two challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-02-15T23:03:33Z)
- GAIT: Gradient Adjusted Unsupervised Image-to-Image Translation [5.076419064097734]
An adversarial loss is utilized to match the distributions of the translated and target image sets.
This may create artifacts if two domains have different marginal distributions, for example, in uniform areas.
We propose an unsupervised IIT that preserves the uniform regions after the translation.
arXiv Detail & Related papers (2020-09-02T08:04:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.