Unpaired Image-to-Image Translation via a Self-Supervised Semantic Bridge
- URL: http://arxiv.org/abs/2602.16664v1
- Date: Wed, 18 Feb 2026 18:05:00 GMT
- Title: Unpaired Image-to-Image Translation via a Self-Supervised Semantic Bridge
- Authors: Jiaming Liu, Felix Petersen, Yunhe Gao, Yabin Zhang, Hyojin Kim, Akshay S. Chaudhari, Yu Sun, Stefano Ermon, Sergios Gatidis
- Abstract summary: Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. We propose the Self-Supervised Semantic Bridge (SSB), a versatile framework that integrates external semantic priors into diffusion bridge models. Our key idea is to leverage self-supervised visual encoders to learn representations that are invariant to appearance changes but capture geometric structure.
- Score: 59.247871132422006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. Adversarial approaches require target-domain adversarial loss during training, which can limit generalization to unseen data, while diffusion-inversion methods often produce low-fidelity translations due to imperfect inversion into noise-latent representations. In this work, we propose the Self-Supervised Semantic Bridge (SSB), a versatile framework that integrates external semantic priors into diffusion bridge models to enable spatially faithful translation without cross-domain supervision. Our key idea is to leverage self-supervised visual encoders to learn representations that are invariant to appearance changes but capture geometric structure, forming a shared latent space that conditions the diffusion bridges. Extensive experiments show that SSB outperforms strong prior methods for challenging medical image synthesis in both in-domain and out-of-domain settings, and extends easily to high-quality text-guided editing.
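The abstract's key idea can be illustrated schematically. The sketch below is not the paper's implementation; it is a minimal, hypothetical 1-D illustration of the two ingredients the abstract names: a frozen self-supervised encoder (standing in for something like DINO) whose appearance-invariant features condition the model, and a Brownian-bridge interpolant whose noise vanishes at both endpoints, as in diffusion bridge models. The `frozen_encoder` and `bridge_sample` names are illustrative, not from the paper.

```python
import math
import random

def frozen_encoder(image):
    # Stand-in for a frozen self-supervised visual encoder (hypothetical).
    # Real SSB would use features that are invariant to appearance but
    # preserve geometric structure; here we return two toy statistics.
    return [sum(image) / len(image), max(image) - min(image)]

def bridge_sample(x0, x1, t, sigma=0.1):
    # Diffusion-bridge marginal at time t in [0, 1]: linear interpolation
    # between the two endpoints plus Brownian-bridge noise whose standard
    # deviation sigma * sqrt(t * (1 - t)) is zero at t = 0 and t = 1.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + random.gauss(0.0, std)
            for a, b in zip(x0, x1)]

# Toy "images" from the source and target domains.
source = [0.1, 0.4, 0.9, 0.2]
target = [0.8, 0.3, 0.5, 0.7]

# The semantic prior that would condition the bridge's denoiser.
cond = frozen_encoder(source)

# One noisy intermediate state on the bridge between source and target.
x_half = bridge_sample(source, target, 0.5)
```

In the actual framework a learned denoiser, conditioned on `cond`, would be trained to recover the target endpoint from such intermediate states; the point of the sketch is only the bridge geometry and the role of the frozen encoder.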
Related papers
- Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection [29.203173410857914]
We propose the first universal image immunization framework that generates a single, broadly applicable adversarial perturbation. Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into images to be protected. Our approach effectively blocks malicious editing attempts by overwriting the original semantic content in the image via the UAP.
arXiv Detail & Related papers (2026-02-16T12:08:37Z) - Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge [16.958159611661813]
The Latent Denoising Diffusion Bridge Model (LDDBM) is a general-purpose framework for modality translation. By operating in a shared latent space, our method learns a bridge between arbitrary modalities without requiring aligned dimensions. Our approach supports arbitrary modality pairs and performs strongly on diverse MT tasks, including multi-view to 3D shape generation, image super-resolution, and multi-view scene synthesis.
arXiv Detail & Related papers (2025-10-23T17:59:54Z) - Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score [4.8677910801584385]
Large-scale text-to-image generative models have shown remarkable ability to synthesize diverse and high-quality images. We present Dual Contrastive Denoising Score, a framework that leverages the rich generative prior of text-to-image diffusion models. Our method achieves both flexible content modification and structure preservation between input and output images, as well as zero-shot image-to-image translation.
arXiv Detail & Related papers (2025-08-18T08:30:07Z) - Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z) - Contrastive Learning Guided Latent Diffusion Model for Image-to-Image Translation [7.218556478126324]
The diffusion model has demonstrated superior performance in generating diverse, high-quality images for text-guided image translation. We propose pix2pix-zeroCon, a zero-shot diffusion-based method that eliminates the need for additional training by leveraging a patch-wise contrastive loss. Our approach requires no additional training and operates directly on a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2025-03-26T12:15:25Z) - Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack [51.16384207202798]
Vision-language pre-training models are vulnerable to multimodal adversarial examples (AEs).
Previous approaches augment image-text pairs to enhance diversity within the adversarial example generation process.
We propose sampling from adversarial evolution triangles composed of clean, historical, and current adversarial examples to enhance adversarial diversity.
arXiv Detail & Related papers (2024-11-04T23:07:51Z) - TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization [59.412236435627094]
TALE is a training-free framework harnessing the generative capabilities of text-to-image diffusion models.
We equip TALE with two mechanisms dubbed Adaptive Latent Manipulation and Energy-guided Latent Optimization.
Our experiments demonstrate that TALE surpasses prior baselines and attains state-of-the-art performance in image-guided composition.
arXiv Detail & Related papers (2024-08-07T08:52:21Z) - Diffusion-based Image Translation using Disentangled Style and Content
Representation [51.188396199083336]
Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer.
It is often difficult to maintain the original content of the image during the reverse diffusion.
We present a novel diffusion-based unsupervised image translation method using disentangled style and content representation.
Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.
arXiv Detail & Related papers (2022-09-30T06:44:37Z) - Marginal Contrastive Correspondence for Guided Image Generation [58.0605433671196]
Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar from two different domains.
Existing work builds the cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains.
We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.
arXiv Detail & Related papers (2022-04-01T13:55:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.