Related papers: Style Transfer with Diffusion Models for Synthetic-to-Real Domain Adaptation

URL: http://arxiv.org/abs/2505.16360v1
Date: Thu, 22 May 2025 08:11:10 GMT
Title: Style Transfer with Diffusion Models for Synthetic-to-Real Domain Adaptation
Authors: Estelle Chigot, Dennis G. Wilson, Meriem Ghrib, Thomas Oberlin
Abstract summary: We introduce two novel techniques for semantically consistent style transfer using diffusion models. Experiments using GTA5 as source and Cityscapes/ACDC as target domains show that our approach produces higher quality images with lower FID scores and better content preservation.
Score: 4.50001192781448
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Semantic segmentation models trained on synthetic data often perform poorly on real-world images due to domain gaps, particularly in adverse conditions where labeled data is scarce. Yet recent foundation models can generate realistic images without any training. This paper proposes leveraging such diffusion models to improve the performance of vision models trained on synthetic data. We introduce two novel techniques for semantically consistent style transfer using diffusion models: Class-wise Adaptive Instance Normalization and Cross-Attention (CACTI) and its extension with selective attention Filtering (CACTIF). CACTI applies statistical normalization selectively based on semantic classes, while CACTIF further filters cross-attention maps based on feature similarity, preventing artifacts in regions with weak cross-attention correspondences. Our methods transfer style characteristics while preserving semantic boundaries and structural coherence, unlike approaches that apply global transformations or generate content without constraints. Experiments using GTA5 as source and Cityscapes/ACDC as target domains show that our approach produces higher quality images with lower FID scores and better content preservation. Our work demonstrates that class-aware diffusion-based style transfer effectively bridges the synthetic-to-real domain gap even with minimal target domain data, advancing robust perception systems for challenging real-world applications. The source code is available at: https://github.com/echigot/cactif.
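
The class-wise normalization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the repository linked above); it only shows the general principle of matching per-class feature statistics, assuming feature maps stored as NumPy arrays of shape (C, H, W) and integer label maps of shape (H, W). The function name `classwise_adain` and the `eps` parameter are hypothetical, and the cross-attention and filtering components of CACTI/CACTIF are not covered.

```python
import numpy as np

def classwise_adain(content, style, content_labels, style_labels, eps=1e-5):
    """Match per-class channel statistics of `content` to those of `style`.

    content, style: float arrays of shape (C, H, W) (spatial sizes may differ).
    content_labels, style_labels: int arrays of shape (H, W) with class ids.
    Classes absent from the style labels are left unchanged (an assumption
    of this sketch, not a statement about the paper's behaviour).
    """
    out = content.copy()
    # Only classes present in both label maps have statistics to transfer.
    for cls in np.intersect1d(content_labels, style_labels):
        c_mask = content_labels == cls
        s_mask = style_labels == cls
        c_feats = content[:, c_mask]  # shape (C, number of content pixels)
        s_feats = style[:, s_mask]    # shape (C, number of style pixels)
        c_mean = c_feats.mean(axis=1, keepdims=True)
        c_std = c_feats.std(axis=1, keepdims=True)
        s_mean = s_feats.mean(axis=1, keepdims=True)
        s_std = s_feats.std(axis=1, keepdims=True)
        # Standard AdaIN formula, restricted to the pixels of one class.
        out[:, c_mask] = (c_feats - c_mean) / (c_std + eps) * s_std + s_mean
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.normal(size=(3, 4, 4))
    style = rng.normal(loc=5.0, scale=2.0, size=(3, 4, 4))
    content_labels = np.zeros((4, 4), dtype=int)
    content_labels[:, 2:] = 1
    style_labels = np.zeros((4, 4), dtype=int)
    style_labels[2:, :] = 1
    stylized = classwise_adain(content, style, content_labels, style_labels)
    print(stylized.shape)
```

Restricting the statistics to one semantic class at a time means that, for example, road pixels borrow statistics only from road pixels in the style image, which is the intuition behind preserving semantic boundaries rather than applying one global transformation.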

Related papers

Coarse-to-Fine Hierarchical Alignment for UAV-based Human Detection using Diffusion Models [14.696438400081114]
We introduce a three-stage diffusion-based framework designed to transform synthetic data for UAV-based human detection. Cwd explicitly decouples global style and local content domain discrepancies and bridges those gaps using three modules. Our method achieves an improvement of up to $+14.1$ mAP50 on the Semantic-Drone benchmark.
arXiv Detail & Related papers (2025-12-15T19:57:36Z)
Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds. Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures. We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z)
Adversarially Domain-adaptive Latent Diffusion for Unsupervised Semantic Segmentation [7.099012213719071]
This work introduces a semantic segmentation method based on latent diffusion models, termed Inter-Coder Connected Latent Diffusion (ICCLD). ICCLD outperforms state-of-the-art UDA methods, achieving mIoU scores of 74.4 (GTA5$\rightarrow$Cityscapes) and 67.2 (Synthia$\rightarrow$Cityscapes).
arXiv Detail & Related papers (2024-12-22T04:55:41Z)
Diffusion Features to Bridge Domain Gap for Semantic Segmentation [2.8616666231199424]
This paper investigates an approach that uses sampling and fusion techniques to harness the features of diffusion models efficiently. By leveraging the text-to-image generation capability, we introduce a new training framework designed to implicitly learn posterior knowledge from it.
arXiv Detail & Related papers (2024-06-02T15:33:46Z)
Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability. We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL). Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations [61.132408427908175]
Zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain. With only a single representative text feature instead of real images, the synthesized images gradually lose diversity. We propose a novel method to find semantic variations of the target text in the CLIP space.
arXiv Detail & Related papers (2023-08-21T08:12:28Z)
Adaptive Semantic Consistency for Cross-domain Few-shot Classification [27.176106714652327]
Cross-domain few-shot classification (CD-FSC) aims to identify novel target classes with a few samples. We propose a simple plug-and-play Adaptive Semantic Consistency framework, which improves cross-domain robustness. The proposed ASC enables explicit transfer of source domain knowledge to prevent the model from overfitting the target domain.
arXiv Detail & Related papers (2023-08-01T15:37:19Z)
Few-shot Semantic Image Synthesis with Class Affinity Transfer [23.471210664024067]
We propose a transfer method that leverages a model trained on a large source dataset to improve the learning ability on small target datasets. The class affinity matrix is introduced as a first layer to the source model to make it compatible with the target label maps. We apply our approach to GAN-based and diffusion-based architectures for semantic synthesis.
arXiv Detail & Related papers (2023-04-05T09:24:45Z)
One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models [15.590759602379517]
Adapting a segmentation model from a labeled source domain to a target domain is one of the most challenging problems in domain adaptation. We leverage text-to-image diffusion models to generate a synthetic target dataset with photo-realistic images. Experiments show that our method surpasses the state-of-the-art OSUDA methods by up to +7.1%.
arXiv Detail & Related papers (2023-03-31T14:16:38Z)
Continual Unsupervised Domain Adaptation for Semantic Segmentation using a Class-Specific Transfer [9.46677024179954]
Segmentation models do not generalize to unseen domains. We propose a light-weight style transfer framework that incorporates two class-conditional AdaIN layers. We extensively validate our approach on a synthetic sequence and further propose a challenging sequence consisting of real domains.
arXiv Detail & Related papers (2022-08-12T21:30:49Z)
Imposing Consistency for Optical Flow Estimation [73.53204596544472]
Imposing consistency through proxy tasks has been shown to enhance data-driven learning. This paper introduces novel and effective consistency strategies for optical flow estimation.
arXiv Detail & Related papers (2022-04-14T22:58:30Z)
Semi-Supervised Domain Adaptation with Prototypical Alignment and Consistency Learning [86.6929930921905]
This paper studies how much labeling a few target samples can help address domain shifts. To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks. Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
arXiv Detail & Related papers (2021-04-19T08:46:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.