DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval
- URL: http://arxiv.org/abs/2509.04193v1
- Date: Thu, 04 Sep 2025 13:15:16 GMT
- Title: DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval
- Authors: Ruohong Yang, Peng Hu, Yunfan Li, Xi Peng
- Abstract summary: Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images of the same category across diverse domains without relying on annotations. Existing UCIR methods, which align cross-domain features for the entire image, often struggle with the domain gap. We propose DUDE, a novel UCIR method building upon feature disentanglement.
- Score: 25.89035776794712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images of the same category across diverse domains without relying on annotations. Existing UCIR methods, which align cross-domain features for the entire image, often struggle with the domain gap, as the object features critical for retrieval are frequently entangled with domain-specific styles. To address this challenge, we propose DUDE, a novel UCIR method building upon feature disentanglement. In brief, DUDE leverages a text-to-image generative model to disentangle object features from domain-specific styles, thus facilitating semantical image retrieval. To further achieve reliable alignment of the disentangled object features, DUDE aligns mutual neighbors from within domains to across domains in a progressive manner. Extensive experiments demonstrate that DUDE achieves state-of-the-art performance across three benchmark datasets over 13 domains. The code will be released.
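The abstract describes aligning "mutual neighbors from within domains to across domains." The core mutual-neighbor idea can be sketched as follows; this is an illustrative simplification under assumed inputs (lists of feature vectors per domain), not DUDE's actual implementation, whose progressive alignment procedure is more involved.

```python
# Sketch of cross-domain mutual-neighbor matching: a pair (a_i, b_j) is
# kept only if each is the other's nearest neighbor across domains.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mutual_neighbors(feats_a, feats_b):
    """Return index pairs (i, j) where a_i's nearest neighbor in B is b_j
    and b_j's nearest neighbor in A is a_i."""
    nn_ab = [max(range(len(feats_b)), key=lambda j: cosine(a, feats_b[j]))
             for a in feats_a]
    nn_ba = [max(range(len(feats_a)), key=lambda i: cosine(b, feats_a[i]))
             for b in feats_b]
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Toy example: two 2-D features per domain.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.9, 0.1], [0.1, 0.9]]
print(mutual_neighbors(A, B))  # [(0, 0), (1, 1)]
```

Such reciprocal matches are a standard way to mine reliable cross-domain pairs without labels, since one-directional nearest neighbors are far noisier under a domain gap.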
Related papers
- FPL+: Filtered Pseudo Label-based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation [14.925162565630185]
We propose an enhanced Filtered Pseudo Label (FPL+)-based Unsupervised Domain Adaptation (UDA) method for 3D medical image segmentation.
It first uses cross-domain data augmentation to translate labeled images in the source domain to a dual-domain training set consisting of a pseudo source-domain set and a pseudo target-domain set.
We then combine labeled source-domain images and target-domain images with pseudo labels to train a final segmentor, where image-level weighting based on uncertainty estimation and pixel-level weighting based on dual-domain consensus are proposed to mitigate the adverse effect of noisy pseudo labels.
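The image-level weighting described above can be illustrated with a minimal sketch. The exponential weighting formula below is a plausible stand-in, not FPL+'s exact scheme; per-image losses and uncertainty estimates are assumed given.

```python
# Sketch of uncertainty-based image-level weighting for pseudo-label
# training: higher estimated uncertainty -> smaller contribution.
import math

def weighted_pseudo_label_loss(losses, uncertainties):
    """Down-weight per-image losses by estimated uncertainty u_i via
    w_i = exp(-u_i), and return the weighted mean."""
    weights = [math.exp(-u) for u in uncertainties]
    total_w = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total_w

# The high-uncertainty image (u = 2.0) is largely suppressed, so the
# result sits well below the unweighted mean of 1.25.
print(weighted_pseudo_label_loss([0.5, 2.0], [0.1, 2.0]))
```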
arXiv Detail & Related papers (2024-04-07T14:21:37Z)
- A Multimodal Approach for Cross-Domain Image Retrieval [5.5547914920738]
Cross-Domain Image Retrieval (CDIR) is a challenging task in computer vision.
This paper introduces a novel unsupervised approach to CDIR that incorporates textual context by leveraging pre-trained vision-language models.
Our method, dubbed as Caption-Matching (CM), uses generated image captions as a domain-agnostic intermediate representation.
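The Caption-Matching idea of comparing images through generated captions can be sketched in a few lines. The word-set Jaccard similarity below is a toy stand-in for whatever text matching the paper uses, and the captions are given as strings rather than produced by a vision-language model as in the actual method.

```python
# Toy sketch of caption-based retrieval: images are ranked by the
# similarity of their captions, a domain-agnostic representation.
def jaccard(c1, c2):
    """Word-set Jaccard similarity between two captions."""
    w1, w2 = set(c1.lower().split()), set(c2.lower().split())
    return len(w1 & w2) / len(w1 | w2)

def retrieve(query_caption, gallery_captions):
    """Return gallery indices ranked by caption similarity to the query."""
    return sorted(range(len(gallery_captions)),
                  key=lambda i: jaccard(query_caption, gallery_captions[i]),
                  reverse=True)

gallery = ["a sketch of a dog running", "a photo of a cat sleeping"]
print(retrieve("a painting of a dog", gallery))  # [0, 1]: the dog ranks first
```

The point of the intermediate text representation is that "dog" matches "dog" whether the image is a sketch, painting, or photo, sidestepping pixel-level domain gaps.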
arXiv Detail & Related papers (2024-03-22T12:08:16Z)
- Domain-Scalable Unpaired Image Translation via Latent Space Anchoring [88.7642967393508]
Unpaired image-to-image translation (UNIT) aims to map images between two visual domains without paired training data.
We propose a new domain-scalable UNIT method, termed as latent space anchoring.
Our method anchors images of different domains to the same latent space of frozen GANs by learning lightweight encoder and regressor models.
In the inference phase, the learned encoders and decoders of different domains can be arbitrarily combined to translate images between any two domains without fine-tuning.
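The "arbitrarily combined" encoder/decoder property described above can be sketched with placeholder maps. The linear functions below stand in for the paper's learned lightweight encoders and regressors built on a frozen GAN; the structure, not the math, is the point.

```python
# Sketch of latent space anchoring: every domain encodes into one shared
# latent space, so any encoder can be paired with any decoder at
# inference to translate between any two domains without fine-tuning.
encoders = {
    "photo":  lambda x: [v * 0.5 for v in x],  # photo -> shared latent
    "sketch": lambda x: [v * 2.0 for v in x],  # sketch -> shared latent
}
decoders = {
    "photo":  lambda z: [v / 0.5 for v in z],  # shared latent -> photo
    "sketch": lambda z: [v / 2.0 for v in z],  # shared latent -> sketch
}

def translate(x, src, dst):
    """Translate between any two domains via the shared latent space."""
    return decoders[dst](encoders[src](x))

print(translate([4.0, 8.0], "photo", "sketch"))  # [1.0, 2.0]
```

Because each domain only touches the shared latent space, adding a new domain means training one new encoder/decoder pair rather than O(n^2) pairwise translators, which is what makes the method domain-scalable.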
arXiv Detail & Related papers (2023-06-26T17:50:02Z)
- Correspondence-Free Domain Alignment for Unsupervised Cross-Domain Image Retrieval [25.43019715242141]
Cross-domain image retrieval aims at retrieving images across different domains to excavate cross-domain classificatory or correspondence relationships.
It is challenging to align and bridge distinct domains without cross-domain correspondence.
We present a novel Correspondence-Free Domain Alignment (CoDA) method to eliminate the cross-domain gap.
Our method encodes discriminative information into a domain-invariant embedding space for unsupervised cross-domain image retrieval.
arXiv Detail & Related papers (2023-02-13T03:34:49Z)
- I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation [55.633859439375044]
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work.
The key idea for tackling this problem is to perform image-level and feature-level adaptation jointly.
This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation.
arXiv Detail & Related papers (2023-01-03T15:19:48Z)
- Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval [55.122020263319634]
Video moment retrieval (VMR) aims to localize the target moment from an untrimmed video according to a given language query.
In this paper, we focus on a novel task: cross-domain VMR, where fully-annotated datasets are available in one domain but the domain of interest only contains unannotated datasets.
We propose a novel Multi-Modal Cross-Domain Alignment network to transfer the annotation knowledge from the source domain to the target domain.
arXiv Detail & Related papers (2022-09-23T12:58:20Z)
- Unsupervised Domain Generalization by Learning a Bridge Across Domains [78.855606355957]
The Unsupervised Domain Generalization (UDG) setup has no training supervision in either source or target domains.
Our approach is based on self-supervised learning of a Bridge Across Domains (BrAD) - an auxiliary bridge domain accompanied by a set of semantics-preserving visual (image-to-image) mappings to BrAD from each of the training domains.
We show that, using an edge-regularized BrAD, our approach achieves significant gains across multiple benchmarks and a range of tasks, including UDG, few-shot UDA, and unsupervised generalization across multi-domain datasets.
arXiv Detail & Related papers (2021-12-04T10:25:45Z)
- Disentangled Unsupervised Image Translation via Restricted Information Flow [61.44666983942965]
Many state-of-the-art methods hard-code the desired shared-vs-specific split into their architecture.
We propose a new method that does not rely on inductive architectural biases.
We show that the proposed method achieves consistently high manipulation accuracy across two synthetic and one natural dataset.
arXiv Detail & Related papers (2021-11-26T00:27:54Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- Variational Interaction Information Maximization for Cross-domain Disentanglement [34.08140408283391]
Cross-domain disentanglement is the problem of learning representations partitioned into domain-invariant and domain-specific representations.
We cast the simultaneous learning of domain-invariant and domain-specific representations as a joint objective of multiple information constraints.
We show that our model achieves state-of-the-art performance on the zero-shot sketch-based image retrieval task.
arXiv Detail & Related papers (2020-12-08T07:11:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.