Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection
- URL: http://arxiv.org/abs/2506.05872v1
- Date: Fri, 06 Jun 2025 08:41:09 GMT
- Title: Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection
- Authors: Yu Li, Xingyu Qiu, Yuqian Fu, Jie Chen, Tianwen Qian, Xu Zheng, Danda Pani Paudel, Yanwei Fu, Xuanjing Huang, Luc Van Gool, Yu-Gang Jiang,
- Abstract summary: Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to detect novel objects with only a handful of labeled samples from previously unseen domains.<n>Data augmentation and generative methods have shown promise in few-shot learning, but their effectiveness for CD-FSOD remains unclear.<n>We propose Domain-RAG, a training-free, retrieval-guided compositional image generation framework tailored for CD-FSOD.
- Score: 132.63712430690856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to detect novel objects with only a handful of labeled samples from previously unseen domains. While data augmentation and generative methods have shown promise in few-shot learning, their effectiveness for CD-FSOD remains unclear due to the need for both visual realism and domain alignment. Existing strategies, such as copy-paste augmentation and text-to-image generation, often fail to preserve the correct object category or produce backgrounds coherent with the target domain, making them non-trivial to apply directly to CD-FSOD. To address these challenges, we propose Domain-RAG, a training-free, retrieval-guided compositional image generation framework tailored for CD-FSOD. Domain-RAG consists of three stages: domain-aware background retrieval, domain-guided background generation, and foreground-background composition. Specifically, the input image is first decomposed into foreground and background regions. We then retrieve semantically and stylistically similar images to guide a generative model in synthesizing a new background, conditioned on both the original and retrieved contexts. Finally, the preserved foreground is composed with the newly generated domain-aligned background to form the generated image. Without requiring any additional supervision or training, Domain-RAG produces high-quality, domain-consistent samples across diverse tasks, including CD-FSOD, remote sensing FSOD, and camouflaged FSOD. Extensive experiments show consistent improvements over strong baselines and establish new state-of-the-art results. Codes will be released upon acceptance.
Related papers
- FPL+: Filtered Pseudo Label-based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation [14.925162565630185]
We propose an enhanced Filtered Pseudo Label (FPL+)-based Unsupervised Domain Adaptation (UDA) method for 3D medical image segmentation.
It first uses cross-domain data augmentation to translate labeled images in the source domain to a dual-domain training set consisting of a pseudo source-domain set and a pseudo target-domain set.
We then combine labeled source-domain images and target-domain images with pseudo labels to train a final segmentor, where image-level weighting based on uncertainty estimation and pixel-level weighting based on dual-domain consensus are proposed to mitigate the adverse effect of noisy pseudo
arXiv Detail & Related papers (2024-04-07T14:21:37Z) - Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector [72.05791402494727]
This paper studies the challenging cross-domain few-shot object detection (CD-FSOD)
It aims to develop an accurate object detector for novel domains with minimal labeled examples.
arXiv Detail & Related papers (2024-02-05T15:25:32Z) - Source-free Domain Adaptive Object Detection in Remote Sensing Images [11.19538606490404]
We propose a source-free object detection (SFOD) setting for RS images.
It aims to perform target domain adaptation using only the source pre-trained model.
Our method does not require access to source domain RS images.
arXiv Detail & Related papers (2024-01-31T15:32:44Z) - Rethinking Cross-Domain Pedestrian Detection: A Background-Focused
Distribution Alignment Framework for Instance-Free One-Stage Detectors [25.967313282860626]
Cross-domain pedestrian detection aims to generalize pedestrian detectors from one label-rich domain to another label-scarce domain.
This paper proposes a novel framework, namely, background-focused distribution alignment (BFDA), to train domain adaptive onestage pedestrian detectors.
arXiv Detail & Related papers (2023-09-15T21:29:27Z) - One-shot Unsupervised Domain Adaptation with Personalized Diffusion
Models [15.590759602379517]
Adapting a segmentation model from a labeled source domain to a target domain is one of the most challenging problems in domain adaptation.
We leverage text-to-image diffusion models to generate a synthetic target dataset with photo-realistic images.
Experiments show that our method surpasses the state-of-the-art OSUDA methods by up to +7.1%.
arXiv Detail & Related papers (2023-03-31T14:16:38Z) - Towards Diverse and Faithful One-shot Adaption of Generative Adversarial
Networks [54.80435295622583]
One-shot generative domain adaption aims to transfer a pre-trained generator on one domain to a new domain using one reference image only.
We present a novel one-shot generative domain adaption method, i.e., DiFa, for diverse generation and faithful adaptation.
arXiv Detail & Related papers (2022-07-18T16:29:41Z) - DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic
Segmentation [97.74059510314554]
Unsupervised domain adaptation (UDA) for semantic segmentation aims to adapt a segmentation model trained on the labeled source domain to the unlabeled target domain.
Existing methods try to learn domain invariant features while suffering from large domain gaps.
We propose a novel Dual Soft-Paste (DSP) method in this paper.
arXiv Detail & Related papers (2021-07-20T16:22:40Z) - RPCL: A Framework for Improving Cross-Domain Detection with Auxiliary
Tasks [74.10747285807315]
Cross-Domain Detection (XDD) aims to train an object detector using labeled image from a source domain but have good performance in the target domain with only unlabeled images.
This paper provides a complementary solution to align domains by learning the same auxiliary tasks in both domains simultaneously.
arXiv Detail & Related papers (2021-04-18T02:56:19Z) - Super-Resolving Cross-Domain Face Miniatures by Peeking at One-Shot
Exemplar [42.78574493628936]
We develop a Domain-Aware Pyramid-based Face Super-Resolution network, named DAP-FSR network.
Our DAP-FSR is the first attempt to super-resolve LR faces from a target domain by exploiting only a pair of high-resolution (HR) and LR exemplars in the target domain.
By iteratively updating the latent representations and our decoder, our DAP-FSR will be adapted to the target domain.
arXiv Detail & Related papers (2021-03-16T05:47:26Z) - DoFE: Domain-oriented Feature Embedding for Generalizable Fundus Image
Segmentation on Unseen Datasets [96.92018649136217]
We present a novel Domain-oriented Feature Embedding (DoFE) framework to improve the generalization ability of CNNs on unseen target domains.
Our DoFE framework dynamically enriches the image features with additional domain prior knowledge learned from multi-source domains.
Our framework generates satisfying segmentation results on unseen datasets and surpasses other domain generalization and network regularization methods.
arXiv Detail & Related papers (2020-10-13T07:28:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.