VFM-Guided Semi-Supervised Detection Transformer under Source-Free Constraints for Remote Sensing Object Detection
- URL: http://arxiv.org/abs/2508.11167v2
- Date: Tue, 26 Aug 2025 08:55:26 GMT
- Authors: Jianhong Han, Yupei Wang, Liang Chen
- Abstract summary: VG-DETR integrates a Vision Foundation Model (VFM) into the training pipeline in a "free lunch" manner. We introduce a VFM-guided pseudo-label mining strategy that leverages the VFM's semantic priors to assess the reliability of the generated pseudo-labels. In addition, a dual-level VFM-guided alignment method is proposed, which aligns detector features with VFM embeddings at both the instance and image levels.
- Score: 9.029534000674388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised domain adaptation methods have been widely explored to bridge domain gaps. However, in real-world remote-sensing scenarios, privacy and transmission constraints often preclude access to source domain data, which limits their practical applicability. Recently, Source-Free Object Detection (SFOD) has emerged as a promising alternative, aiming at cross-domain adaptation without relying on source data, primarily through a self-training paradigm. Despite its potential, SFOD frequently suffers from training collapse caused by noisy pseudo-labels, especially in remote sensing imagery with dense objects and complex backgrounds. Considering that limited target domain annotations are often feasible in practice, we propose a Vision foundation-Guided DEtection TRansformer (VG-DETR), built upon a semi-supervised framework for SFOD in remote sensing images. VG-DETR integrates a Vision Foundation Model (VFM) into the training pipeline in a "free lunch" manner, leveraging a small amount of labeled target data to mitigate pseudo-label noise while improving the detector's feature-extraction capability. Specifically, we introduce a VFM-guided pseudo-label mining strategy that leverages the VFM's semantic priors to further assess the reliability of the generated pseudo-labels. By recovering potentially correct predictions from low-confidence outputs, our strategy improves pseudo-label quality and quantity. In addition, a dual-level VFM-guided alignment method is proposed, which aligns detector features with VFM embeddings at both the instance and image levels. Through contrastive learning among fine-grained prototypes and similarity matching between feature maps, this dual-level alignment further enhances the robustness of feature representations against domain gaps. Extensive experiments demonstrate that VG-DETR achieves superior performance in source-free remote sensing detection tasks.
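The pseudo-label mining idea in the abstract (recovering potentially correct predictions from low-confidence outputs using the VFM's semantic priors) can be illustrated with a minimal sketch. The abstract does not give the actual criterion, so everything concrete here is an assumption made for illustration: the two confidence thresholds, the cosine-similarity test against per-class prototypes, the idea that prototypes come from the labeled target data, and all names (`mine_pseudo_labels`, `vfm_embedding`, `sim_thresh`).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mine_pseudo_labels(preds, prototypes, high=0.7, low=0.3, sim_thresh=0.8):
    """Select pseudo-labels from detector predictions.

    Hypothetical rule (not taken from the paper):
      - score >= high: keep directly;
      - low <= score < high: keep only if the VFM embedding of the box
        is close to the prototype of the predicted class;
      - score < low: discard.
    """
    kept = []
    for p in preds:
        if p["score"] >= high:
            kept.append(p)
        elif p["score"] >= low:
            sim = cosine(p["vfm_embedding"], prototypes[p["label"]])
            if sim >= sim_thresh:
                kept.append(p)
    return kept


# Toy class prototypes, assumed to be averaged VFM embeddings of labeled boxes.
prototypes = {"plane": [1.0, 0.0], "ship": [0.0, 1.0]}

preds = [
    {"id": 0, "label": "plane", "score": 0.9, "vfm_embedding": [0.9, 0.1]},
    {"id": 1, "label": "ship", "score": 0.4, "vfm_embedding": [0.1, 0.95]},
    {"id": 2, "label": "plane", "score": 0.4, "vfm_embedding": [0.0, 1.0]},
    {"id": 3, "label": "ship", "score": 0.1, "vfm_embedding": [0.0, 1.0]},
]

recovered = mine_pseudo_labels(preds, prototypes)
recovered_ids = [p["id"] for p in recovered]
```

In this toy input, prediction 1 is low-confidence but agrees with its class prototype and is recovered, while prediction 2 has the same score but disagrees with its prototype and is dropped.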
Related papers
- Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection [34.292554427633505]
Source-Free Object Detection (SFOD) aims to adapt a source-pretrained object detector to a target domain without access to source data. Vision Foundation Models (VFMs), pretrained on massive and diverse data, exhibit strong perception capabilities and broad generalization. We propose a novel SFOD framework that leverages VFMs as external knowledge sources to jointly enhance feature alignment and label quality.
arXiv Detail & Related papers (2025-11-10T17:06:01Z)
- EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels [85.78886153628663]
Open-Set Domain Generalization aims to enable deep learning models to recognize unseen categories in new domains. Label noise hinders open-set domain generalization by corrupting source-domain knowledge. We propose Evidential Reliability-Aware Residual Flow Meta-Learning (EReLiFM) to bridge domain gaps.
arXiv Detail & Related papers (2025-10-14T16:23:11Z)
- Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z)
- Prototype-Based Pseudo-Label Denoising for Source-Free Domain Adaptation in Remote Sensing Semantic Segmentation [16.927392753457866]
Source-Free Domain Adaptation (SFDA) enables domain adaptation for semantic segmentation of Remote Sensing Images (RSIs) using only a well-trained source model and unlabeled target domain data. We propose ProSFDA, a prototype-guided SFDA framework. It employs prototype-weighted pseudo-labels to facilitate reliable self-training (ST) under pseudo-label noise. In addition, we introduce a prototype-contrast strategy that encourages the aggregation of features belonging to the same class, enabling the model to learn discriminative target domain representations without relying on ground-truth supervision.
arXiv Detail & Related papers (2025-09-21T06:33:59Z)
- Style-Adaptive Detection Transformer for Single-Source Domain Generalized Object Detection [7.768332621617199]
Single-source domain generalization aims to develop a detector using only source domain data that generalizes well to unseen target domains. Existing methods are primarily CNN-based and improve robustness through data augmentation combined with feature alignment. We propose Style-Adaptive DEtection TRansformer (SA-DETR), a DETR-based detector tailored for single-source domain generalization.
arXiv Detail & Related papers (2025-04-29T07:38:37Z)
- Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving [7.064497253920508]
Vision Foundation Models (VFMs) as feature extractors and density modeling techniques are proposed. A comparison with state-of-the-art binary OOD classification methods reveals that VFM embeddings with density estimation outperform existing approaches in identifying OOD inputs. Our method detects high-risk inputs likely to cause errors in downstream tasks, thereby improving overall performance.
arXiv Detail & Related papers (2025-01-14T12:51:34Z)
- Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector [72.05791402494727]
This paper studies the challenging cross-domain few-shot object detection (CD-FSOD) task.
It aims to develop an accurate object detector for novel domains with minimal labeled examples.
arXiv Detail & Related papers (2024-02-05T15:25:32Z)
- Source-free Domain Adaptive Object Detection in Remote Sensing Images [11.19538606490404]
We propose a source-free object detection (SFOD) setting for RS images.
It aims to perform target domain adaptation using only the source pre-trained model.
Our method does not require access to source domain RS images.
arXiv Detail & Related papers (2024-01-31T15:32:44Z)
- Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
- Exploiting Low-confidence Pseudo-labels for Source-free Object Detection [54.98300313452037]
Source-free object detection (SFOD) aims to adapt a source-trained detector to an unlabeled target domain without access to the labeled source data.
Current SFOD methods utilize a threshold-based pseudo-label approach in the adaptation phase.
We propose a new approach to take full advantage of pseudo-labels by introducing high and low confidence thresholds.
arXiv Detail & Related papers (2023-10-19T12:59:55Z)
- Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer [60.31021888394358]
Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR).
We propose a SOurce-free Domain Adaptation framework for image SR (SODA-SR) to address this issue, i.e., adapt a source-trained model to a target domain with only unlabeled target data.
arXiv Detail & Related papers (2023-03-31T03:14:44Z)
- Adversarial Alignment for Source Free Object Detection [24.99432954279032]
Source-free object detection (SFOD) aims to transfer a detector pre-trained on a label-rich source domain to an unlabeled target domain without seeing source data.
We divide the target domain into source-similar and source-dissimilar parts and align them in the feature space by adversarial learning.
Our proposed method consistently outperforms the compared SFOD methods.
arXiv Detail & Related papers (2023-01-11T02:08:37Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
- Towards Robust Adaptive Object Detection under Noisy Annotations [40.25050610617893]
Existing methods assume that the source domain labels are completely clean, yet large-scale datasets often contain error-prone annotations due to instance ambiguity.
We propose a Noise Latent Transferability Exploration framework to address this issue.
NLTE improves the mAP by 8.4% under 60% corrupted annotations and even approaches the ideal upper bound of training on a clean source dataset.
arXiv Detail & Related papers (2022-04-06T07:02:37Z)
- Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection [79.89082006155135]
Unsupervised Domain Adaptation (UDA) is an effective approach to tackle the issue of domain shift.
UDA methods try to align the source and target representations to improve the generalization on the target domain.
The Source-Free Domain Adaptation (SFDA) setting aims to alleviate these concerns by adapting a source-trained model for the target domain without requiring access to the source data.
arXiv Detail & Related papers (2022-03-29T17:50:43Z)
- Decompose to Adapt: Cross-domain Object Detection via Feature Disentanglement [79.2994130944482]
We design a Domain Disentanglement Faster-RCNN (DDF) to eliminate the source-specific information in the features for detection task learning.
Our DDF method facilitates the feature disentanglement at the global and local stages, with a Global Triplet Disentanglement (GTD) module and an Instance Similarity Disentanglement (ISD) module.
By outperforming state-of-the-art methods on four benchmark UDA object detection tasks, our DDF method is demonstrated to be effective with wide applicability.
arXiv Detail & Related papers (2022-01-06T05:43:01Z)