Related papers: Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection

URL: http://arxiv.org/abs/2511.07301v1
Date: Mon, 10 Nov 2025 17:06:01 GMT
Title: Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection
Authors: Huizai Yao, Sicheng Zhao, Pengteng Li, Yi Cui, Shuo Lu, Weiyu Guo, Yunfan Lu, Yijie Xu, Hui Xiong
Abstract summary: Source-Free Object Detection (SFOD) aims to adapt a source-pretrained object detector to a target domain without access to source data. Vision Foundation Models (VFMs), pretrained on massive and diverse data, exhibit strong perception capabilities and broad generalization. We propose a novel SFOD framework that leverages VFMs as external knowledge sources to jointly enhance feature alignment and label quality.
Score: 34.292554427633505
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Source-Free Object Detection (SFOD) aims to adapt a source-pretrained object detector to a target domain without access to source data. However, existing SFOD methods predominantly rely on internal knowledge from the source model, which limits their capacity to generalize across domains and often results in biased pseudo-labels, thereby hindering both transferability and discriminability. In contrast, Vision Foundation Models (VFMs), pretrained on massive and diverse data, exhibit strong perception capabilities and broad generalization, yet their potential remains largely untapped in the SFOD setting. In this paper, we propose a novel SFOD framework that leverages VFMs as external knowledge sources to jointly enhance feature alignment and label quality. Specifically, we design three VFM-based modules: (1) Patch-weighted Global Feature Alignment (PGFA) distills global features from VFMs using patch-similarity-based weighting to enhance global feature transferability; (2) Prototype-based Instance Feature Alignment (PIFA) performs instance-level contrastive learning guided by momentum-updated VFM prototypes; and (3) Dual-source Enhanced Pseudo-label Fusion (DEPF) fuses predictions from detection VFMs and teacher models via an entropy-aware strategy to yield more reliable supervision. Extensive experiments on six benchmarks demonstrate that our method achieves state-of-the-art SFOD performance, validating the effectiveness of integrating VFMs to simultaneously improve transferability and discriminability.
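The abstract names two mechanisms without giving formulas: momentum-updated prototypes (PIFA) and entropy-aware fusion of teacher and VFM predictions (DEPF). The following is a minimal illustrative sketch of what such operations commonly look like, not the authors' implementation. The function names, the exponential-moving-average update, the momentum value, and the "one minus normalized entropy" confidence weighting are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weighted_fusion(p_teacher, p_vfm):
    """Fuse two class distributions, favoring the lower-entropy one.

    Assumption: confidence = 1 - entropy / log(num_classes). The paper's
    actual entropy-aware rule is not given in the abstract.
    Requires at least two classes.
    """
    max_h = math.log(len(p_teacher))  # entropy of the uniform distribution
    c_t = max(0.0, 1.0 - entropy(p_teacher) / max_h)
    c_v = max(0.0, 1.0 - entropy(p_vfm) / max_h)
    total = c_t + c_v
    # Fall back to an equal split when both sources are maximally uncertain.
    w_t = 0.5 if total <= 1e-12 else c_t / total
    fused = [w_t * a + (1.0 - w_t) * b for a, b in zip(p_teacher, p_vfm)]
    return fused, w_t


def ema_update(prototype, feature, momentum=0.9):
    """Momentum (exponential moving average) update of a class prototype.

    Assumption: a plain EMA; the momentum value 0.9 is hypothetical.
    """
    return [momentum * p + (1.0 - momentum) * f
            for p, f in zip(prototype, feature)]
```

In this sketch a confident teacher prediction such as [0.9, 0.05, 0.05] receives more weight than a flatter VFM prediction such as [0.4, 0.3, 0.3], and the fused output remains a valid probability distribution because it is a convex combination of the two inputs.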

Related papers

AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors [6.6016630449883955]
AnomalyVFM is a framework that turns any pretrained VFM into a strong zero-shot anomaly detector. Our approach combines a robust three-stage synthetic dataset generation scheme with a parameter-efficient adaptation mechanism. It achieves an average image-level AUROC of 94.1% across 9 diverse datasets, surpassing previous methods by a significant 3.3 percentage points.
arXiv Detail & Related papers (2026-01-28T12:02:58Z)
Towards Unbiased Source-Free Object Detection via Vision Foundation Models [43.313980360639164]
Source-free Object Detection (SFOD) has garnered much attention in recent years by eliminating the need for source-domain data in cross-domain tasks. Existing SFOD methods suffer from the Source Bias problem, leading to poor generalization and error accumulation during self-training. We propose Debiased Source-free Object Detection (DSOD), a novel VFM-assisted SFOD framework that can effectively mitigate source bias with the help of powerful VFMs.
arXiv Detail & Related papers (2026-01-19T06:51:55Z)
Collaborative Learning with Multiple Foundation Models for Source-Free Domain Adaptation [9.231185930198162]
Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain without access to source data. Recent advances in Foundation Models (FMs) have introduced new opportunities for leveraging external semantic knowledge to guide SFDA.
arXiv Detail & Related papers (2025-11-24T14:12:22Z)
Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z)
DAM: Dual Active Learning with Multimodal Foundation Model for Source-Free Domain Adaptation [53.323488295994395]
Source-free active domain adaptation (SFADA) enhances knowledge transfer from a source model to an unlabeled target domain using limited manual labels selected via active learning. We propose Dual Active learning with Multimodal (DAM) foundation model, a novel framework that integrates multimodal supervision from a ViL model to complement sparse human annotations. Extensive experiments demonstrate that DAM consistently outperforms existing methods and sets a new state-of-the-art across multiple SFADA benchmarks and active learning strategies.
arXiv Detail & Related papers (2025-09-29T15:06:56Z)
VFM-Guided Semi-Supervised Detection Transformer under Source-Free Constraints for Remote Sensing Object Detection [9.029534000674388]
VG-DETR integrates a Vision Foundation Model (VFM) into the training pipeline in a "free lunch" manner. We introduce a VFM-guided pseudo-label mining strategy that leverages the VFM's semantic priors to assess the reliability of the generated pseudo-labels. In addition, a dual-level VFM-guided alignment method is proposed, which aligns detector features with VFM embeddings at both the instance and image levels.
arXiv Detail & Related papers (2025-08-15T02:35:56Z)
AdaFusion: Prompt-Guided Inference with Adaptive Fusion of Pathology Foundation Models [49.550545038402184]
We propose AdaFusion, a novel prompt-guided inference framework. Our method compresses and aligns tile-level features from diverse models. AdaFusion consistently surpasses individual PFMs across both classification and regression tasks.
arXiv Detail & Related papers (2025-08-07T07:09:31Z)
Robust Federated Learning on Edge Devices with Domain Heterogeneity [13.362209980631876]
Federated Learning (FL) allows collaborative training while ensuring data privacy across distributed edge devices. We introduce a new framework to address this challenge by improving the generalization ability of the FL global model. We introduce FedAPC, a prototype-based FL framework designed to enhance feature diversity and model robustness.
arXiv Detail & Related papers (2025-05-15T09:53:14Z)
Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving [7.064497253920508]
Vision Foundation Models (VFMs) as feature extractors and density modeling techniques are proposed. A comparison with state-of-the-art binary OOD classification methods reveals that VFM embeddings with density estimation outperform existing approaches in identifying OOD inputs. Our method detects high-risk inputs likely to cause errors in downstream tasks, thereby improving overall performance.
arXiv Detail & Related papers (2025-01-14T12:51:34Z)
Test-Time Domain Generalization for Face Anti-Spoofing [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks. We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability. Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
arXiv Detail & Related papers (2024-03-28T11:50:23Z)
Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset. Existing SFDA methods ONLY assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets. We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
Reliable Federated Disentangling Network for Non-IID Domain Feature [62.73267904147804]
In this paper, we propose a novel reliable federated disentangling network, termed RFedDis. To the best of our knowledge, our proposed RFedDis is the first work to develop an FL approach based on evidential uncertainty combined with feature disentangling. Our proposed RFedDis provides outstanding performance with a high degree of reliability as compared to other state-of-the-art FL approaches.
arXiv Detail & Related papers (2023-01-30T11:46:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.