Related papers: Foundation Model Priors Enhance Object Focus in Feature Space for Source-Free Object Detection

URL: http://arxiv.org/abs/2512.17514v2
Date: Wed, 24 Dec 2025 07:10:56 GMT
Title: Foundation Model Priors Enhance Object Focus in Feature Space for Source-Free Object Detection
Authors: Sairam VCR, Rishabh Lalla, Aveen Dayal, Tejal Kulkarni, Anuj Lalla, Vineeth N Balasubramanian, Muhammad Haris Khan
Abstract summary: Domain shift reduces the detector's ability to maintain strong object-focused representations. FALCON-SFOD is a framework designed to enhance object-focused adaptation under domain shift.
Score: 38.14795337940857
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current state-of-the-art approaches in Source-Free Object Detection (SFOD) typically rely on Mean-Teacher self-labeling. However, domain shift often reduces the detector's ability to maintain strong object-focused representations, causing high-confidence activations over background clutter. This weak object focus results in unreliable pseudo-labels from the detection head. While prior works mainly refine these pseudo-labels, they overlook the underlying need to strengthen the feature space itself. We propose FALCON-SFOD (Foundation-Aligned Learning with Clutter suppression and Noise robustness), a framework designed to enhance object-focused adaptation under domain shift. It consists of two complementary components. SPAR (Spatial Prior-Aware Regularization) leverages the generalization strength of vision foundation models to regularize the detector's feature space. Using class-agnostic binary masks derived from OV-SAM, SPAR promotes structured and foreground-focused activations by guiding the network toward object regions. IRPL (Imbalance-aware Noise Robust Pseudo-Labeling) complements SPAR by promoting balanced and noise-tolerant learning under severe foreground-background imbalance. Guided by a theoretical analysis that connects these designs to tighter localization and classification error bounds, FALCON-SFOD achieves competitive performance across SFOD benchmarks.

Related papers

Breaking Self-Attention Failure: Rethinking Query Initialization for Infrared Small Target Detection [22.128797773091403]
Infrared small target detection (IRSTD) faces significant challenges due to the low signal-to-noise ratio (SNR), small target size, and complex cluttered backgrounds. Recent DETR-based detectors exhibit notable performance degradation on IRSTD.
arXiv Detail & Related papers (2026-01-06T09:14:01Z)
Noise-Robust Tiny Object Localization with Flows [63.60972031108944]
We propose a noise-robust localization framework leveraging normalizing flows for flexible error modeling and uncertainty-guided optimization. Our method captures complex, non-Gaussian prediction distributions through flow-based error modeling, enabling robust learning under noisy supervision. An uncertainty-aware gradient modulation mechanism further suppresses learning from high-uncertainty, noise-prone samples, mitigating overfitting while stabilizing training.
arXiv Detail & Related papers (2026-01-02T09:16:55Z)
Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z)
VFM-Guided Semi-Supervised Detection Transformer under Source-Free Constraints for Remote Sensing Object Detection [9.029534000674388]
VG-DETR integrates a Vision Foundation Model (VFM) into the training pipeline in a "free lunch" manner. We introduce a VFM-guided pseudo-label mining strategy that leverages the VFM's semantic priors to assess the reliability of the generated pseudo-labels. In addition, a dual-level VFM-guided alignment method is proposed, which aligns detector features with VFM embeddings at both the instance and image levels.
arXiv Detail & Related papers (2025-08-15T02:35:56Z)
FOAM: A General Frequency-Optimized Anti-Overlapping Framework for Overlapping Object Perception [7.258172311752019]
Overlapping object perception aims to decouple the randomly overlapping foreground-background features, extracting foreground features while suppressing background features, which holds significant application value in fields such as security screening and medical auxiliary diagnosis. We propose a general Frequency-based Anti-Overlapping Framework (FOAM) to assist the model in extracting more texture and contour information. Specifically, we design the Frequency Spatial Transformer Block (FSTB), which can simultaneously extract features from both the frequency and spatial domains, helping the network capture more texture features from the foreground.
arXiv Detail & Related papers (2025-06-16T13:58:49Z)
Source-Free Domain Adaptive Object Detection with Semantics Compensation [54.00183496587841]
We introduce Weak-to-strong Semantics Compensation (WSCo) for strong data augmentation. WSCo compensates for the class-relevant semantics that may be lost during strong augmentation on the fly. WSCo can be implemented as a generic plug-in, easily integrable with any existing SFOD pipelines.
arXiv Detail & Related papers (2024-10-07T23:32:06Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
SOOD: Towards Semi-Supervised Oriented Object Detection [57.05141794402972]
This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework. Our experiments show that when trained with the two proposed losses, SOOD surpasses the state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark.
arXiv Detail & Related papers (2023-04-10T11:10:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.