Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection
- URL: http://arxiv.org/abs/2504.04517v1
- Date: Sun, 06 Apr 2025 15:30:35 GMT
- Title: Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection
- Authors: Jiancheng Pan, Yanxing Liu, Xiao He, Long Peng, Jiahao Li, Yuze Sun, Xiaomeng Huang
- Abstract summary: Foundation models pretrained on extensive datasets have performed remarkably in the cross-domain few-shot object detection task. We found that the integration of image-based data augmentation techniques and a grid-based sub-domain search strategy significantly enhances the performance of these foundation models. Our findings substantially advance the practical deployment of vision-language models in data-scarce environments.
- Score: 13.980798935767558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models pretrained on extensive datasets, such as GroundingDINO and LAE-DINO, have performed remarkably in the cross-domain few-shot object detection (CD-FSOD) task. Through rigorous few-shot training, we found that the integration of image-based data augmentation techniques and a grid-based sub-domain search strategy significantly enhances the performance of these foundation models. Building upon GroundingDINO, we employed several widely used image augmentation methods and established optimization objectives to effectively navigate the expansive domain space in search of optimal sub-domains. This approach facilitates efficient few-shot object detection and offers a way to solve the CD-FSOD problem by efficiently searching for the optimal parameter configuration of the foundation model. Our findings substantially advance the practical deployment of vision-language models in data-scarce environments, offering critical insights into optimizing their cross-domain generalization capabilities without labor-intensive retraining. Code is available at https://github.com/jaychempan/ETS.
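The abstract couples two components: image-level augmentations and a grid search over augmentation "sub-domains" to find the configuration that maximizes few-shot detection performance. Below is a minimal Python sketch of that search loop; the augmentation grid, the evaluate_map scorer, and the dummy example are illustrative assumptions rather than the authors' released implementation (see the repository linked above for that).

```python
# Hypothetical sketch of a grid-based sub-domain search over image augmentation
# configurations. The augmentation axes, grid values, and evaluate_map() scorer
# are assumptions for illustration, not the ETS codebase.
from itertools import product
from typing import Callable, Dict, List, Tuple

# Each axis of the grid is one image-level augmentation hyperparameter;
# the values below are placeholders.
AUG_GRID: Dict[str, List[float]] = {
    "brightness": [0.8, 1.0, 1.2],
    "contrast": [0.8, 1.0, 1.2],
    "gaussian_noise_std": [0.0, 0.01, 0.05],
}

def grid_search_subdomain(
    evaluate_map: Callable[[Dict[str, float]], float],
) -> Tuple[Dict[str, float], float]:
    """Score every augmentation configuration and return the best one."""
    best_cfg, best_score = {}, float("-inf")
    keys = list(AUG_GRID)
    for values in product(*(AUG_GRID[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate_map(cfg)  # e.g., mAP of the few-shot detector under cfg
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

if __name__ == "__main__":
    # Dummy scorer so the sketch runs end to end; a real scorer would apply the
    # augmentations, run the GroundingDINO few-shot pipeline, and return mAP.
    dummy = lambda cfg: -abs(cfg["brightness"] - 1.0) - cfg["gaussian_noise_std"]
    cfg, score = grid_search_subdomain(dummy)
    print("best configuration:", cfg, "score:", score)
```

In practice, scoring a configuration means running the few-shot training or inference pipeline under that augmentation setting and reading off mAP on held-out support images, so the grid has to stay small for the search to remain tractable.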
Related papers
- Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff. By integrating pseudo-target domain data with source domain data, we diversify the training dataset. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z) - Parameter-Efficient Active Learning for Foundational models [7.799711162530711]
Foundational vision transformer models have shown impressive few-shot performance on many vision tasks.
This research presents a novel investigation into the application of parameter-efficient fine-tuning methods within an active learning (AL) framework.
arXiv Detail & Related papers (2024-06-13T16:30:32Z) - Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models [3.072340427031969]
Few-shot action recognition (FSAR) aims to learn a model capable of identifying novel actions in videos using only a few examples.
By assuming that the base dataset seen during meta-training and the novel dataset used for evaluation can come from different domains, cross-domain few-shot learning reduces data collection and annotation costs.
We systematically evaluate existing state-of-the-art single-domain, transfer-based, and cross-domain FSAR methods on new cross-domain tasks.
arXiv Detail & Related papers (2024-06-03T07:48:18Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - Open-Set Domain Adaptation with Visual-Language Foundation Models [51.49854335102149]
Unsupervised domain adaptation (UDA) has proven to be very effective in transferring knowledge from a source domain to a target domain with unlabeled data.
Open-set domain adaptation (ODA) has emerged as a potential solution for identifying previously unseen classes during the training phase.
arXiv Detail & Related papers (2023-07-30T11:38:46Z) - Target-Aware Generative Augmentations for Single-Shot Adaptation [21.840653627684855]
We propose a new approach to adapting models from a source domain to a target domain.
SiSTA fine-tunes a generative model from the source domain using a single-shot target, and then employs novel sampling strategies for curating synthetic target data.
We find that SiSTA produces significantly improved generalization over existing baselines in face detection and multi-class object recognition.
arXiv Detail & Related papers (2023-05-22T17:46:26Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction [62.41511766918932]
Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining.
Recent work focuses on cross-domain OTE, which is typically encountered in real-world scenarios.
We propose a new SSL approach that selects target samples on which the outputs of a domain-specific teacher network and a student network disagree on the unlabelled target data.
arXiv Detail & Related papers (2023-02-28T16:31:17Z) - Towards Geospatial Foundation Models via Continual Pretraining [22.825065739563296]
We propose a novel paradigm for building highly effective foundation models with minimal resource cost and carbon impact.
We first construct a compact yet diverse dataset from multiple sources to promote feature diversity, which we term GeoPile.
Then, we investigate the potential of continual pretraining from large-scale ImageNet-22k models and propose a multi-objective continual pretraining paradigm.
arXiv Detail & Related papers (2023-02-09T07:39:02Z) - Enhancing Object Detection for Autonomous Driving by Optimizing Anchor Generation and Addressing Class Imbalance [0.0]
This study presents an enhanced 2D object detector based on Faster R-CNN that is better suited for the context of autonomous vehicles.
The proposed modifications over the Faster R-CNN do not increase computational cost and can easily be extended to optimize other anchor-based detection frameworks.
arXiv Detail & Related papers (2021-04-08T16:58:31Z) - Optimized Generic Feature Learning for Few-shot Classification across Domains [96.4224578618561]
We propose to use cross-domain, cross-task data as the validation objective for hyperparameter optimization (HPO).
We demonstrate the effectiveness of this strategy on few-shot image classification within and across domains.
The learned features outperform all previous few-shot and meta-learning approaches (a minimal sketch of this cross-domain validation idea follows this list).
arXiv Detail & Related papers (2020-01-22T09:31:39Z)
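The last entry above selects hyperparameters by how well the model performs on cross-domain, cross-task validation data rather than on an in-domain split, which echoes the sub-domain search used in the main paper. The sketch below is a hypothetical illustration of that selection rule; the candidate configurations, domain names, and score_on_domain callback are assumptions, not taken from that paper's code.

```python
# Hypothetical sketch: pick hyperparameters by their mean score across
# cross-domain validation sets instead of a single in-domain split.
# Candidate values and the scoring callback are illustrative assumptions.
from statistics import mean
from typing import Callable, Dict, List

CANDIDATES: List[Dict[str, float]] = [
    {"lr": 1e-3, "weight_decay": 1e-4},
    {"lr": 1e-4, "weight_decay": 1e-4},
    {"lr": 1e-4, "weight_decay": 1e-5},
]

def select_by_cross_domain_validation(
    score_on_domain: Callable[[Dict[str, float], str], float],
    validation_domains: List[str],
) -> Dict[str, float]:
    """Return the candidate whose mean score over held-out domains is highest."""
    return max(
        CANDIDATES,
        key=lambda cfg: mean(score_on_domain(cfg, d) for d in validation_domains),
    )

if __name__ == "__main__":
    # Dummy scorer: pretend smaller learning rates transfer better across domains.
    best = select_by_cross_domain_validation(
        lambda cfg, domain: 1.0 - cfg["lr"], ["aerial", "underwater", "medical"]
    )
    print("selected configuration:", best)
```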
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.