FMG-Det: Foundation Model Guided Robust Object Detection
- URL: http://arxiv.org/abs/2505.23726v1
- Date: Thu, 29 May 2025 17:55:41 GMT
- Title: FMG-Det: Foundation Model Guided Robust Object Detection
- Authors: Darryl Hannan, Timothy Doster, Henry Kvinge, Adam Attarian, Yijing Watkins
- Abstract summary: Training on noisy annotations significantly degrades detector performance. We propose FMG-Det, a simple, efficient methodology for training models with noisy annotations.
- Score: 7.489718044485341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting high quality data for object detection tasks is challenging due to the inherent subjectivity in labeling the boundaries of an object. This makes it difficult to not only collect consistent annotations across a dataset but also to validate them, as no two annotators are likely to label the same object using the exact same coordinates. These challenges are further compounded when object boundaries are partially visible or blurred, which can be the case in many domains. Training on noisy annotations significantly degrades detector performance, rendering them unusable, particularly in few-shot settings, where just a few corrupted annotations can impact model performance. In this work, we propose FMG-Det, a simple, efficient methodology for training models with noisy annotations. More specifically, we propose combining a multiple instance learning (MIL) framework with a pre-processing pipeline that leverages powerful foundation models to correct labels prior to training. This pre-processing pipeline, along with slight modifications to the detector head, results in state-of-the-art performance across a number of datasets, for both standard and few-shot scenarios, while being much simpler and more efficient than other approaches.
Related papers
- S$^2$Teacher: Step-by-step Teacher for Sparsely Annotated Oriented Object Detection [55.34086214300803]
We introduce a novel setting called sparsely annotated oriented object detection (SAOOD), which only labels partial instances. Specifically, we focus on two key issues in the setting: (1) sparse labeling leading to overfitting on limited foreground representations, and (2) unlabeled objects (false negatives) confusing feature learning. To this end, we propose S$^2$Teacher, a novel method that progressively mines pseudo-labels for unlabeled objects, from easy to hard, to enhance foreground representations.
arXiv Detail & Related papers (2025-04-15T11:57:00Z) - Bayesian Detector Combination for Object Detection with Crowdsourced Annotations [49.43709660948812]
Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise.
We propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations.
BDC is model-agnostic, requires no prior knowledge of the annotators' skill level, and seamlessly integrates with existing object detection models.
arXiv Detail & Related papers (2024-07-10T18:00:54Z) - Boost UAV-based Object Detection via Scale-Invariant Feature Disentanglement and Adversarial Learning [18.11107031800982]
We propose to improve single-stage inference accuracy through learning scale-invariant features. Our approach can effectively improve model accuracy and achieve state-of-the-art (SoTA) performance on two datasets.
arXiv Detail & Related papers (2024-05-24T11:40:22Z) - The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models [24.53385855664792]
In object detection and instance segmentation, foundation models such as SAM and DINO struggle to achieve satisfactory performance.
We propose Zip, which Zips up CLIP and SAM in a novel classification-first-then-discovery pipeline.
Zip significantly boosts SAM's mask AP on the COCO dataset by 12.5% and establishes state-of-the-art performance in various settings.
arXiv Detail & Related papers (2024-04-18T07:22:38Z) - Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient? [11.416621957617334]
In this paper, we propose a semi-supervised method tailored for the current state-of-the-art object detector Deformable DETR.
We evaluate our method on the semi-supervised object detection benchmarks COCO and Pascal VOC, and it outperforms previous methods, especially when annotations are scarce.
arXiv Detail & Related papers (2023-10-30T18:51:25Z) - Scaling Novel Object Detection with Weakly Supervised Detection Transformers [21.219817483091166]
We propose the Weakly Supervised Detection Transformer, which enables efficient knowledge transfer from a large-scale pretraining dataset to WSOD finetuning.
Our experiments show that our approach outperforms previous state-of-the-art models on large-scale novel object detection datasets.
arXiv Detail & Related papers (2022-07-11T21:45:54Z) - Omni-DETR: Omni-Supervised Object Detection with Transformers [165.4190908259015]
We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations.
Under this unified architecture, different types of weak labels can be leveraged to generate accurate pseudo labels.
We have found that weak annotations can help to improve detection performance and a mixture of them can achieve a better trade-off between annotation cost and accuracy.
arXiv Detail & Related papers (2022-03-30T06:36:09Z) - Plug-and-Play Few-shot Object Detection with Meta Strategy and Explicit Localization Inference [78.41932738265345]
This paper proposes a plug detector that can accurately detect objects of novel categories without a fine-tuning process.
We introduce two explicit inferences into the localization process to reduce its dependence on annotated data.
It shows a significant lead in efficiency, precision, and recall under varied evaluation protocols.
arXiv Detail & Related papers (2021-10-26T03:09:57Z) - Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection [18.04185751827619]
Few-shot object detection is challenging since the fine-grained features of novel objects can be easily overlooked when only a few examples are available.
We propose Dense Relation Distillation with Context-aware Aggregation (DCNet) to tackle the few-shot detection problem.
arXiv Detail & Related papers (2021-03-30T05:34:49Z) - UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z) - Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.