Auto-Labeling Data for Object Detection
- URL: http://arxiv.org/abs/2506.02359v1
- Date: Tue, 03 Jun 2025 01:27:56 GMT
- Title: Auto-Labeling Data for Object Detection
- Authors: Brent A. Griffin, Manushree Gangwar, Jacob Sela, Jason J. Corso
- Abstract summary: This paper addresses the problem of training standard object detection models without any ground truth labels. We generate application-specific pseudo "ground truth" labels using vision-language foundation models. We find that our approach is a viable alternative to standard labeling in that it maintains competitive performance on multiple datasets.
- Score: 20.557988700343373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Great labels make great models. However, traditional labeling approaches for tasks like object detection have substantial costs at scale. Furthermore, alternatives to fully-supervised object detection either lose functionality or require larger models with prohibitive computational costs for inference at scale. To that end, this paper addresses the problem of training standard object detection models without any ground truth labels. Instead, we configure previously-trained vision-language foundation models to generate application-specific pseudo "ground truth" labels. These auto-generated labels directly integrate with existing model training frameworks, and we subsequently train lightweight detection models that are computationally efficient. In this way, we avoid the costs of traditional labeling, leverage the knowledge of vision-language models, and keep the efficiency of lightweight models for practical application. We perform exhaustive experiments across multiple labeling configurations, downstream inference models, and datasets to establish best practices and set an extensive auto-labeling benchmark. From our results, we find that our approach is a viable alternative to standard labeling in that it maintains competitive performance on multiple datasets and substantially reduces labeling time and costs.
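The abstract describes a two-stage recipe: a previously-trained vision-language model produces application-specific pseudo "ground truth" labels, and those labels then plug into an unmodified training pipeline for a lightweight detector. The sketch below illustrates the first stage under stated assumptions: it uses OWL-ViT from HuggingFace transformers as the labeling model and writes COCO-format JSON, whereas the paper benchmarks multiple labeling configurations and downstream models; the class list, score threshold, and file names here are illustrative only, not the paper's actual setup.

```python
# Hypothetical sketch: generate pseudo "ground truth" boxes with an off-the-shelf
# open-vocabulary detector (OWL-ViT here), then dump them in COCO format so a
# standard detector training framework can consume them unchanged.
import json
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

CLASSES = ["person", "car", "bicycle"]           # application-specific label space (example)
PROMPTS = [f"a photo of a {c}" for c in CLASSES]
SCORE_THRESHOLD = 0.3                            # confidence cutoff for keeping a pseudo label

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32").eval()

def pseudo_label(image_paths):
    """Return a COCO-style dict with auto-generated (pseudo) annotations."""
    images, annotations, ann_id = [], [], 1
    for img_id, path in enumerate(image_paths, start=1):
        image = Image.open(path).convert("RGB")
        inputs = processor(text=[PROMPTS], images=image, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
        result = processor.post_process_object_detection(
            outputs=outputs, threshold=SCORE_THRESHOLD, target_sizes=target_sizes)[0]
        images.append({"id": img_id, "file_name": path,
                       "width": image.width, "height": image.height})
        for box, label in zip(result["boxes"].tolist(), result["labels"].tolist()):
            x0, y0, x1, y1 = box
            annotations.append({"id": ann_id, "image_id": img_id,
                                "category_id": label + 1,            # COCO ids are 1-based
                                "bbox": [x0, y0, x1 - x0, y1 - y0],  # xywh
                                "area": (x1 - x0) * (y1 - y0), "iscrowd": 0})
            ann_id += 1
    return {"images": images, "annotations": annotations,
            "categories": [{"id": i + 1, "name": c} for i, c in enumerate(CLASSES)]}

with open("pseudo_labels_coco.json", "w") as f:
    json.dump(pseudo_label(["img_0001.jpg", "img_0002.jpg"]), f)
```

Because the output is plain COCO JSON, existing detection training frameworks can consume it unchanged to train the lightweight downstream model, which is the integration the abstract claims for the auto-generated labels.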
Related papers
- Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models [4.157013247909771]
We propose to leverage the recent advancements in state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer).
We aim to develop a cost-effective labeling approach to obtain pseudo-labels for semantic segmentation and object instance detection in indoor environments.
We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset.
arXiv Detail & Related papers (2023-11-17T21:58:26Z)
- Impact of Label Types on Training SWIN Models with Overhead Imagery [0.0]
This work examined the impact of training shifted window transformers using bounding boxes and segmentation labels.
We found that the models trained on only target pixels do not show performance improvement for classification tasks.
For object detection, we found that models trained with either label type showed equivalent performance across testing.
arXiv Detail & Related papers (2023-10-11T15:14:54Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Ground Truth Inference for Weakly Supervised Entity Matching [76.6732856489872]
We propose a simple but powerful labeling model for weak supervision tasks.
We then tailor the labeling model specifically to the task of entity matching.
We show that our labeling model results in a 9% higher F1 score on average than the best existing method.
arXiv Detail & Related papers (2022-11-13T17:57:07Z)
- Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118]
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
arXiv Detail & Related papers (2022-07-18T21:47:15Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL).
CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101 using only the RGB modality and 1% labeled data, respectively.
arXiv Detail & Related papers (2021-12-17T18:59:41Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Learning from Noisy Labels for Entity-Centric Information Extraction [17.50856935207308]
We propose a simple co-regularization framework for entity-centric information extraction.
These models are jointly optimized with a task-specific loss and are regularized to generate similar predictions.
In the end, we can take any of the trained models for inference.
arXiv Detail & Related papers (2021-04-17T22:49:12Z)
- How to trust unlabeled data? Instance Credibility Inference for Few-Shot Learning [47.21354101796544]
This paper presents a statistical approach, dubbed Instance Credibility Inference (ICI) to exploit the support of unlabeled instances for few-shot visual recognition.
We rank the credibility of pseudo-labeled instances along the regularization path of their corresponding incidental parameters, and the most trustworthy pseudo-labeled examples are preserved as the augmented labeled instances.
arXiv Detail & Related papers (2020-07-15T03:38:09Z)