DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
- URL: http://arxiv.org/abs/2407.09174v1
- Date: Fri, 12 Jul 2024 11:16:44 GMT
- Title: DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
- Authors: Chen Xin, Andreas Hartel, Enkelejda Kasneci
- Abstract summary: This paper presents DART, an automated end-to-end pipeline for object detection.
DART eliminates the need for human labeling and extensive data collection while excelling in diverse scenarios.
The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832.
- Score: 8.705939889424558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Swift and accurate detection of specified objects is crucial for many industrial applications, such as safety monitoring on construction sites. However, traditional approaches rely heavily on arduous manual annotation and data collection, which struggle to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an automated end-to-end pipeline designed to streamline the entire workflow of an object detection application from data collection to model deployment. DART eliminates the need for human labeling and extensive data collection while excelling in diverse scenarios. It employs a subject-driven image generation module (DreamBooth with SDXL) for data diversification, followed by an annotation stage where open-vocabulary object detection (Grounding DINO) generates bounding box annotations for both generated and original images. These pseudo-labels are then reviewed by a large multimodal model (GPT-4o) to guarantee credibility before serving as ground truth to train real-time object detectors (YOLO). We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832. Furthermore, we adopt a modular design for DART to ensure easy exchangeability and extensibility. This allows for a smooth transition to more advanced algorithms in the future, seamless integration of new object categories without manual labeling, and adaptability to customized environments without extra data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
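The abstract describes a four-stage flow: diversify data, pseudo-label with an open-vocabulary detector, review the pseudo-labels with a multimodal model, then train a real-time detector on what survives. A minimal sketch of that control flow, with every stage stubbed out (the real pipeline calls DreamBooth/SDXL, Grounding DINO, GPT-4o, and YOLO; the function names and stub behavior here are illustrative assumptions, not DART's actual API):

```python
from dataclasses import dataclass

@dataclass
class Box:
    """A single pseudo-label: class name, xyxy coordinates, confidence."""
    label: str
    xyxy: tuple
    score: float

def diversify(images):
    # Stage 1: subject-driven image generation (DreamBooth with SDXL in
    # the paper). Stub: each original image yields one synthetic variant.
    return images + [f"{img}_synthetic" for img in images]

def annotate(images, vocabulary):
    # Stage 2: open-vocabulary detection (Grounding DINO in the paper)
    # produces bounding-box pseudo-labels for generated and original
    # images alike. Stub: one fixed-confidence box per image.
    return {img: [Box(vocabulary[0], (0, 0, 10, 10), 0.9)] for img in images}

def review(pseudo_labels, threshold=0.5):
    # Stage 3: a large multimodal model (GPT-4o in the paper) accepts or
    # rejects each pseudo-label; stubbed here as a confidence filter.
    return {img: [b for b in boxes if b.score >= threshold]
            for img, boxes in pseudo_labels.items()}

def train(dataset):
    # Stage 4: reviewed labels serve as ground truth for a real-time
    # detector (YOLO in the paper). Stub: return the training-box count.
    return sum(len(boxes) for boxes in dataset.values())

def dart(images, vocabulary):
    """End-to-end: diversify -> annotate -> review -> train."""
    return train(review(annotate(diversify(images), vocabulary)))
```

Because each stage only exchanges plain images and `Box` lists, any stage can be swapped for a stronger model without touching the others, which is the modularity the abstract emphasizes.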
Related papers
- Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection [55.210991151015534]
We present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection.
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.
arXiv Detail & Related papers (2024-01-10T08:56:07Z)
- Automated Multimodal Data Annotation via Calibration With Indoor Positioning System [0.0]
Our method uses an indoor positioning system (IPS) to produce accurate detection labels for both point clouds and images.
In an experiment, the system annotates objects of interest 261.8 times faster than a human baseline.
arXiv Detail & Related papers (2023-12-06T16:54:24Z)
- 2DDATA: 2D Detection Annotations Transmittable Aggregation for Semantic Segmentation on Point Cloud [0.0]
Building on previous works, we not only fuse multi-modality information without the above issues, but also fully exploit the information in the RGB modality.
We demonstrate that our simple design can transmit bounding box prior information to the 3D model encoder, proving the feasibility of large multi-modality models fused with modality-specific data.
arXiv Detail & Related papers (2023-09-21T03:32:22Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- D2DF2WOD: Learning Object Proposals for Weakly-Supervised Object Detection via Progressive Domain Adaptation [25.41133780678981]
D2DF2WOD is a Fully-to-Weakly Supervised Object Detection framework.
It uses synthetic data, annotated with precise object localization, to supplement a natural image target domain.
Our method results in consistently improved object detection and localization compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-02T18:58:03Z)
- Scaling Novel Object Detection with Weakly Supervised Detection Transformers [21.219817483091166]
We propose the Weakly Supervised Detection Transformer, which enables efficient knowledge transfer from a large-scale pretraining dataset to WSOD finetuning.
Our experiments show that our approach outperforms previous state-of-the-art models on large-scale novel object detection datasets.
arXiv Detail & Related papers (2022-07-11T21:45:54Z)
- Omni-DETR: Omni-Supervised Object Detection with Transformers [165.4190908259015]
We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations.
Under this unified architecture, different types of weak labels can be leveraged to generate accurate pseudo labels.
We have found that weak annotations can help to improve detection performance and a mixture of them can achieve a better trade-off between annotation cost and accuracy.
arXiv Detail & Related papers (2022-03-30T06:36:09Z)
- End-to-end Deep Object Tracking with Circular Loss Function for Rotated Bounding Box [68.8204255655161]
We introduce a novel end-to-end deep learning method based on the Transformer Multi-Head Attention architecture.
We also present a new type of loss function, which takes into account the bounding box overlap and orientation.
arXiv Detail & Related papers (2020-12-17T17:29:29Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
- EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.