DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
- URL: http://arxiv.org/abs/2407.09174v3
- Date: Mon, 29 Jul 2024 09:14:07 GMT
- Title: DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
- Authors: Chen Xin, Andreas Hartel, Enkelejda Kasneci
- Abstract summary: This paper presents DART, an automated end-to-end pipeline that revolutionizes object detection from data collection to model evaluation.
It eliminates the need for human labeling and extensive data collection while achieving outstanding accuracy across diverse scenarios.
The current instantiation of DART significantly increases average precision (AP) from 0.064 to 0.832.
- Score: 8.705939889424558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate real-time object detection is vital across numerous industrial applications, from safety monitoring to quality control. Traditional approaches, however, are hindered by arduous manual annotation and data collection, struggling to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an innovative automated end-to-end pipeline that revolutionizes object detection workflows from data collection to model evaluation. It eliminates the need for laborious human labeling and extensive data collection while achieving outstanding accuracy across diverse scenarios. DART encompasses four key stages: (1) Data Diversification using subject-driven image generation (DreamBooth with SDXL), (2) Annotation via open-vocabulary object detection (Grounding DINO) to generate bounding box and class labels, (3) Review of generated images and pseudo-labels by large multimodal models (InternVL-1.5 and GPT-4o) to guarantee credibility, and (4) Training of real-time object detectors (YOLOv8 and YOLOv10) using the verified data. We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current instantiation of DART significantly increases average precision (AP) from 0.064 to 0.832. Its modular design ensures easy exchangeability and extensibility, allowing for future algorithm upgrades, seamless integration of new object categories, and adaptability to customized environments without manual labeling and additional data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
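The four stages named in the abstract can be read as a simple chain: diversify, annotate, review, train. The following is a minimal sketch of that control flow only, not the authors' implementation. The `Sample` class, `run_pipeline`, and all `toy_*` functions are hypothetical names introduced here for illustration; in DART itself the stages are backed by DreamBooth with SDXL, Grounding DINO, InternVL-1.5 and GPT-4o, and YOLOv8 and YOLOv10, none of which are called in this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


@dataclass
class Sample:
    """One image plus its pseudo-labels (hypothetical container)."""
    image_id: str
    synthetic: bool = False
    labels: List[Tuple[Box, str]] = field(default_factory=list)


def run_pipeline(
    seeds: List[Sample],
    diversify: Callable[[List[Sample]], List[Sample]],
    annotate: Callable[[Sample], List[Tuple[Box, str]]],
    review: Callable[[Sample], bool],
    train: Callable[[List[Sample]], object],
):
    # Stage 1: data diversification adds generated images to the seed set.
    samples = list(seeds) + diversify(seeds)
    # Stage 2: open-vocabulary annotation attaches boxes and class labels.
    for sample in samples:
        sample.labels = annotate(sample)
    # Stage 3: review keeps only samples whose image and labels are accepted.
    verified = [s for s in samples if review(s)]
    # Stage 4: the detector is trained on the verified data only.
    return train(verified)


# Toy stand-ins so the sketch runs; each would wrap a real model in practice.
def toy_diversify(seeds: List[Sample]) -> List[Sample]:
    return [Sample(image_id=f"{s.image_id}-gen", synthetic=True) for s in seeds]


def toy_annotate(sample: Sample) -> List[Tuple[Box, str]]:
    return [((0.0, 0.0, 1.0, 1.0), "excavator")]


def toy_review(sample: Sample) -> bool:
    # Assumption for the demo: ids starting with "bad" fail review.
    return not sample.image_id.startswith("bad")


def toy_train(samples: List[Sample]) -> List[str]:
    # Stands in for detector training; returns the ids it was trained on.
    return sorted(s.image_id for s in samples)


result = run_pipeline(
    [Sample("img0"), Sample("bad1")],
    toy_diversify, toy_annotate, toy_review, toy_train,
)
```

Passing each stage in as a callable mirrors the modular design the abstract describes: any one stage can be swapped without touching the others.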
Related papers
- Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation [58.37525311718006]
We put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD).
We propose CastDet, a CLIP-activated student-teacher detection framework that serves as the first OVAD detector specifically designed for the challenging aerial scenario.
Our framework integrates a robust localization teacher along with several box selection strategies to generate high-quality proposals for novel objects.
arXiv Detail & Related papers (2024-11-04T12:59:13Z) - Bayesian Detector Combination for Object Detection with Crowdsourced Annotations [49.43709660948812]
Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise.
We propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations.
BDC is model-agnostic, requires no prior knowledge of the annotators' skill level, and seamlessly integrates with existing object detection models.
arXiv Detail & Related papers (2024-07-10T18:00:54Z) - SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects in aerial images usually exhibit arbitrary orientations, small scales, and aggregation.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z) - Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object
Detection [55.210991151015534]
We present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection.
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.
arXiv Detail & Related papers (2024-01-10T08:56:07Z) - Automated Multimodal Data Annotation via Calibration With Indoor
Positioning System [0.0]
Our method uses an indoor positioning system (IPS) to produce accurate detection labels for both point clouds and images.
In an experiment, the system annotates objects of interest 261.8 times faster than a human baseline.
arXiv Detail & Related papers (2023-12-06T16:54:24Z) - 2DDATA: 2D Detection Annotations Transmittable Aggregation for Semantic
Segmentation on Point Cloud [0.0]
Building on previous works, we not only fuse multi-modality information without the above issues, but also exhaust the information in the RGB modality.
We demonstrate that our simple design can transmit bounding box prior information to the 3D model encoder, proving the feasibility of large multi-modality models fused with modality-specific data.
arXiv Detail & Related papers (2023-09-21T03:32:22Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we require only sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Scaling Novel Object Detection with Weakly Supervised Detection Transformers [21.219817483091166]
We propose the Weakly Supervised Detection Transformer, which enables efficient knowledge transfer from a large-scale pretraining dataset to WSOD finetuning.
Our experiments show that our approach outperforms previous state-of-the-art models on large-scale novel object detection datasets.
arXiv Detail & Related papers (2022-07-11T21:45:54Z) - Automatic Bounding Box Annotation with Small Training Data Sets for Industrial Manufacturing [0.0]
We discuss how to adapt state-of-the-art object detection methods for the task of automatic bounding box annotation.
We show that both can be trained to distinguish unknown objects from a complex but homogeneous background using only a small amount of training data.
arXiv Detail & Related papers (2022-06-01T07:32:32Z) - EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.