DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
- URL: http://arxiv.org/abs/2407.09174v1
- Date: Fri, 12 Jul 2024 11:16:44 GMT
- Title: DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
- Authors: Chen Xin, Andreas Hartel, Enkelejda Kasneci
- Abstract summary: This paper presents DART, an automated end-to-end pipeline for object detection.
DART eliminates the need for human labeling and extensive data collection while excelling in diverse scenarios.
The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832.
- Score: 8.705939889424558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Swift and accurate detection of specified objects is crucial for many industrial applications, such as safety monitoring on construction sites. However, traditional approaches rely heavily on arduous manual annotation and data collection, which struggle to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an automated end-to-end pipeline designed to streamline the entire workflow of an object detection application from data collection to model deployment. DART eliminates the need for human labeling and extensive data collection while excelling in diverse scenarios. It employs a subject-driven image generation module (DreamBooth with SDXL) for data diversification, followed by an annotation stage where open-vocabulary object detection (Grounding DINO) generates bounding box annotations for both generated and original images. These pseudo-labels are then reviewed by a large multimodal model (GPT-4o) to guarantee credibility before serving as ground truth to train real-time object detectors (YOLO). We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832. Furthermore, we adopt a modular design for DART to ensure easy exchangeability and extensibility. This allows for a smooth transition to more advanced algorithms in the future, seamless integration of new object categories without manual labeling, and adaptability to customized environments without extra data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
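The abstract describes a four-stage flow: diversify data, pseudo-label with an open-vocabulary detector, review the pseudo-labels with a multimodal model, then train a real-time detector on what survives. A minimal sketch of that control flow, with every stage stubbed out (the real pipeline calls DreamBooth/SDXL, Grounding DINO, GPT-4o, and YOLO; the function names and stub behavior here are illustrative assumptions, not DART's actual API):

```python
from dataclasses import dataclass

@dataclass
class Box:
    """A single pseudo-label: class name, xyxy coordinates, confidence."""
    label: str
    xyxy: tuple
    score: float

def diversify(images):
    # Stage 1: subject-driven image generation (DreamBooth with SDXL in
    # the paper). Stub: each original image yields one synthetic variant.
    return images + [f"{img}_synthetic" for img in images]

def annotate(images, vocabulary):
    # Stage 2: open-vocabulary detection (Grounding DINO in the paper)
    # produces bounding-box pseudo-labels for generated and original
    # images alike. Stub: one fixed-confidence box per image.
    return {img: [Box(vocabulary[0], (0, 0, 10, 10), 0.9)] for img in images}

def review(pseudo_labels, threshold=0.5):
    # Stage 3: a large multimodal model (GPT-4o in the paper) accepts or
    # rejects each pseudo-label; stubbed here as a confidence filter.
    return {img: [b for b in boxes if b.score >= threshold]
            for img, boxes in pseudo_labels.items()}

def train(dataset):
    # Stage 4: reviewed labels serve as ground truth for a real-time
    # detector (YOLO in the paper). Stub: return the training-box count.
    return sum(len(boxes) for boxes in dataset.values())

def dart(images, vocabulary):
    """End-to-end: diversify -> annotate -> review -> train."""
    return train(review(annotate(diversify(images), vocabulary)))
```

Because each stage only exchanges plain images and `Box` lists, any stage can be swapped for a stronger model without touching the others, which is the modularity the abstract emphasizes.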
Related papers
- Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection [55.210991151015534]
We present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection.
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.
arXiv Detail & Related papers (2024-01-10T08:56:07Z)
- Automated Multimodal Data Annotation via Calibration With Indoor Positioning System [0.0]
Our method uses an indoor positioning system (IPS) to produce accurate detection labels for both point clouds and images.
In an experiment, the system annotates objects of interest 261.8 times faster than a human baseline.
arXiv Detail & Related papers (2023-12-06T16:54:24Z)
- 2DDATA: 2D Detection Annotations Transmittable Aggregation for Semantic Segmentation on Point Cloud [0.0]
Building on previous works, we not only fuse multi-modality information without the above issues, but also fully exploit the information in the RGB modality.
We demonstrate that our simple design can transmit bounding box prior information to the 3D model encoder, proving the feasibility of large multi-modality models fused with modality-specific data.
arXiv Detail & Related papers (2023-09-21T03:32:22Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- D2DF2WOD: Learning Object Proposals for Weakly-Supervised Object Detection via Progressive Domain Adaptation [25.41133780678981]
D2DF2WOD is a Fully-to-Weakly Supervised Object Detection framework.
It uses synthetic data, annotated with precise object localization, to supplement a natural image target domain.
Our method results in consistently improved object detection and localization compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-02T18:58:03Z)
- Scaling Novel Object Detection with Weakly Supervised Detection Transformers [21.219817483091166]
We propose the Weakly Supervised Detection Transformer, which enables efficient knowledge transfer from a large-scale pretraining dataset to WSOD finetuning.
Our experiments show that our approach outperforms previous state-of-the-art models on large-scale novel object detection datasets.
arXiv Detail & Related papers (2022-07-11T21:45:54Z)
- Omni-DETR: Omni-Supervised Object Detection with Transformers [165.4190908259015]
We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations.
Under this unified architecture, different types of weak labels can be leveraged to generate accurate pseudo labels.
We have found that weak annotations can help to improve detection performance and a mixture of them can achieve a better trade-off between annotation cost and accuracy.
arXiv Detail & Related papers (2022-03-30T06:36:09Z)
- End-to-end Deep Object Tracking with Circular Loss Function for Rotated Bounding Box [68.8204255655161]
We introduce a novel end-to-end deep learning method based on the Transformer Multi-Head Attention architecture.
We also present a new type of loss function, which takes into account the bounding box overlap and orientation.
arXiv Detail & Related papers (2020-12-17T17:29:29Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
- EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.