Semi-Automated Data Annotation in Multisensor Datasets for Autonomous Vehicle Testing
- URL: http://arxiv.org/abs/2512.24896v1
- Date: Wed, 31 Dec 2025 14:43:48 GMT
- Title: Semi-Automated Data Annotation in Multisensor Datasets for Autonomous Vehicle Testing
- Authors: Andrii Gamalii, Daniel Górniak, Robert Nowak, Bartłomiej Olber, Krystian Radlak, Jakub Winter,
- Abstract summary: This report presents the design and implementation of a semi-automated data annotation pipeline developed within the DARTS project.<n>The DARTS project's goal is to create a large-scale, multimodal dataset of driving scenarios recorded in Polish conditions.<n>The proposed solution adopts a human-in-the-loop approach that combines artificial intelligence with human expertise to reduce annotation cost and duration.
- Score: 2.5225534362856767
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This report presents the design and implementation of a semi-automated data annotation pipeline developed within the DARTS project, whose goal is to create a large-scale, multimodal dataset of driving scenarios recorded in Polish conditions. Manual annotation of such heterogeneous data is both costly and time-consuming. To address this challenge, the proposed solution adopts a human-in-the-loop approach that combines artificial intelligence with human expertise to reduce annotation cost and duration. The system automatically generates initial annotations, enables iterative model retraining, and incorporates data anonymization and domain adaptation techniques. At its core, the tool relies on 3D object detection algorithms to produce preliminary annotations. Overall, the developed tools and methodology result in substantial time savings while ensuring consistent, high-quality annotations across different sensor modalities. The solution directly supports the DARTS project by accelerating the preparation of large annotated dataset in the project's standardized format, strengthening the technological base for autonomous vehicle research in Poland.
Related papers
- Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey [59.3507264893654]
Issue resolution is a complex Software Engineering task integral to real-world development.<n> benchmarks like SWE-bench revealed this task as profoundly difficult for large language models.<n>This paper presents a systematic survey of this emerging domain.
arXiv Detail & Related papers (2026-01-15T18:55:03Z) - Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems [75.78934957242403]
Self-driving vehicles and drones require true Spatial Intelligence from multi-modal onboard sensor data.<n>This paper presents a framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal.
arXiv Detail & Related papers (2025-12-30T17:58:01Z) - Data Scaling Laws for End-to-End Autonomous Driving [83.85463296830743]
We evaluate the performance of a simple end-to-end driving architecture on internal driving datasets ranging in size from 16 to 8192 hours.<n>Specifically, we investigate how much additional training data is needed to achieve a target performance gain.
arXiv Detail & Related papers (2025-04-06T03:23:48Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - ANNA: A Deep Learning Based Dataset in Heterogeneous Traffic for
Autonomous Vehicles [2.932123507260722]
This study discusses a custom-built dataset that includes some unidentified vehicles in the perspective of Bangladesh.
A dataset validity check was performed by evaluating models using the Intersection Over Union (IOU) metric.
The results demonstrated that the model trained on our custom dataset was more precise and efficient than the models trained on the KITTI or COCO dataset concerning Bangladeshi traffic.
arXiv Detail & Related papers (2024-01-21T01:14:04Z) - An Efficient Semi-Automated Scheme for Infrastructure LiDAR Annotation [15.523875367380196]
We present an efficient semi-automated annotation tool that automatically annotates LiDAR sequences with tracking algorithms.
Our tool seamlessly integrates multi-object tracking (MOT), single-object tracking (SOT) and suitable trajectory post-processing techniques.
arXiv Detail & Related papers (2023-01-25T17:42:15Z) - Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge
Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC)
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z) - Sensitivity analysis in differentially private machine learning using
hybrid automatic differentiation [54.88777449903538]
We introduce a novel textithybrid automatic differentiation (AD) system for sensitivity analysis.
This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data.
Our approach can enable the principled reasoning about privacy loss in the setting of data processing.
arXiv Detail & Related papers (2021-07-09T07:19:23Z) - Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z) - Large Scale Autonomous Driving Scenarios Clustering with Self-supervised
Feature Extraction [6.804209932400134]
This article proposes a comprehensive data clustering framework for a large set of vehicle driving data.
Our approach thoroughly considers the traffic elements, including both in-traffic agent objects and map information.
With the newly designed driving data clustering evaluation metrics based on data-augmentation, the accuracy assessment does not require a human-labeled data-set.
arXiv Detail & Related papers (2021-03-30T06:22:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.