Data Shift of Object Detection in Autonomous Driving
- URL: http://arxiv.org/abs/2508.11868v1
- Date: Sat, 16 Aug 2025 01:52:31 GMT
- Title: Data Shift of Object Detection in Autonomous Driving
- Authors: Lida Xu
- Abstract summary: We study the data shift problem in autonomous driving object detection tasks. We employ shift detection analysis techniques to perform dataset categorization and balancing. To validate our approach, we optimize the model by integrating CycleGAN-based data augmentation techniques with the YOLOv5 framework.
- Score: 0.40792653193642503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the widespread adoption of machine learning technologies in autonomous driving systems, their role in addressing complex environmental perception challenges has become increasingly crucial. However, existing machine learning models exhibit significant vulnerability, as their performance critically depends on the fundamental assumption that training and testing data satisfy the independent and identically distributed condition, which is difficult to guarantee in real-world applications. Dynamic variations in data distribution caused by seasonal changes and weather fluctuations lead to data shift problems in autonomous driving systems. This study investigates the data shift problem in autonomous driving object detection tasks, systematically analyzing its complexity and diverse manifestations. We conduct a comprehensive review of data shift detection methods and employ shift detection analysis techniques to perform dataset categorization and balancing. Building upon this foundation, we construct an object detection model. To validate our approach, we optimize the model by integrating CycleGAN-based data augmentation techniques with the YOLOv5 framework. Experimental results demonstrate that our method achieves superior performance compared to baseline models on the BDD100K dataset.
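As an illustration of what a shift detection check can look like, here is a minimal sketch using a two-sample Kolmogorov-Smirnov test on a single scalar feature. It is not the method from the paper, whose specific techniques the abstract does not name. The feature (mean image brightness per frame), the function names `ks_statistic` and `shift_detected`, the synthetic day/night samples, and the significance level `alpha` are all assumptions made for this example.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap can only occur at an observed sample value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def shift_detected(a, b, alpha=0.05):
    """Flag a shift when the statistic exceeds the large-sample critical value."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


# Hypothetical per-image mean brightness values (synthetic, for illustration only).
rng = random.Random(0)
day_brightness = [rng.gauss(0.60, 0.10) for _ in range(500)]
night_brightness = [rng.gauss(0.25, 0.10) for _ in range(500)]

print("KS statistic:", ks_statistic(day_brightness, night_brightness))
print("shift detected:", shift_detected(day_brightness, night_brightness))
```

A check of this kind could be used to group images into subsets (for example by time of day or weather) before balancing a dataset. Real pipelines typically test learned feature embeddings rather than a single hand-picked statistic.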
Related papers
- Predict Training Data Quality via Its Geometry in Metric Space [7.056460460498077]
We propose that the richness of representation and the elimination of redundancy within training data critically influence learning outcomes. To investigate this, we employ persistent homology to extract topological features from data within a metric space. Our findings highlight persistent homology as a powerful tool for analyzing and enhancing the training data that drives AI systems.
arXiv Detail & Related papers (2025-10-12T16:59:28Z)
- From Physics to Machine Learning and Back: Part II - Learning and Observational Bias in PHM [52.64097278841485]
Review examines how incorporating learning and observational biases through physics-informed modeling and data strategies can guide models toward physically consistent and reliable predictions. Fast adaptation methods including meta-learning and few-shot learning are reviewed alongside domain generalization techniques.
arXiv Detail & Related papers (2025-09-25T14:15:43Z)
- Enhancing Object Detection Accuracy in Autonomous Vehicles Using Synthetic Data [0.8267034114134277]
Performance of machine learning models depends on the nature and size of the training data sets.
High-quality, diverse, relevant and representative training data is essential to build accurate and reliable machine learning models.
It is hypothesised that well-designed synthetic data can improve the performance of a machine learning algorithm.
arXiv Detail & Related papers (2024-11-23T16:38:02Z)
- NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks [5.852777557137612]
We introduce a synthetic mobility dataset, NUMOSIM, that provides a controlled, ethical, and diverse environment for anomaly benchmarking techniques.
NUMOSIM simulates a wide array of realistic mobility scenarios, encompassing both typical and anomalous behaviours.
We provide open access to the NUMOSIM dataset, along with comprehensive documentation, evaluation metrics, and benchmark results.
arXiv Detail & Related papers (2024-09-04T18:31:24Z)
- Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z)
- SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control [59.20038082523832]
We present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We develop a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data.
arXiv Detail & Related papers (2024-03-28T14:07:13Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- Learning Latent Dynamics via Invariant Decomposition and (Spatio-)Temporal Transformers [0.6767885381740952]
We propose a method for learning dynamical systems from high-dimensional empirical data.
We focus on the setting in which data are available from multiple different instances of a system.
We study behaviour through simple theoretical analyses and extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-21T07:52:07Z)
- Detection of Dataset Shifts in Learning-Enabled Cyber-Physical Systems using Variational Autoencoder for Regression [1.5039745292757671]
We propose an approach to detect the dataset shifts effectively for regression problems.
Our approach is based on the inductive conformal anomaly detection and utilizes a variational autoencoder for regression model.
We demonstrate our approach by using an advanced emergency braking system implemented in an open-source simulator for self-driving cars.
arXiv Detail & Related papers (2021-04-14T03:46:37Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.