Improving Image Data Leakage Detection in Automotive Software
- URL: http://arxiv.org/abs/2410.23312v1
- Date: Tue, 29 Oct 2024 13:37:45 GMT
- Title: Improving Image Data Leakage Detection in Automotive Software
- Authors: Md Abu Ahammed Babu, Sushant Kumar Pandey, Darko Durisic, Ashok Chaitanya Koppisetty, Miroslaw Staron
- Abstract summary: Data leakage is often overlooked when data is split into train and test sets before training an ML/DL model.
In this study, we conduct a computational experiment on the Cirrus dataset from our industrial partner Volvo Cars.
We then evaluate the method on another public dataset, Kitti, which is a popular and widely accepted benchmark dataset in the automotive domain.
- Abstract: Data leakage is a very common problem that is often overlooked when splitting data into train and test sets before training any ML/DL model. Leakage artificially inflates model performance during the evaluation phase, which often leads to erroneous predictions once the model is deployed in real time. However, detecting such leakage is challenging, particularly in the object detection context of perception systems, where the model is trained on image data. In this study, we conduct a computational experiment on the Cirrus dataset from our industrial partner Volvo Cars to develop a method for detecting data leakage. We then evaluate the method on another public dataset, Kitti, which is a popular and widely accepted benchmark dataset in the automotive domain. The results show that our proposed method detects data leakage in the Kitti dataset that was previously unknown.
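The abstract does not describe the detection method itself, so the following is only a minimal sketch of one generic way to check for image leakage between splits, not the authors' approach. It assumes that leakage shows up as near-duplicate frames in both the train and test sets, and it compares a simple average hash of each image. The names `average_hash`, `hamming`, `find_leaks`, and the `max_distance` threshold are hypothetical, and the toy 4x4 grayscale grids stand in for real images, which would first be resized to a common small resolution.

```python
from itertools import product


def average_hash(image):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def find_leaks(train, test, max_distance=1):
    """Return sorted (train_name, test_name) pairs whose hashes differ
    by at most max_distance bits (assumed threshold, tune per dataset)."""
    train_hashes = {name: average_hash(img) for name, img in train.items()}
    test_hashes = {name: average_hash(img) for name, img in test.items()}
    return sorted(
        (tr, te)
        for (tr, h_tr), (te, h_te) in product(train_hashes.items(), test_hashes.items())
        if hamming(h_tr, h_te) <= max_distance
    )


# Toy data: frame_a_next mimics the consecutive video frame of frame_a,
# frame_b is an unrelated scene.
frame_a = [[10, 10, 200, 200]] * 4
frame_a_next = [[12, 11, 198, 201], [9, 10, 199, 200], [11, 12, 201, 199], [10, 9, 200, 202]]
frame_b = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]

train = {"train_000": frame_a, "train_001": frame_b}
test = {"test_000": frame_a_next}

leaks = find_leaks(train, test)
print(leaks)
```

A check like this compares every train image with every test image, so it scales quadratically; for large datasets, grouping images by recording sequence before splitting is a common way to avoid the problem in the first place.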
Related papers
- DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection [41.436817746749384]
Diffusion Model is a scalable data engine for object detection.
DiffusionEngine (DE) provides high-quality detection-oriented training pairs in a single stage.
arXiv Detail & Related papers (2023-09-07T17:55:01Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z) - An Outlier Exposure Approach to Improve Visual Anomaly Detection
Performance for Mobile Robots [76.36017224414523]
We consider the problem of building visual anomaly detection systems for mobile robots.
Standard anomaly detection models are trained using large datasets composed only of non-anomalous data.
We tackle the problem of exploiting these data to improve the performance of a Real-NVP anomaly detection model.
arXiv Detail & Related papers (2022-09-20T15:18:13Z) - Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D
Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z) - DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly
detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE).
It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase.
Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z) - Detecting Concept Drift With Neural Network Model Uncertainty [0.0]
Uncertainty Drift Detection (UDD) is able to detect drifts without access to true labels.
In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model.
We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
arXiv Detail & Related papers (2021-07-05T08:56:36Z) - Unsupervised Model Drift Estimation with Batch Normalization Statistics
for Dataset Shift Detection and Model Selection [0.0]
We propose a novel method of model drift estimation by exploiting statistics of batch normalization layer on unlabeled test data.
We show the effectiveness of our method not only on dataset shift detection but also on model selection when there are multiple candidate models among model zoo or training trajectories in an unsupervised way.
arXiv Detail & Related papers (2021-07-01T03:04:47Z) - Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap.
We suggest future dataset creation include a simple model as a difficulty/bias probe and future model development use a clean non-overlapping site and date split.
arXiv Detail & Related papers (2021-04-20T17:16:41Z) - Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes detecting fake without forgetting, a continual-learning-based method, to make the model learn new spoofing attacks incrementally.
Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z)
- Anomalous Motion Detection on Highway Using Deep Learning [14.617786106427834]
This paper presents a new anomaly detection dataset - the Highway Traffic Anomaly (HTA) dataset.
We evaluate state-of-the-art deep learning anomaly detection models and propose novel variations to these methods.
arXiv Detail & Related papers (2020-06-15T05:40:11Z)
- An Incremental Clustering Method for Anomaly Detection in Flight Data [0.0]
We propose a novel incremental anomaly detection method based on a Gaussian Mixture Model (GMM).
It is a probabilistic clustering model of flight operations that can incrementally update its clusters based on new data.
Preliminary results indicate that the incremental learning scheme is effective in dealing with dynamically growing data in flight data analytics.
arXiv Detail & Related papers (2020-05-20T06:58:25Z)