Related papers: Improving Image Data Leakage Detection in Automotive Software

Improving Image Data Leakage Detection in Automotive Software

URL: http://arxiv.org/abs/2410.23312v1
Date: Tue, 29 Oct 2024 13:37:45 GMT
Title: Improving Image Data Leakage Detection in Automotive Software
Authors: Md Abu Ahammed Babu, Sushant Kumar Pandey, Darko Durisic, Ashok Chaitanya Koppisetty, Miroslaw Staron,
Abstract summary: Data leakage is often overlooked during splitting data into train and test sets before training any ML/DL model. In this study, we conduct a computational experiment on the Cirrus dataset from our industrial partner Volvo Cars. We then evaluate the method on another public dataset, Kitti, which is a popular and widely accepted benchmark dataset in the automotive domain.
Score: 2.622385361961154
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Data leakage is a very common problem that is often overlooked during splitting data into train and test sets before training any ML/DL model. The model performance gets artificially inflated with the presence of data leakage during the evaluation phase which often leads the model to erroneous prediction on real-time deployment. However, detecting the presence of such leakage is challenging, particularly in the object detection context of perception systems where the model needs to be supplied with image data for training. In this study, we conduct a computational experiment on the Cirrus dataset from our industrial partner Volvo Cars to develop a method for detecting data leakage. We then evaluate the method on another public dataset, Kitti, which is a popular and widely accepted benchmark dataset in the automotive domain. The results show that thanks to our proposed method we are able to detect data leakage in the Kitti dataset, which was previously unknown.

Related papers

A Dataset for Semantic Segmentation in the Presence of Unknowns [49.795683850385956]
Existing datasets allow evaluation of only knowns or unknowns - but not both. We propose a novel anomaly segmentation dataset, ISSU, that features a diverse set of anomaly inputs from cluttered real-world environments. The dataset is twice larger than existing anomaly segmentation datasets.
arXiv Detail & Related papers (2025-03-28T10:31:01Z)
Label-Free Model Failure Detection for Lidar-based Point Cloud Segmentation [15.779651238128562]
We introduce label-free model failure detection for lidar-based point cloud segmentation. We leverage different data characteristics by training a supervised and self-supervised stream for the same task to detect failure modes. We perform a large-scale qualitative analysis and present LidarCODA, the first publicly available dataset with labeled anomalies in real-world lidar data.
arXiv Detail & Related papers (2024-07-19T13:36:35Z)
DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection [41.436817746749384]
Diffusion Model is a scalable data engine for object detection. DiffusionEngine (DE) provides high-quality detection-oriented training pairs in a single stage.
arXiv Detail & Related papers (2023-09-07T17:55:01Z)
Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection. We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset. Results demonstrate that the model trained on generated data is slightly underperforming compared to a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
An Outlier Exposure Approach to Improve Visual Anomaly Detection Performance for Mobile Robots [76.36017224414523]
We consider the problem of building visual anomaly detection systems for mobile robots. Standard anomaly detection models are trained using large datasets composed only of non-anomalous data. We tackle the problem of exploiting these data to improve the performance of a Real-NVP anomaly detection model.
arXiv Detail & Related papers (2022-09-20T15:18:13Z)
Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on. We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE) It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase. Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z)
Detecting Concept Drift With Neural Network Model Uncertainty [0.0]
Uncertainty Drift Detection (UDD) is able to detect drifts without access to true labels. In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model. We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
arXiv Detail & Related papers (2021-07-05T08:56:36Z)
Unsupervised Model Drift Estimation with Batch Normalization Statistics for Dataset Shift Detection and Model Selection [0.0]
We propose a novel method of model drift estimation by exploiting statistics of batch normalization layer on unlabeled test data. We show the effectiveness of our method not only on dataset shift detection but also on model selection when there are multiple candidate models among model zoo or training trajectories in an unsupervised way.
arXiv Detail & Related papers (2021-07-01T03:04:47Z)
Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets. We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap. We suggest future dataset creation include a simple model as a difficulty/bias probe and future model development use a clean non-overlapping site and date split.
arXiv Detail & Related papers (2021-04-20T17:16:41Z)
Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes detecting fake without forgetting, a continual-learning-based method, to make the model learn new spoofing attacks incrementally. Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z)
Anomalous Motion Detection on Highway Using Deep Learning [14.617786106427834]
This paper presents a new anomaly detection dataset - the Highway Traffic Anomaly (HTA) dataset. We evaluate state-of-the-art deep learning anomaly detection models and propose novel variations to these methods.
arXiv Detail & Related papers (2020-06-15T05:40:11Z)
An Incremental Clustering Method for Anomaly Detection in Flight Data [0.0]
We propose a novel incremental anomaly detection method based on Gaussian Mixture Model (GMM) It is a probabilistic clustering model of flight operations that can incrementally update its clusters based on new data. Preliminary results indicate that the incremental learning scheme is effective in dealing with dynamically growing data in flight data analytics.
arXiv Detail & Related papers (2020-05-20T06:58:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.