Out-of-Distribution Detection and Data Drift Monitoring using
Statistical Process Control
- URL: http://arxiv.org/abs/2402.08088v1
- Date: Mon, 12 Feb 2024 22:10:06 GMT
- Title: Out-of-Distribution Detection and Data Drift Monitoring using
Statistical Process Control
- Authors: Ghada Zamzmi, Kesavan Venkatesh, Brandon Nelson, Smriti Prathapan,
Paul H. Yi, Berkman Sahiner, and Jana G. Delfino
- Abstract summary: Machine learning (ML) methods often fail with data that deviates from their training distribution.
This is a significant concern for ML-enabled devices in clinical settings, where data drift may cause unexpected performance that jeopardizes patient safety.
We propose an ML-enabled Statistical Process Control (SPC) framework for out-of-distribution detection and drift monitoring.
- Score: 1.2196109054410231
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Machine learning (ML) methods often fail with data that deviates
from their training distribution. This is a significant concern for ML-enabled
devices in clinical settings, where data drift may cause unexpected performance
that jeopardizes patient safety.
Method: We propose an ML-enabled Statistical Process Control (SPC) framework
for out-of-distribution (OOD) detection and drift monitoring. SPC is
advantageous as it visually and statistically highlights deviations from the
expected distribution. To demonstrate the utility of the proposed framework for
monitoring data drift in radiological images, we investigated different design
choices, including methods for extracting feature representations, drift
quantification, and SPC parameter selection.
Results: We demonstrate the effectiveness of our framework for two tasks: 1)
differentiating axial vs. non-axial computed tomography (CT) images and 2)
separating chest x-ray (CXR) from other modalities. For both tasks, we achieved
high accuracy in detecting OOD inputs, with 0.913 in CT and 0.995 in CXR, and
sensitivity of 0.980 in CT and 0.984 in CXR. Our framework was also adept at
monitoring data streams and identifying the time a drift occurred. In a
simulation with 100 daily CXR cases, we detected a drift in OOD input
percentage from 0-1% to 3-5% within two days, maintaining a low false-positive
rate. Through additional experimental results, we demonstrate the framework's
data-agnostic nature and independence from the underlying model's structure.
Conclusion: We propose a framework for OOD detection and drift monitoring
that is agnostic to data, modality, and model. The framework is customizable
and can be adapted for specific applications.
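The drift-monitoring idea in the abstract can be illustrated with a small control-chart sketch. The abstract does not say which SPC chart or parameters the authors use, so everything below is an assumption made for illustration: a one-sided (upper) CUSUM chart, the hypothetical function name `cusum_upper`, the textbook-style settings `k=0.5` and `h=4.0`, and the invented daily OOD fractions (loosely modeled on the 100-cases-per-day scenario, with the rate rising from roughly 0-1% to 3-5%).

```python
from statistics import mean, stdev


def cusum_upper(values, mu0, sigma, k=0.5, h=4.0):
    """One-sided (upper) CUSUM chart.

    values: monitored statistic per time step (here, daily OOD fraction)
    mu0, sigma: in-control mean and standard deviation
    k: allowance (slack) in sigma units; h: decision threshold in sigma units
    Returns (index of first alarm or None, list of CUSUM statistics).
    """
    s = 0.0
    stats = []
    alarm = None
    for i, x in enumerate(values):
        z = (x - mu0) / sigma          # standardize against the baseline
        s = max(0.0, s + z - k)        # accumulate only upward deviations
        stats.append(s)
        if alarm is None and s > h:    # record the first threshold crossing
            alarm = i
    return alarm, stats


# Hypothetical in-control period: OOD fraction per day (0-1% of 100 cases).
baseline = [0.00, 0.01, 0.01, 0.00, 0.01, 0.00, 0.00, 0.01, 0.01, 0.00]
mu0 = mean(baseline)
sigma = stdev(baseline)

# Hypothetical monitored stream: five in-control days, then a shift to 3-5%.
stream = [0.01, 0.00, 0.01, 0.00, 0.01, 0.04, 0.03, 0.05, 0.04]

alarm_day, stats = cusum_upper(stream, mu0, sigma)
print("first alarm at day index:", alarm_day)
```

With these made-up numbers the chart stays below the threshold during the in-control days and raises its first alarm once the elevated OOD rate appears. In practice the allowance `k` and threshold `h` trade detection delay against false alarms and would need to be tuned for the application.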
Related papers
- Can Your Generative Model Detect Out-of-Distribution Covariate Shift? [2.0144831048903566]
We propose a novel method for detecting Out-of-Distribution (OOD) sensory data using conditional Normalizing Flows (cNFs).
Our results on CIFAR10 vs. CIFAR10-C and ImageNet200 vs. ImageNet200-C demonstrate the effectiveness of the method.
arXiv Detail & Related papers (2024-09-04T19:27:56Z)
- Efficient Data-Sketches and Fine-Tuning for Early Detection of Distributional Drift in Medical Imaging [5.1358645354733765]
This paper presents an accurate and sensitive approach to detect distributional drift in CT-scan medical images.
We developed a robust library model for real-time anomaly detection, allowing for efficient comparison of incoming images.
We fine-tuned a vision transformer pre-trained model to extract relevant features using breast cancer images.
arXiv Detail & Related papers (2024-08-15T23:46:37Z)
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
- Quantifying the effect of X-ray scattering for data generation in real-time defect detection [1.124958340749622]
In-line detection requires highly accurate, robust, and fast algorithms.
DCNNs satisfy these requirements when a large amount of labeled data is available.
X-ray scattering is known to be computationally expensive to simulate.
arXiv Detail & Related papers (2023-05-22T08:29:43Z)
- CODiT: Conformal Out-of-Distribution Detection in Time-Series Data [11.565104282674973]
In many applications, the inputs to a machine learning model form a temporal sequence.
We propose using deviation from the in-distribution temporal equivariance as the non-conformity measure in the conformal anomaly detection framework.
We illustrate the efficacy of CODiT by achieving state-of-the-art results on computer vision datasets in autonomous driving.
arXiv Detail & Related papers (2022-07-24T16:41:14Z)
- Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing the lung region in CXR images, and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- A Deep Learning Based Workflow for Detection of Lung Nodules With Chest Radiograph [0.0]
We built a segmentation model to identify lung areas from CXRs, and sliced them into 16 patches.
These labeled patches were then used to fine-tune a deep neural network (DNN) model, classifying the patches as positive or negative.
arXiv Detail & Related papers (2021-12-19T16:19:46Z)
- Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
- Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection [76.39067237772286]
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z)
- Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) offer the promise of detecting where and when brain regions activate.
The problem of source localization, or source imaging, however, poses a high-dimensional statistical inference challenge.
We propose an ensemble of clustered desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z)