Monitoring Machine Learning Models: Online Detection of Relevant
Deviations
- URL: http://arxiv.org/abs/2309.15187v1
- Date: Tue, 26 Sep 2023 18:46:37 GMT
- Title: Monitoring Machine Learning Models: Online Detection of Relevant
Deviations
- Authors: Florian Heinrichs
- Abstract summary: Machine learning models can degrade over time due to changes in data distribution or other factors.
We propose a sequential monitoring scheme to detect relevant changes.
Our research contributes a practical solution for distinguishing between minor fluctuations and meaningful degradations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models are essential tools in various domains, but their
performance can degrade over time due to changes in data distribution or other
factors. On one hand, detecting and addressing such degradations is crucial for
maintaining the models' reliability. On the other hand, given enough data, any
arbitrarily small change in quality can be detected. As interventions, such as
model re-training or replacement, can be expensive, we argue that they should
only be carried out when changes exceed a given threshold. We propose a
sequential monitoring scheme to detect these relevant changes. The proposed
method reduces unnecessary alerts and overcomes the multiple testing problem by
accounting for temporal dependence of the measured model quality. Conditions
for consistency and specified asymptotic levels are provided. Empirical
validation using simulated and real data demonstrates the superiority of our
approach in detecting relevant changes in model quality compared to benchmark
methods. Our research contributes a practical solution for distinguishing
between minor fluctuations and meaningful degradations in machine learning
model performance, ensuring their reliability in dynamic environments.
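
To make the monitoring idea concrete, the following is a minimal sketch of threshold-based sequential monitoring in Python. It is not the detector proposed in the paper: a Newey-West style long-run variance estimate stands in for the paper's treatment of temporal dependence, and the window size, critical value, relevance threshold delta, and simulated error rates are illustrative assumptions.

    # Illustrative sketch only -- NOT the paper's detector. A quality series is
    # monitored online and an alert fires only when the estimated degradation
    # exceeds a user-chosen relevance threshold `delta`, with a long-run
    # variance estimate accounting for temporal dependence in the measurements.
    import numpy as np


    def long_run_variance(x, max_lag=10):
        """Newey-West style long-run variance estimate for a dependent series."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xc = x - x.mean()
        lrv = np.dot(xc, xc) / n
        for lag in range(1, min(max_lag, n - 1) + 1):
            gamma = np.dot(xc[lag:], xc[:-lag]) / n
            lrv += 2.0 * (1.0 - lag / (max_lag + 1)) * gamma
        return max(lrv, 1e-12)


    def monitor(quality_stream, baseline_mean, delta=0.05, z_crit=2.33, window=200):
        """Yield (t, degradation, alert) for each new quality measurement.

        An alert fires only when the lower confidence bound of the estimated
        degradation (current mean minus baseline, for a loss-like metric where
        higher is worse) exceeds the relevance threshold `delta`.
        """
        buf = []
        for t, q in enumerate(quality_stream, start=1):
            buf.append(q)
            recent = buf[-window:]
            degradation = np.mean(recent) - baseline_mean
            se = np.sqrt(long_run_variance(recent) / len(recent))
            alert = len(recent) >= 30 and (degradation - z_crit * se) > delta
            yield t, degradation, alert


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Simulated per-batch error rates: stable at 0.10, then drifting to 0.20.
        stream = np.concatenate([
            0.10 + 0.02 * rng.standard_normal(300),
            0.20 + 0.02 * rng.standard_normal(300),
        ])
        for t, deg, alert in monitor(stream, baseline_mean=0.10, delta=0.05):
            if alert:
                print(f"relevant degradation detected at t={t}: +{deg:.3f}")
                break

The design choice mirrored from the abstract is that an alert requires the estimated degradation to exceed the relevance threshold, not merely to be statistically distinguishable from zero.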
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139] (2024-08-24)
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method, which combines concepts from Optimal Transport and Shapley Values, as Explanatory Performance Estimation.
- Complementary Learning for Real-World Model Failure Detection [15.779651238128562] (2024-07-19)
We introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors.
We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner.
We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds.
- Is Your Anomaly Detector Ready for Change? Adapting AIOps Solutions to the Real World [15.1355549683548] (2023-11-17)
We analyze two different anomaly detection model maintenance techniques in terms of model update frequency.
We investigate whether a data change monitoring tool can determine when the anomaly detection model needs to be updated through retraining.
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658] (2023-11-06)
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves, for example, the absolute performance of the Llama 2 model by up to 15 percentage points.
- Reliability in Semantic Segmentation: Are We on the Right Track? [15.0189654919665] (2023-03-20)
We analyze a broad variety of models, spanning from older ResNet-based architectures to novel transformers.
We find that while recent models are significantly more robust, they are not overall more reliable in terms of uncertainty estimation.
This is the first study on modern segmentation models focused on both robustness and uncertainty estimation.
- Generative Modeling Helps Weak Supervision (and Vice Versa) [87.62271390571837] (2022-03-22)
We propose a model fusing weak supervision and generative adversarial networks.
It captures discrete variables in the data alongside the label estimate derived from weak supervision.
It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels.
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733] (2021-10-12)
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
- Score-Based Change Detection for Gradient-Based Learning Machines [9.670556223243182] (2021-06-27)
We present a generic score-based change detection method that can detect a change in any number of components of a machine learning model trained via empirical risk minimization.
We establish the consistency of the hypothesis test and show how to calibrate it to achieve a prescribed false alarm rate.
(A generic, illustrative sketch of this score-monitoring idea is given after this list.)
- DirectDebug: Automated Testing and Debugging of Feature Models [55.41644538483948] (2021-02-11)
Variability models (e.g., feature models) are a common way to represent the variability and commonality of software artifacts.
Complex and often large-scale feature models can become faulty, i.e., they no longer represent the expected variability properties of the underlying software artifact.
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865] (2020-05-25)
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
- On the Role of Dataset Quality and Heterogeneity in Model Confidence [27.657631193015252] (2020-02-23)
Safety-critical applications require machine learning models that output accurate and calibrated probabilities.
Uncalibrated deep networks are known to make over-confident predictions.
We study the impact of dataset quality, specifically dataset size and label noise, on model confidence.
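
As a companion to the score-based change detection entry above, the following is a generic CUSUM-of-the-score sketch, again in Python. It is not the calibrated hypothesis test from that paper: the logistic model, the normalization, the threshold, and the simulated concept drift are illustrative assumptions. The only borrowed idea is that the average loss gradient at an empirical-risk minimizer stays near zero until the data-generating process changes.

    # Generic CUSUM-of-the-score illustration -- NOT the cited paper's test.
    # At an empirical-risk minimizer the average loss gradient ("score") is
    # close to zero on unchanged data, so a persistent drift of the cumulative
    # score away from zero suggests a change in the data-generating process.
    import numpy as np


    def logistic_score(theta, X, y):
        """Per-sample gradient of the logistic loss at parameters theta."""
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        return (p - y)[:, None] * X  # shape (n_samples, n_features)


    def cusum_score_detector(theta, batches, threshold=4.0):
        """Return the index of the first batch where the normalized cumulative
        score exceeds `threshold`, or None if no change is flagged."""
        cum = np.zeros_like(theta)
        n_seen = 0
        for i, (X, y) in enumerate(batches):
            cum = cum + logistic_score(theta, X, y).sum(axis=0)
            n_seen += len(y)
            if np.linalg.norm(cum) / np.sqrt(n_seen) > threshold:
                return i
        return None


    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        theta_deployed = np.array([1.0, -2.0, 0.5])  # pretend ERM solution
        theta_drifted = np.array([1.0, 0.5, 0.5])    # changed input-label relationship

        def make_batch(theta_gen, n=100):
            X = rng.standard_normal((n, len(theta_gen)))
            y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ theta_gen))).astype(float)
            return X, y

        batches = [make_batch(theta_deployed) for _ in range(20)]
        batches += [make_batch(theta_drifted) for _ in range(20)]
        print("change flagged at batch:", cusum_score_detector(theta_deployed, batches))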
This list is automatically generated from the titles and abstracts of the papers on this site.