Monitoring Machine Learning Models: Online Detection of Relevant
Deviations
- URL: http://arxiv.org/abs/2309.15187v1
- Date: Tue, 26 Sep 2023 18:46:37 GMT
- Title: Monitoring Machine Learning Models: Online Detection of Relevant
Deviations
- Authors: Florian Heinrichs
- Abstract summary: Machine learning models can degrade over time due to changes in data distribution or other factors.
We propose a sequential monitoring scheme to detect relevant changes.
Our research contributes a practical solution for distinguishing between minor fluctuations and meaningful degradations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models are essential tools in various domains, but their
performance can degrade over time due to changes in data distribution or other
factors. On one hand, detecting and addressing such degradations is crucial for
maintaining the models' reliability. On the other hand, given enough data, any
arbitrarily small change in quality can be detected. As interventions, such as
model re-training or replacement, can be expensive, we argue that they should
only be carried out when changes exceed a given threshold. We propose a
sequential monitoring scheme to detect these relevant changes. The proposed
method reduces unnecessary alerts and overcomes the multiple testing problem by
accounting for temporal dependence of the measured model quality. Conditions
for consistency and specified asymptotic levels are provided. Empirical
validation using simulated and real data demonstrates the superiority of our
approach in detecting relevant changes in model quality compared to benchmark
methods. Our research contributes a practical solution for distinguishing
between minor fluctuations and meaningful degradations in machine learning
model performance, ensuring their reliability in dynamic environments.
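
To make the monitoring idea concrete, the following is a minimal sketch of threshold-based sequential monitoring in Python. It is not the detector proposed in the paper: a Newey-West style long-run variance estimate stands in for the paper's treatment of temporal dependence, and the window size, critical value, relevance threshold delta, and simulated error rates are illustrative assumptions.

    # Illustrative sketch only -- NOT the paper's detector. A quality series is
    # monitored online and an alert fires only when the estimated degradation
    # exceeds a user-chosen relevance threshold `delta`, with a long-run
    # variance estimate accounting for temporal dependence in the measurements.
    import numpy as np


    def long_run_variance(x, max_lag=10):
        """Newey-West style long-run variance estimate for a dependent series."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xc = x - x.mean()
        lrv = np.dot(xc, xc) / n
        for lag in range(1, min(max_lag, n - 1) + 1):
            gamma = np.dot(xc[lag:], xc[:-lag]) / n
            lrv += 2.0 * (1.0 - lag / (max_lag + 1)) * gamma
        return max(lrv, 1e-12)


    def monitor(quality_stream, baseline_mean, delta=0.05, z_crit=2.33, window=200):
        """Yield (t, degradation, alert) for each new quality measurement.

        An alert fires only when the lower confidence bound of the estimated
        degradation (current mean minus baseline, for a loss-like metric where
        higher is worse) exceeds the relevance threshold `delta`.
        """
        buf = []
        for t, q in enumerate(quality_stream, start=1):
            buf.append(q)
            recent = buf[-window:]
            degradation = np.mean(recent) - baseline_mean
            se = np.sqrt(long_run_variance(recent) / len(recent))
            alert = len(recent) >= 30 and (degradation - z_crit * se) > delta
            yield t, degradation, alert


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Simulated per-batch error rates: stable at 0.10, then drifting to 0.20.
        stream = np.concatenate([
            0.10 + 0.02 * rng.standard_normal(300),
            0.20 + 0.02 * rng.standard_normal(300),
        ])
        for t, deg, alert in monitor(stream, baseline_mean=0.10, delta=0.05):
            if alert:
                print(f"relevant degradation detected at t={t}: +{deg:.3f}")
                break

The design choice mirrored from the abstract is that an alert requires the estimated degradation to exceed the relevance threshold, not merely to be statistically distinguishable from zero.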
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139] (2024-08-24)
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method, which combines concepts from Optimal Transport and Shapley Values, as Explanatory Performance Estimation.
- Complementary Learning for Real-World Model Failure Detection [15.779651238128562] (2024-07-19)
We introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors.
We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner.
We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds.
- Is Your Anomaly Detector Ready for Change? Adapting AIOps Solutions to the Real World [15.1355549683548] (2023-11-17)
We analyze two different anomaly detection model maintenance techniques in terms of model update frequency.
We investigate whether a data change monitoring tool can determine when the anomaly detection model needs to be updated through retraining.
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658] (2023-11-06)
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves, for example, the absolute performance of the Llama 2 model by up to 15 percentage points.
- Reliability in Semantic Segmentation: Are We on the Right Track? [15.0189654919665] (2023-03-20)
We analyze a broad variety of models, spanning from older ResNet-based architectures to novel transformers.
We find that while recent models are significantly more robust, they are not overall more reliable in terms of uncertainty estimation.
This is the first study on modern segmentation models focused on both robustness and uncertainty estimation.
- Generative Modeling Helps Weak Supervision (and Vice Versa) [87.62271390571837] (2022-03-22)
We propose a model fusing weak supervision and generative adversarial networks.
It captures discrete variables in the data alongside the label estimate derived from weak supervision.
It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels.
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733] (2021-10-12)
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
- Score-Based Change Detection for Gradient-Based Learning Machines [9.670556223243182] (2021-06-27)
We present a generic score-based change detection method that can detect a change in any number of components of a machine learning model trained via empirical risk minimization.
We establish the consistency of the hypothesis test and show how to calibrate it to achieve a prescribed false alarm rate.
(A generic, illustrative sketch of this score-monitoring idea is given after this list.)
- DirectDebug: Automated Testing and Debugging of Feature Models [55.41644538483948] (2021-02-11)
Variability models (e.g., feature models) are a common way to represent the variability and commonality of software artifacts.
Complex and often large-scale feature models can become faulty, i.e., they no longer represent the expected variability properties of the underlying software artifact.
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865] (2020-05-25)
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
- On the Role of Dataset Quality and Heterogeneity in Model Confidence [27.657631193015252] (2020-02-23)
Safety-critical applications require machine learning models that output accurate and calibrated probabilities.
Uncalibrated deep networks are known to make over-confident predictions.
We study the impact of dataset quality, specifically dataset size and label noise, on model confidence.
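
As a companion to the score-based change detection entry above, the following is a generic CUSUM-of-the-score sketch, again in Python. It is not the calibrated hypothesis test from that paper: the logistic model, the normalization, the threshold, and the simulated concept drift are illustrative assumptions. The only borrowed idea is that the average loss gradient at an empirical-risk minimizer stays near zero until the data-generating process changes.

    # Generic CUSUM-of-the-score illustration -- NOT the cited paper's test.
    # At an empirical-risk minimizer the average loss gradient ("score") is
    # close to zero on unchanged data, so a persistent drift of the cumulative
    # score away from zero suggests a change in the data-generating process.
    import numpy as np


    def logistic_score(theta, X, y):
        """Per-sample gradient of the logistic loss at parameters theta."""
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        return (p - y)[:, None] * X  # shape (n_samples, n_features)


    def cusum_score_detector(theta, batches, threshold=4.0):
        """Return the index of the first batch where the normalized cumulative
        score exceeds `threshold`, or None if no change is flagged."""
        cum = np.zeros_like(theta)
        n_seen = 0
        for i, (X, y) in enumerate(batches):
            cum = cum + logistic_score(theta, X, y).sum(axis=0)
            n_seen += len(y)
            if np.linalg.norm(cum) / np.sqrt(n_seen) > threshold:
                return i
        return None


    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        theta_deployed = np.array([1.0, -2.0, 0.5])  # pretend ERM solution
        theta_drifted = np.array([1.0, 0.5, 0.5])    # changed input-label relationship

        def make_batch(theta_gen, n=100):
            X = rng.standard_normal((n, len(theta_gen)))
            y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ theta_gen))).astype(float)
            return X, y

        batches = [make_batch(theta_deployed) for _ in range(20)]
        batches += [make_batch(theta_drifted) for _ in range(20)]
        print("change flagged at batch:", cusum_score_detector(theta_deployed, batches))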
This list is automatically generated from the titles and abstracts of the papers on this site.