Machine Learning Model Drift Detection Via Weak Data Slices
- URL: http://arxiv.org/abs/2108.05319v1
- Date: Wed, 11 Aug 2021 16:55:34 GMT
- Title: Machine Learning Model Drift Detection Via Weak Data Slices
- Authors: Samuel Ackerman, Parijat Dube, Eitan Farchi, Orna Raz, Marcel Zalmanovici
- Abstract summary: We propose a method that utilizes feature space rules, called data slices, for drift detection.
We provide experimental indications that our method can identify when an ML model is likely to change in performance, based on changes in the underlying data.
- Score: 5.319802998033767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting drift in performance of Machine Learning (ML) models is an
acknowledged challenge. For ML models to become an integral part of business
applications it is essential to detect when an ML model drifts away from
acceptable operation. However, it is often the case that actual labels are
difficult and expensive to get, for example, because they require expert
judgment. Therefore, there is a need for methods that detect likely degradation
in ML operation without labels. We propose a method that utilizes feature space
rules, called data slices, for drift detection. We provide experimental
indications that our method is likely to identify that the ML model will likely
change in performance, based on changes in the underlying data.
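The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a data slice is a rule over feature values, that a "weak" slice (one where the model was previously seen to under-perform) is already known, and that drift is flagged when the share of unlabeled production records falling into that slice grows significantly relative to a baseline. The two-proportion z-test, the `z_threshold` value, and all function and feature names are illustrative assumptions.

```python
from math import sqrt
from typing import Callable, Dict, List

Record = Dict[str, float]
SliceRule = Callable[[Record], bool]


def slice_fraction(records: List[Record], rule: SliceRule) -> float:
    """Return the fraction of records that satisfy the slice rule."""
    if not records:
        raise ValueError("records must be non-empty")
    return sum(1 for r in records if rule(r)) / len(records)


def slice_drift(
    baseline: List[Record],
    production: List[Record],
    rule: SliceRule,
    z_threshold: float = 3.0,  # assumed cutoff, not taken from the paper
) -> Dict[str, object]:
    """Compare slice membership rates with a two-proportion z-test.

    No labels are needed: only feature values are inspected.
    """
    p_base = slice_fraction(baseline, rule)
    p_prod = slice_fraction(production, rule)
    n_base, n_prod = len(baseline), len(production)
    pooled = (p_base * n_base + p_prod * n_prod) / (n_base + n_prod)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_base + 1 / n_prod))
    z = 0.0 if std_err == 0 else (p_prod - p_base) / std_err
    return {"base": p_base, "prod": p_prod, "z": z, "drift": z > z_threshold}


def weak_slice(record: Record) -> bool:
    """Hypothetical weak slice: young applicants with low income."""
    return record["age"] < 25 and record["income"] < 30000


if __name__ == "__main__":
    inside = {"age": 20.0, "income": 20000.0}
    outside = {"age": 40.0, "income": 50000.0}
    baseline = [inside] * 10 + [outside] * 90    # 10% of records in the slice
    production = [inside] * 40 + [outside] * 60  # 40% of records in the slice
    print(slice_drift(baseline, production, weak_slice))
```

A growing share of records in a weak slice suggests, but does not prove, that overall model performance will degrade; labels would still be needed to confirm it.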
Related papers
- Time to Retrain? Detecting Concept Drifts in Machine Learning Systems [1.4499463058550683]
We propose a model-agnostic technique (CDSeer) for detecting concept drift in machine learning (ML) models.
Results show that CDSeer has better precision and recall compared to the state-of-the-art while requiring significantly less manual labeling.
The improved performance and ease of adoption of CDSeer are valuable in making ML systems more reliable.
arXiv Detail & Related papers (2024-10-11T18:47:39Z)
- Loss-Free Machine Unlearning [51.34904967046097]
We present a machine unlearning approach that is both retraining- and label-free.
Retraining-free approaches often utilise Fisher information, which is derived from the loss and requires labelled data which may not be available.
We present an extension to the Selective Synaptic Dampening algorithm, replacing the diagonal of the Fisher information matrix with the gradient of the l2 norm of the model output to approximate sensitivity.
arXiv Detail & Related papers (2024-02-29T16:15:34Z)
- Mitigating ML Model Decay in Continuous Integration with Data Drift Detection: An Empirical Study [7.394099294390271]
This study investigates how well data drift detection techniques can automatically identify retraining points for ML models used for test case prioritization (TCP) in continuous integration (CI) environments.
We employed the Hellinger distance to identify changes in both the values and distribution of input data and leveraged these changes as retraining points for the ML model.
Our experimental evaluation of the Hellinger distance-based method demonstrated its efficacy and efficiency in detecting retraining points and reducing the associated costs.
arXiv Detail & Related papers (2023-05-22T05:55:23Z)
- Learn to Unlearn: A Survey on Machine Unlearning [29.077334665555316]
This article presents a review of recent machine unlearning techniques, verification mechanisms, and potential attacks.
We highlight emerging challenges and prospective research directions.
We aim for this paper to provide valuable resources for integrating privacy, equity, and resilience into ML systems.
arXiv Detail & Related papers (2023-05-12T14:28:02Z)
- AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z)
- Particle-Based Score Estimation for State Space Model Learning in Autonomous Driving [62.053071723903834]
Multi-object state estimation is a fundamental problem for robotic applications.
We consider learning maximum-likelihood parameters using particle methods.
We apply our method to real data collected from autonomous vehicles.
arXiv Detail & Related papers (2022-12-14T01:21:05Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
- FreaAI: Automated extraction of data slices to test machine learning models [2.475112368179548]
We show the feasibility of automatically extracting feature models that result in explainable data slices over which the ML solution under-performs.
Our novel technique, IBM FreaAI aka FreaAI, extracts such slices from structured ML test data or any other labeled data.
arXiv Detail & Related papers (2021-08-12T09:21:16Z)
- Detecting Faults during Automatic Screwdriving: A Dataset and Use Case of Anomaly Detection for Automatic Screwdriving [80.6725125503521]
Data-driven approaches, using Machine Learning (ML) for detecting faults have recently gained increasing interest.
We present a use case of using ML models for detecting faults during automated screwdriving operations.
arXiv Detail & Related papers (2021-07-05T11:46:00Z)
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
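One entry in the list above uses the Hellinger distance to flag changes in input data as retraining points. As a rough illustration of that metric only (not that study's pipeline), the sketch below bins a single numeric feature into equal-width histograms and computes the Hellinger distance between a reference window and a new window. The bin range, bin count, and function names are assumptions made for the example.

```python
from math import sqrt
from typing import List, Sequence


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized equal-width histogram over [lo, hi]; outliers are clamped to the edge bins."""
    if not values:
        raise ValueError("values must be non-empty")
    if bins <= 0 or hi <= lo:
        raise ValueError("need bins > 0 and hi > lo")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    return sqrt(0.5 * sum((sqrt(pi) - sqrt(qi)) ** 2 for pi, qi in zip(p, q)))


if __name__ == "__main__":
    reference = [0.1, 0.2, 0.3, 0.4]  # feature values seen at training time
    incoming = [0.6, 0.7, 0.8, 0.9]   # feature values seen later
    p = histogram(reference, lo=0.0, hi=1.0, bins=2)
    q = histogram(incoming, lo=0.0, hi=1.0, bins=2)
    print(hellinger(p, q))
```

A distance of 0 means the two histograms are identical and 1 means they share no mass; where to set a retraining threshold between those extremes is a tuning decision that this sketch leaves open.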
This list is automatically generated from the titles and abstracts of the papers on this site.