Did the Model Change? Efficiently Assessing Machine Learning API Shifts
- URL: http://arxiv.org/abs/2107.14203v1
- Date: Thu, 29 Jul 2021 17:41:53 GMT
- Title: Did the Model Change? Efficiently Assessing Machine Learning API Shifts
- Authors: Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou
- Abstract summary: Machine learning (ML) prediction APIs are increasingly widely used.
They can change over time due to model updates or retraining.
It is often not clear to the user if and how the ML model has changed.
- Score: 24.342984907651505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) prediction APIs are increasingly widely used. An ML API
can change over time due to model updates or retraining. This presents a key
challenge in the usage of the API because it is often not clear to the user if
and how the ML model has changed. Model shifts can affect downstream
application performance and also create oversight issues (e.g. if consistency
is desired). In this paper, we initiate a systematic investigation of ML API
shifts. We first quantify the performance shifts from 2020 to 2021 of popular
ML APIs from Google, Microsoft, Amazon, and others on a variety of datasets. We
identified significant model shifts in 12 out of 36 cases we investigated.
Interestingly, we found several datasets where the API's predictions became
significantly worse over time. This motivated us to formulate the API shift
assessment problem at a more fine-grained level as estimating how the API
model's confusion matrix changes over time when the data distribution is
constant. Monitoring confusion matrix shifts using standard random sampling can
require a large number of samples, which is expensive as each API call costs a
fee. We propose a principled adaptive sampling algorithm, MASA, to efficiently
estimate confusion matrix shifts. MASA can accurately estimate the confusion
matrix shifts in commercial ML APIs using up to 90% fewer samples compared to
random sampling. This work establishes ML API shifts as an important problem to
study and provides a cost-effective approach to monitor such shifts.
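The abstract does not spell out MASA's sampling rule, so the following is only a rough sketch of the underlying estimation problem. Everything named here is an assumption for illustration: `pool` (labeled inputs grouped by true class), `priors` (class frequencies in the target distribution), and `query_old`/`query_new` (the 2020 and 2021 API versions, returning integer class labels) are hypothetical stand-ins, and the two-stage allocation shown is a generic adaptive-sampling heuristic, not the paper's algorithm.

```python
import numpy as np

def estimate_confusion_shift(pool, priors, query_old, query_new,
                             n_classes, budget, pilot_per_class=20, seed=0):
    """Estimate the L1 distance between the confusion matrices of two API
    versions on the same data distribution, under a fixed query budget.

    pool[c]   -- labeled inputs whose true class is c (hypothetical)
    priors[c] -- assumed frequency of class c in the target distribution
    query_old, query_new -- call the 2020 / 2021 API version and return an
                            integer class label
    """
    rng = np.random.default_rng(seed)
    samples = {c: [] for c in range(n_classes)}  # c -> [(old_pred, new_pred)]

    def draw(c, k):
        # Re-draws across the two stages are ignored in this sketch.
        for x in rng.choice(pool[c], size=min(k, len(pool[c])), replace=False):
            samples[c].append((query_old(x), query_new(x)))

    # Stage 1: a small uniform pilot sample per class.
    for c in range(n_classes):
        draw(c, pilot_per_class)

    # Stage 2: spend the remaining budget on classes where the two API
    # versions disagree most often, since they drive the shift; the +1
    # smoothing keeps every class in play.
    disagree = np.array([sum(o != p for o, p in samples[c]) + 1.0
                         for c in range(n_classes)])
    remaining = max(budget - pilot_per_class * n_classes, 0)
    extra = np.floor(remaining * disagree / disagree.sum()).astype(int)
    for c in range(n_classes):
        draw(c, int(extra[c]))

    # Each confusion-matrix row is estimated from that class's own samples and
    # weighted by the class prior, so uneven per-class budgets do not bias it.
    shift = 0.0
    for c in range(n_classes):
        row_old = np.zeros(n_classes)
        row_new = np.zeros(n_classes)
        for o, p in samples[c]:
            row_old[o] += 1.0
            row_new[p] += 1.0
        row_old /= max(len(samples[c]), 1)
        row_new /= max(len(samples[c]), 1)
        shift += priors[c] * np.abs(row_old - row_new).sum()
    return shift
```

The point of the per-row estimate is that each row of the confusion matrix is estimated from however many samples that class received, so an uneven, adaptively chosen per-class budget does not bias the final shift estimate.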
Related papers
- Model Equality Testing: Which Model Is This API Serving? [59.005869726179455]
We formalize detecting distortions in API-served models, relative to the reference weights, as Model Equality Testing, a two-sample testing problem.
A test built on a simple string kernel achieves a median of 77.4% power against a range of distortions.
We then apply this test to commercial inference APIs for four Llama models, finding that 11 out of 31 endpoints serve different distributions than reference weights released by Meta.
arXiv Detail & Related papers (2024-10-26T18:34:53Z)
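A minimal sketch of a kernel two-sample test in the spirit of the entry above, assuming a character n-gram (spectrum-style) kernel and an MMD statistic with a permutation test; the kernel choice, n-gram length, and function names are illustrative guesses rather than the paper's exact construction.

```python
import random
from collections import Counter

def ngram_kernel(s, t, n=4):
    """Cosine similarity between character n-gram count vectors of two strings."""
    a = Counter(s[i:i + n] for i in range(max(len(s) - n + 1, 1)))
    b = Counter(t[i:i + n] for i in range(max(len(t) - n + 1, 1)))
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

def mmd2(xs, ys, kernel):
    """Biased estimate of the squared MMD between two samples of strings."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

def permutation_test(reference, api_outputs, kernel=ngram_kernel,
                     n_permutations=200, seed=0):
    """p-value for 'the API serves the same output distribution as the
    reference model', via a permutation test on the MMD statistic."""
    rng = random.Random(seed)
    observed = mmd2(reference, api_outputs, kernel)
    pooled = list(reference) + list(api_outputs)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        count += mmd2(pooled[:len(reference)], pooled[len(reference):], kernel) >= observed
    return (count + 1) / (n_permutations + 1)
```

In the paper's setting, `reference` would hold completions sampled from the released reference weights and `api_outputs` completions from the endpoint under test, on matched prompts; a small p-value indicates the endpoint is serving a different distribution.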
- Let's Predict Who Will Move to a New Job [0.0]
We discuss how machine learning is used to predict who will move to a new job.
Data is pre-processed into a suitable format for ML models.
Models are assessed using decision support metrics such as precision, recall, F1-Score, and accuracy.
arXiv Detail & Related papers (2023-09-15T11:43:09Z)
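As a small illustration of the decision-support metrics named above, scikit-learn computes all four directly; the labels below are placeholders rather than data from the paper.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder labels: 1 = moved to a new job, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```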
- Time-Varying Propensity Score to Bridge the Gap between the Past and Present [104.46387765330142]
We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data.
We demonstrate different ways of implementing it and evaluate it on a variety of problems.
arXiv Detail & Related papers (2022-10-04T07:21:49Z)
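The propensity-score idea above can be approximated, very coarsely, by a "domain classifier" that tries to separate past from present samples: if it beats chance, the data distribution has shifted. The paper's time-varying formulation is richer than this; the logistic-regression stand-in and function name below are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def shift_score(past_X, present_X):
    """Cross-validated AUC of a classifier separating past from present data.
    Values near 0.5 suggest no detectable shift; values well above 0.5
    suggest the distribution has moved."""
    X = np.vstack([past_X, present_X])
    y = np.concatenate([np.zeros(len(past_X)), np.ones(len(present_X))])
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
```

The fitted classifier's predicted probability of "present" is a propensity score for each example; tracking it across successive time windows is one way to surface the gradual shifts the entry refers to.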
- HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions [35.48276161473216]
We present HAPI, a longitudinal dataset of 1,761,417 instances of commercial ML API applications.
Each instance consists of a query input for an API along with the API's output prediction/annotation and confidence scores.
arXiv Detail & Related papers (2022-09-18T01:52:16Z)
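Each HAPI instance, as described above, pairs a query input with the API's prediction and confidence. A hypothetical record type for such a longitudinal log might look like the following; the field names are guesses, not the released schema.

```python
from dataclasses import dataclass

@dataclass
class APIRecord:
    api_name: str      # e.g. a commercial vision or speech API
    query_input: str   # identifier or path of the submitted input
    prediction: str    # label/annotation returned by the API
    confidence: float  # confidence score attached to the prediction
    queried_at: str    # ISO timestamp, enabling year-over-year comparison
```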
- HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques [48.82319198853359]
HardVis is a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios.
Users can explore subsets of the data from different perspectives to decide on the undersampling and oversampling parameters.
The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case.
arXiv Detail & Related papers (2022-03-29T17:04:16Z)
- Tribuo: Machine Learning with Provenance in Java [0.0]
We introduce Tribuo, a Java ML library that integrates training, type-safety, runtime checking, and automatic recording into a single framework.
All Tribuo's models and evaluations record the full processing pipeline for input data, along with the training algorithms.
arXiv Detail & Related papers (2021-10-06T19:10:50Z)
- Machine Learning Model Drift Detection Via Weak Data Slices [5.319802998033767]
We propose a method that utilizes feature space rules, called data slices, for drift detection.
We provide experimental evidence that our method can flag likely changes in the ML model's performance based on changes in the underlying data.
arXiv Detail & Related papers (2021-08-11T16:55:34Z)
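One plausible reading of the data-slice idea above: a slice is a rule over feature ranges where the model is known to be weak, and drift is flagged when noticeably more incoming data lands in those slices. The rule format, feature names, and threshold below are assumptions for illustration, not the paper's procedure.

```python
# Each slice is a conjunction of per-feature ranges, e.g. regions of feature
# space where the model performed poorly on reference data.
SLICES = [
    {"age": (18, 25), "tenure": (0, 1)},
    {"age": (60, 100), "tenure": (0, 3)},
]

def in_slice(row, rule):
    return all(lo <= row[feat] <= hi for feat, (lo, hi) in rule.items())

def weak_slice_rate(rows):
    """Fraction of records that fall into any weak slice."""
    hits = sum(any(in_slice(r, s) for s in SLICES) for r in rows)
    return hits / max(len(rows), 1)

def drift_alert(reference_rows, incoming_rows, tolerance=0.05):
    """Flag likely performance drift when the weak-slice rate rises."""
    return weak_slice_rate(incoming_rows) - weak_slice_rate(reference_rows) > tolerance
```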
- Detecting Faults during Automatic Screwdriving: A Dataset and Use Case of Anomaly Detection for Automatic Screwdriving [80.6725125503521]
Data-driven approaches using Machine Learning (ML) for detecting faults have recently gained increasing interest.
We present a use case of using ML models for detecting faults during automated screwdriving operations.
arXiv Detail & Related papers (2021-07-05T11:46:00Z)
- Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
arXiv Detail & Related papers (2021-02-26T23:21:16Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
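Prediction-time batch normalization, as described above, amounts to normalizing with statistics of the current test batch instead of the running averages stored at training time. A minimal PyTorch sketch of that idea follows; the model and batch are placeholders, and this is one common way to realize the technique rather than the paper's code.

```python
import torch
import torch.nn as nn

def predict_with_batch_stats(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Run inference with BatchNorm layers using current-batch statistics
    instead of the running mean/variance collected during training."""
    model.eval()  # keep dropout etc. in inference mode
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()  # train mode -> normalize with batch statistics
    with torch.no_grad():
        return model(batch)

# Note: train-mode BatchNorm also updates the layers' running statistics, so
# copy the model first (copy.deepcopy) if the original must stay untouched.
```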
- Vamsa: Automated Provenance Tracking in Data Science Scripts [17.53546311589593]
We introduce the ML provenance tracking problem.
We discuss the challenges in capturing such information in the context of Python.
We present Vamsa, a modular system that extracts provenance from Python scripts without requiring any changes to the users' code.
arXiv Detail & Related papers (2020-01-07T02:39:02Z)
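Vamsa reads provenance out of unmodified Python scripts; a toy imitation with the standard ast module is sketched below, recording which variables flow into any `.fit(...)` call. The example script and the analysis are illustrative only; Vamsa's actual knowledge-base-driven analysis is far more involved.

```python
import ast

SCRIPT = '''
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("train.csv")
X = df[["age", "income"]]
y = df["label"]
model = LogisticRegression()
model.fit(X, y)
'''

class FitCallFinder(ast.NodeVisitor):
    """Record the variable names passed to any `.fit(...)` call."""
    def __init__(self):
        self.training_inputs = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Attribute) and node.func.attr == "fit":
            self.training_inputs.append(
                [a.id for a in node.args if isinstance(a, ast.Name)])
        self.generic_visit(node)

finder = FitCallFinder()
finder.visit(ast.parse(SCRIPT))
print(finder.training_inputs)  # [['X', 'y']] -> variables feeding model training
```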