Related papers: An Empirical Analysis of Backward Compatibility in Machine Learning Systems

An Empirical Analysis of Backward Compatibility in Machine Learning Systems

URL: http://arxiv.org/abs/2008.04572v1
Date: Tue, 11 Aug 2020 08:10:58 GMT
Title: An Empirical Analysis of Backward Compatibility in Machine Learning Systems
Authors: Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz
Abstract summary: We consider how updates, intended to improve ML models, can introduce new errors that can significantly affect downstream systems and users. For example, updates in models used in cloud-based classification services, such as image recognition, can cause unexpected erroneous behavior.
Score: 47.04803977692586
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance. However, current practices for updating models rely solely on isolated, aggregate performance analyses, overlooking important dependencies, expectations, and needs in real-world deployments. We consider how updates, intended to improve ML models, can introduce new errors that can significantly affect downstream systems and users. For example, updates in models used in cloud-based classification services, such as image recognition, can cause unexpected erroneous behavior in systems that make calls to the services. Prior work has shown the importance of "backward compatibility" for maintaining human trust. We study challenges with backward compatibility across different ML architectures and datasets, focusing on common settings including data shifts with structured noise and ML employed in inferential pipelines. Our results show that (i) compatibility issues arise even without data shift due to optimization stochasticity, (ii) training on large-scale noisy datasets often results in significant decreases in backward compatibility even when model accuracy increases, and (iii) distributions of incompatible points align with noise bias, motivating the need for compatibility aware de-noising and robustness methods.

Related papers

Scale-Invariant Learning-to-Rank [0.0]
At Expedia, learning-to-rank models play a key role in sorting and presenting information more relevant to users. A major challenge in deploying these models is ensuring consistent feature scaling between training and production data. We introduce a scale-invariant LTR framework which combines a deep and a wide neural network to mathematically guarantee scale-invariance in the model at both training and prediction time. We evaluate our framework in simulated real-world scenarios with injected feature scale issues by perturbing the test set at prediction time, and show that even with inconsistent train-test scaling, using framework achieves better performance than
arXiv Detail & Related papers (2024-10-02T19:05:12Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
CMamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting [18.50360049235537]
Mamba, a state space model, has emerged with robust sequence and feature mixing capabilities. Capturing cross-channel dependencies is critical in enhancing performance of time series prediction. We introduce a refined Mamba variant tailored for time series forecasting.
arXiv Detail & Related papers (2024-06-08T01:32:44Z)
Root Causing Prediction Anomalies Using Explainable AI [3.970146574042422]
We present a novel application of explainable AI (XAI) for root-causing performance degradation in machine learning models. A single feature corruption can cause cascading feature, label and concept drifts. We have successfully applied this technique to improve the reliability of models used in personalized advertising.
arXiv Detail & Related papers (2024-03-04T19:38:50Z)
Towards Continually Learning Application Performance Models [1.2278517240988065]
Machine learning-based performance models are increasingly being used to build critical job scheduling and application optimization decisions. Traditionally, these models assume that data distribution does not change as more samples are collected over time. We develop continually learning performance models that account for the distribution drift, alleviate catastrophic forgetting, and improve generalizability.
arXiv Detail & Related papers (2023-10-25T20:48:46Z)
Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study [71.84852429039881]
Investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise. We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
arXiv Detail & Related papers (2023-06-13T12:43:59Z)
Quality In / Quality Out: Data quality more relevant than model choice in anomaly detection with the UGR'16 [0.29998889086656577]
We show that relatively minor modifications on a benchmark dataset cause significantly more impact on model performance than the specific ML technique considered. We also show that the measured model performance is uncertain, as a result of labelling inaccuracies.
arXiv Detail & Related papers (2023-05-31T12:03:12Z)
How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder based models (AE) We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
arXiv Detail & Related papers (2022-06-17T16:18:28Z)
Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC) SFSC generates a series of compatible sub-models with different capacities through one training process. SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
AI Total: Analyzing Security ML Models with Imperfect Data in Production [2.629585075202626]
Development of new machine learning models is typically done on manually curated data sets. We develop a web-based visualization system that allows the users to quickly gather headline performance numbers. It also enables the users to immediately observe the root cause of an issue when something goes wrong.
arXiv Detail & Related papers (2021-10-13T20:56:05Z)
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a textitgap between clean data training and real-world inference. We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedding into similar vector space. Experiments on the widely-used dataset, Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts. We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time. We find that increasing both the training set and model sizes significantly improve the distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.