Backward Compatibility During Data Updates by Weight Interpolation
- URL: http://arxiv.org/abs/2301.10546v1
- Date: Wed, 25 Jan 2023 12:23:10 GMT
- Title: Backward Compatibility During Data Updates by Weight Interpolation
- Authors: Raphael Schumann, Elman Mansimov, Yi-An Lai, Nikolaos Pappas, Xibin Gao, and Yi Zhang
- Abstract summary: We study the problem of regression during data updates and propose Backward Compatible Weight Interpolation (BCWI).
BCWI reduces negative flips without sacrificing the improved accuracy of the new model.
We also explore the use of importance weighting during interpolation and averaging the weights of multiple new models in order to further reduce negative flips.
- Score: 17.502410289568587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backward compatibility of model predictions is a desired property when
updating a machine learning driven application. It allows the underlying model to be
improved seamlessly without introducing regression bugs. In classification tasks these
bugs occur in the form of negative flips: an instance that was correctly classified by
the old model is classified incorrectly by the updated model. This has a direct
negative impact on the user experience of such systems, e.g., a frequently used voice
assistant query is suddenly misclassified. A common reason to update the model is when
new training data becomes available and needs to be incorporated. Simply retraining the
model with the updated data introduces unwanted negative flips. We study the problem of
regression during data updates and propose Backward Compatible Weight Interpolation
(BCWI). This method interpolates between the weights of the old and new model, and we
show in extensive experiments that it reduces negative flips without sacrificing the
improved accuracy of the new model. BCWI is straightforward to implement and does not
increase inference cost. We also explore the use of importance weighting during
interpolation and averaging the weights of multiple new models in order to further
reduce negative flips.
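
The core update described in the abstract is a convex combination of the old and new parameters, optionally rescaled by a per-parameter importance weighting and applied to an average of several new models. The sketch below is inferred from the abstract only, not the authors' released implementation; the function name `bcwi_interpolate`, the Fisher-style `importance` argument, and the clamping of the mixing coefficient are assumptions for illustration.

```python
import torch

def bcwi_interpolate(old_state, new_states, alpha=0.5, importance=None):
    """Illustrative weight interpolation between an old model and one or more new models.

    old_state  : state_dict of the old model
    new_states : list of state_dicts of new models (their weights are averaged first)
    alpha      : mixing coefficient toward the old weights (0 = new model, 1 = old model)
    importance : optional dict of per-parameter importance tensors (hypothetical
                 Fisher-style weighting; the paper's exact scheme may differ)
    """
    # Average the weights of the new models first; per the abstract, averaging
    # multiple new models can further reduce negative flips.
    avg_new = {
        name: torch.stack([s[name].float() for s in new_states]).mean(dim=0)
        for name in new_states[0]
    }
    interpolated = {}
    for name, w_old in old_state.items():
        a = alpha if importance is None else alpha * importance[name]
        a = torch.clamp(torch.as_tensor(a), 0.0, 1.0)
        # Convex combination: stay close to the old model where it matters,
        # keep the accuracy gains of the new model elsewhere.
        interpolated[name] = a * w_old.float() + (1.0 - a) * avg_new[name]
    return interpolated

# Hypothetical usage:
# merged = bcwi_interpolate(old_model.state_dict(),
#                           [m.state_dict() for m in new_models], alpha=0.3)
# new_model.load_state_dict(merged)
```

Larger values of alpha pull the merged weights toward the old model and therefore toward fewer negative flips; the paper's claim is that a suitable coefficient achieves this without giving up the new model's accuracy gains.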
Related papers
- OLR-WA Online Regression with Weighted Average [0.0]
We introduce a new online linear regression approach to train machine learning models.
The introduced model, named OLR-WA, uses user-defined weights to provide flexibility in the face of changing data.
For consistent data, OLR-WA and the static batch model perform similarly, and for varying data the user can set OLR-WA to adapt more quickly or to resist change.
arXiv Detail & Related papers (2023-07-06T06:39:27Z) - Analysis of Interpolating Regression Models and the Double Descent
Phenomenon [3.883460584034765]
It is commonly assumed that models which interpolate noisy training data generalize poorly.
The best models obtained are overparametrized and the testing error exhibits the double descent behavior as the model order increases.
We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double descent shape of the testing error as a function of model order.
arXiv Detail & Related papers (2023-04-17T09:44:33Z) - Hot-Refresh Model Upgrades with Regression-Alleviating Compatible
Training in Image Retrieval [34.84329831602699]
Cold-refresh model upgrades can only deploy new models after the gallery has been fully backfilled, which can take weeks or even months for massive data.
In contrast, hot-refresh model upgrades deploy the new model immediately and then gradually improve the retrieval accuracy by backfilling the gallery on-the-fly.
arXiv Detail & Related papers (2022-01-24T14:59:12Z) - Forward Compatible Training for Representation Learning [53.300192863727226]
Backward compatible training (BCT) modifies training of the new model to make its representations compatible with those of the old model.
However, BCT can significantly hinder the performance of the new model.
In this work, we propose a new learning paradigm for representation learning: forward compatible training (FCT).
arXiv Detail & Related papers (2021-12-06T06:18:54Z) - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing
Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in NLP model updates.
We formulate regression-free model updates as a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z) - What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z) - Positive-Congruent Training: Towards Regression-Free Model Updates [87.25247195148187]
In image classification, sample-wise inconsistencies appear as "negative flips": a new model incorrectly predicts the output for a test sample that was correctly classified by the old (reference) model.
We propose a simple approach for PC training, Focal Distillation, which enforces congruence with the reference model (an illustrative loss sketch appears after this list).
arXiv Detail & Related papers (2020-11-18T09:00:44Z) - Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased.
We show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief.
In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
arXiv Detail & Related papers (2020-10-24T11:53:00Z) - Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
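
For the Positive-Congruent Training entry above, one plausible reading of "enforcing congruence with the reference model" is a distillation term that is up-weighted on samples the old model classified correctly. The sketch below illustrates that reading only; the hyperparameter names (beta, gamma, T) and the exact weighting are assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def focal_distillation_loss(new_logits, old_logits, labels, beta=1.0, gamma=5.0, T=1.0):
    """Illustrative focal-distillation-style loss.

    Extra weight is placed on samples the old (reference) model got right, so
    the new model is pushed to stay congruent exactly where negative flips
    would otherwise occur.
    """
    # Per-sample knowledge-distillation term (KL between softened distributions).
    kd = F.kl_div(
        F.log_softmax(new_logits / T, dim=-1),
        F.softmax(old_logits / T, dim=-1),
        reduction="none",
    ).sum(dim=-1)
    # Up-weight samples that the old model classified correctly.
    old_correct = (old_logits.argmax(dim=-1) == labels).float()
    weights = beta + gamma * old_correct
    return (weights * kd).mean()
```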