Improving Prediction Backward-Compatibility in NLP Model Upgrade with
Gated Fusion
- URL: http://arxiv.org/abs/2302.02080v1
- Date: Sat, 4 Feb 2023 03:40:35 GMT
- Title: Improving Prediction Backward-Compatibility in NLP Model Upgrade with
Gated Fusion
- Authors: Yi-An Lai, Elman Mansimov, Yuqing Xie, Yi Zhang
- Abstract summary: When upgrading neural models to a newer version, new errors that were not encountered in the legacy version can be introduced, known as regression errors.
We propose a novel method, Gated Fusion, that promotes backward compatibility via learning to mix predictions between old and new models.
- Score: 8.173078054056337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When upgrading neural models to a newer version, new errors that were not
encountered in the legacy version can be introduced, known as regression
errors. This inconsistent behavior during model upgrade often outweighs the
benefits of accuracy gain and hinders the adoption of new models. To mitigate
regression errors from model upgrade, distillation and ensemble have proven to
be viable solutions without significant compromise in performance. Despite the
progress, these approaches attained an incremental reduction in regression
which is still far from achieving backward-compatible model upgrade. In this
work, we propose a novel method, Gated Fusion, that promotes backward
compatibility via learning to mix predictions between old and new models.
Empirical results on two distinct model upgrade scenarios show that our method
reduces the number of regression errors by 62% on average, outperforming the
strongest baseline by an average of 25%.
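To make the prediction-mixing idea concrete, below is a minimal PyTorch-style sketch of gated fusion between a frozen old model and a new model. The gate architecture, the choice to gate at the logit level, and the assumption that the new model exposes a pooled feature vector are illustrative guesses based on the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Illustrative sketch of gated prediction mixing between an old and a new model.

    Assumptions (not from the paper): the new model returns both logits and a
    pooled feature vector, and a small gate network maps that vector to a
    scalar mixing weight per example.
    """

    def __init__(self, old_model: nn.Module, new_model: nn.Module, hidden_dim: int):
        super().__init__()
        self.old_model = old_model  # frozen legacy model
        self.new_model = new_model  # upgraded model being trained
        self.gate = nn.Sequential(  # hypothetical gate network
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )
        for p in self.old_model.parameters():  # old model stays fixed
            p.requires_grad_(False)

    def forward(self, inputs):
        with torch.no_grad():
            old_logits = self.old_model(inputs)      # [batch, num_classes]
        new_logits, pooled = self.new_model(inputs)  # pooled: [batch, hidden_dim]
        g = self.gate(pooled)                        # [batch, 1], mixing weight in (0, 1)
        # Convex combination of predictions: g -> 1 defers to the old model,
        # g -> 0 trusts the new model alone.
        return (1.0 - g) * new_logits + g * old_logits
```

One plausible training setup under these assumptions is to optimize the ordinary task loss on the fused logits, letting the gate learn to defer to the old model on inputs it already handles correctly while relying on the new model elsewhere.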
Related papers
- MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to the development of this type of training.
We propose a framework that reconciles preconditioned gradient optimization methods with variance reduction via a scaled momentum technique.
arXiv Detail & Related papers (2024-11-15T18:57:39Z) - Model Merging by Uncertainty-Based Gradient Matching [70.54580972266096]
We propose a new uncertainty-based scheme to improve the performance by reducing the mismatch.
Our new method gives consistent improvements for large language models and vision transformers.
arXiv Detail & Related papers (2023-10-19T15:02:45Z) - MixBCT: Towards Self-Adapting Backward-Compatible Training [66.52766344751635]
We propose MixBCT, a simple yet highly effective backward-compatible training method.
We conduct experiments on the large-scale face recognition datasets MS1Mv3 and IJB-C.
arXiv Detail & Related papers (2023-08-14T05:55:38Z) - Measuring and Reducing Model Update Regression in Structured Prediction
for NLP [31.86240946966003]
Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor.
This work studies model update regression in structured prediction tasks.
We propose a simple and effective method, Backward-Congruent Re-ranking (BCR), by taking into account the characteristics of structured output.
arXiv Detail & Related papers (2022-02-07T07:04:54Z) - Hot-Refresh Model Upgrades with Regression-Alleviating Compatible
Training in Image Retrieval [34.84329831602699]
Cold-refresh model upgrades can only deploy new models after the gallery is fully backfilled, which can take weeks or even months for massive data.
In contrast, hot-refresh model upgrades deploy the new model immediately and then gradually improve the retrieval accuracy by backfilling the gallery on-the-fly.
arXiv Detail & Related papers (2022-01-24T14:59:12Z) - Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z) - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing
Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z) - Positive-Congruent Training: Towards Regression-Free Model Updates [87.25247195148187]
In image classification, sample-wise inconsistencies appear as "negative flips": a new model incorrectly predicts the output for a test sample that was correctly classified by the old (reference) model.
We propose a simple approach for PC training, Focal Distillation, which enforces congruence with the reference model (see the sketch after this list).
arXiv Detail & Related papers (2020-11-18T09:00:44Z) - Quantile Regularization: Towards Implicit Calibration of Regression
Models [30.872605139672086]
We present a method for calibrating regression models based on a novel quantile regularizer defined as the cumulative KL divergence between two CDFs.
We show that the proposed quantile regularizer significantly improves calibration for regression models trained using approaches such as Dropout VI and Deep Ensembles.
arXiv Detail & Related papers (2020-02-28T16:53:41Z)
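Following up on the Positive-Congruent Training entry above, here is a minimal sketch of a focal-distillation-style loss: a distillation term toward the old (reference) model that is up-weighted on samples the old model classifies correctly. The weighting scheme, hyperparameter names, and temperature handling are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F


def focal_distillation_loss(new_logits, old_logits, labels,
                            alpha=1.0, beta=5.0, temperature=2.0):
    """Sketch of a focal-distillation-style objective (hyperparameters are illustrative).

    Cross-entropy for the new model plus a KL distillation term toward the old
    model, with extra weight on samples the old model already gets right.
    """
    ce = F.cross_entropy(new_logits, labels)

    # Per-sample KL(old || new) at a softened temperature.
    old_probs = F.softmax(old_logits / temperature, dim=-1)
    new_log_probs = F.log_softmax(new_logits / temperature, dim=-1)
    kl = F.kl_div(new_log_probs, old_probs, reduction="none").sum(dim=-1)

    # Focal weight: base weight alpha everywhere, plus beta on samples where
    # the old (reference) model predicts the label correctly.
    old_correct = (old_logits.argmax(dim=-1) == labels).float()
    weights = alpha + beta * old_correct

    return ce + (weights * kl).mean()
```

The intent of this weighting is to discourage negative flips: the new model is pulled toward the old model's behavior mainly on examples the old model already handled correctly.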
This list is automatically generated from the titles and abstracts of the papers on this site.