Improving Prediction Backward-Compatibility in NLP Model Upgrade with Gated Fusion
- URL: http://arxiv.org/abs/2302.02080v1
- Date: Sat, 4 Feb 2023 03:40:35 GMT
- Title: Improving Prediction Backward-Compatibility in NLP Model Upgrade with Gated Fusion
- Authors: Yi-An Lai, Elman Mansimov, Yuqing Xie, Yi Zhang
- Abstract summary: When upgrading neural models to a newer version, new errors that were not encountered in the legacy version can be introduced, known as regression errors.
We propose a novel method, Gated Fusion, that promotes backward compatibility via learning to mix predictions between old and new models.
- Score: 8.173078054056337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When upgrading neural models to a newer version, new errors that were not
encountered in the legacy version can be introduced, known as regression
errors. This inconsistent behavior during model upgrade often outweighs the
benefits of accuracy gain and hinders the adoption of new models. To mitigate
regression errors from model upgrade, distillation and ensemble have proven to
be viable solutions without significant compromise in performance. Despite this
progress, these approaches have achieved only incremental reductions in
regression, still far from a backward-compatible model upgrade. In this
work, we propose a novel method, Gated Fusion, that promotes backward
compatibility via learning to mix predictions between old and new models.
Empirical results on two distinct model upgrade scenarios show that our method
reduces the number of regression errors by 62% on average, outperforming the
strongest baseline by an average of 25%.
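To make the proposed mixing concrete, below is a minimal PyTorch sketch of gated prediction fusion between a frozen old model and a trainable new model. The class name, the choice to condition the gate on both models' logits, and the training loss are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of gated prediction mixing between a frozen legacy model and
# an upgraded model. Names and the gate's input are illustrative assumptions.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    def __init__(self, old_model: nn.Module, new_model: nn.Module, num_classes: int):
        super().__init__()
        self.old_model = old_model.eval()        # legacy model, kept frozen
        for p in self.old_model.parameters():
            p.requires_grad_(False)
        self.new_model = new_model               # upgraded model, trainable
        # Assumption: a scalar per-example gate conditioned on both models' logits.
        self.gate = nn.Linear(2 * num_classes, 1)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            old_logits = self.old_model(inputs)
        new_logits = self.new_model(inputs)
        g = torch.sigmoid(self.gate(torch.cat([old_logits, new_logits], dim=-1)))
        # Convex combination: g -> 1 defers to the old model, g -> 0 trusts the new one.
        probs = g * old_logits.softmax(dim=-1) + (1.0 - g) * new_logits.softmax(dim=-1)
        return torch.log(probs + 1e-12)          # log-probabilities, e.g. for nn.NLLLoss
```

Training the gate and the new model jointly on gold labels would keep the fused predictions accurate while letting them defer to the old model on examples it already handles correctly; the paper's exact gating inputs and objective may differ.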
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Model Merging by Uncertainty-Based Gradient Matching [70.54580972266096]
We propose a new uncertainty-based scheme to improve the performance by reducing the mismatch.
Our new method gives consistent improvements for large language models and vision transformers.
arXiv Detail & Related papers (2023-10-19T15:02:45Z)
- MixBCT: Towards Self-Adapting Backward-Compatible Training [66.52766344751635]
We propose MixBCT, a simple yet highly effective backward-compatible training method.
We conduct experiments on the large-scale face recognition datasets MS1Mv3 and IJB-C.
arXiv Detail & Related papers (2023-08-14T05:55:38Z)
- Measuring and Reducing Model Update Regression in Structured Prediction for NLP [31.86240946966003]
Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor.
This work studies model update regression in structured prediction tasks.
We propose a simple and effective method, Backward-Congruent Re-ranking (BCR), which takes the characteristics of structured outputs into account (a hedged sketch of the re-ranking idea follows this entry).
arXiv Detail & Related papers (2022-02-07T07:04:54Z)
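As noted above, here is a hedged sketch of one way backward-congruent re-ranking could be instantiated: the new model produces an n-best list and the frozen old model picks among those candidates. The function signature and scoring convention are hypothetical and may differ from the paper.

```python
# Hypothetical sketch: re-rank the new model's n-best candidates with the old
# model so the final output stays close to legacy behavior where possible.
from typing import Callable, List


def backward_congruent_rerank(
    source: str,
    candidates: List[str],                         # n-best outputs from the new model
    old_model_score: Callable[[str, str], float],  # old model's score for (source, candidate)
) -> str:
    # Assumption: a higher score means the old model finds the candidate more
    # plausible; returning the argmax biases outputs toward old-model behavior.
    return max(candidates, key=lambda c: old_model_score(source, c))
```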
- Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval [34.84329831602699]
Cold-refresh model upgrades can only deploy new models after the entire gallery has been backfilled, which can take weeks or even months for massive data.
In contrast, hot-refresh model upgrades deploy the new model immediately and then gradually improve the retrieval accuracy by backfilling the gallery on-the-fly.
arXiv Detail & Related papers (2022-01-24T14:59:12Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low-churn training against a number of recent baselines (a generic distillation-loss sketch follows this entry).
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
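As referenced above, a generic sketch of such a distillation objective is shown below, with the base (old) model acting as the teacher. The weighting, temperature, and function name are illustrative assumptions rather than the paper's exact formulation.

```python
# Generic knowledge-distillation loss for churn reduction: fit the gold labels
# while staying close to the old (teacher) model's predictive distribution.
import torch
import torch.nn.functional as F


def churn_distillation_loss(new_logits: torch.Tensor,
                            old_logits: torch.Tensor,
                            labels: torch.Tensor,
                            alpha: float = 0.5,
                            temperature: float = 2.0) -> torch.Tensor:
    ce = F.cross_entropy(new_logits, labels)
    kl = F.kl_div(
        F.log_softmax(new_logits / temperature, dim=-1),
        F.softmax(old_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kl
```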
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing, and analyzing regression errors in NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Positive-Congruent Training: Towards Regression-Free Model Updates [87.25247195148187]
In image classification, sample-wise inconsistencies appear as "negative flips": a new model incorrectly predicts the output for a test sample that was correctly classified by the old (reference) model (an illustrative computation of this flip rate is sketched after this entry).
We propose a simple approach for PC training, Focal Distillation, which enforces congruence with the reference model.
arXiv Detail & Related papers (2020-11-18T09:00:44Z)
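To make the "negative flip" notion from the entry above concrete, here is an illustrative helper that computes the fraction of samples the old model classified correctly but the new model gets wrong; the function name is hypothetical.

```python
# Illustrative negative-flip-rate computation: the share of test samples the
# old model got right that the new model now gets wrong.
import torch


def negative_flip_rate(old_preds: torch.Tensor,
                       new_preds: torch.Tensor,
                       labels: torch.Tensor) -> float:
    old_correct = old_preds.eq(labels)
    new_wrong = new_preds.ne(labels)
    return (old_correct & new_wrong).float().mean().item()
```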
- Quantile Regularization: Towards Implicit Calibration of Regression Models [30.872605139672086]
We present a method for calibrating regression models based on a novel quantile regularizer defined as the cumulative KL divergence between two CDFs.
We show that the proposed quantile regularizer significantly improves calibration for regression models trained using approaches such as Dropout VI and Deep Ensembles.
arXiv Detail & Related papers (2020-02-28T16:53:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.