MUSCLE: A Model Update Strategy for Compatible LLM Evolution
- URL: http://arxiv.org/abs/2407.09435v1
- Date: Fri, 12 Jul 2024 17:12:48 GMT
- Title: MUSCLE: A Model Update Strategy for Compatible LLM Evolution
- Authors: Jessica Echterhoff, Fartash Faghri, Raviteja Vemulapalli, Ting-Yao Hu, Chun-Liang Li, Oncel Tuzel, Hadi Pouransari
- Abstract summary: Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance.
Users often build a mental model of the functionality and capabilities of a particular machine learning model they are interacting with.
We propose a training strategy to minimize the number of inconsistencies in model updates.
- Score: 29.032461144831053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance. When updating models, developers often focus on increasing overall performance metrics with less emphasis on being compatible with previous model versions. However, users often build a mental model of the functionality and capabilities of a particular machine learning model they are interacting with. They have to adapt their mental model with every update -- a draining task that can lead to user dissatisfaction. In practice, fine-tuned downstream task adapters rely on pretrained LLM base models. When these base models are updated, these user-facing downstream task models experience instance regression or negative flips -- previously correct instances are now predicted incorrectly. This happens even when the downstream task training procedures remain identical. Our work aims to provide seamless model updates to a user in two ways. First, we provide evaluation metrics for a notion of compatibility to prior model versions, specifically for generative tasks but also applicable for discriminative tasks. We observe regression and inconsistencies between different model versions on a diverse set of tasks and model updates. Second, we propose a training strategy to minimize the number of inconsistencies in model updates, involving training of a compatibility model that can enhance task fine-tuned language models. We reduce negative flips -- instances where a prior model version was correct, but a new model incorrect -- by up to 40% from Llama 1 to Llama 2.
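Concretely, the negative-flip metric the abstract describes can be computed from paired predictions of the two model versions. A minimal sketch (the function name and list-based interface are illustrative, not from the paper):

```python
def negative_flip_rate(old_preds, new_preds, labels):
    """Fraction of instances the prior model predicted correctly
    but the updated model now predicts incorrectly."""
    assert len(old_preds) == len(new_preds) == len(labels)
    flips = sum(
        1 for old, new, gold in zip(old_preds, new_preds, labels)
        if old == gold and new != gold
    )
    return flips / len(labels)
```

For generative tasks, an exact-match (or task-specific correctness) judgment against a reference would play the role of `labels` here.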
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- MGit: A Model Versioning and Management System [7.2678752235785735]
MGit is a model versioning and management system that makes it easier to store, test, update, and collaborate on model derivatives.
MGit is able to reduce the lineage graph's storage footprint by up to 7x and automatically update downstream models in response to updates to upstream models.
arXiv Detail & Related papers (2023-07-14T17:56:48Z)
- Backward Compatibility During Data Updates by Weight Interpolation [17.502410289568587]
We study the problem of regression during data updates and propose Backward Compatible Weight Interpolation (BCWI).
BCWI reduces negative flips without sacrificing the improved accuracy of the new model.
We also explore the use of importance weighting during interpolation, and averaging the weights of multiple new models, to further reduce negative flips.
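The interpolation idea can be sketched as a per-parameter convex combination of the old and new checkpoints. Plain dicts of floats stand in for tensor state dicts here, and the importance weighting mentioned above is omitted:

```python
def interpolate_checkpoints(old_state, new_state, alpha=0.5):
    """Per-parameter convex combination of two checkpoints:
    alpha=0 returns the old weights, alpha=1 the new ones."""
    return {
        name: (1.0 - alpha) * old_state[name] + alpha * new_state[name]
        for name in new_state
    }
```

Sweeping `alpha` trades off retaining the old model's behavior against adopting the new model's accuracy gains.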
arXiv Detail & Related papers (2023-01-25T12:23:10Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
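As an illustration of merging in parameter space, the simplest baseline is a uniform average over models fine-tuned from the same base (the paper's actual method uses a more sophisticated weighting; plain floats stand in for tensors):

```python
def merge_in_parameter_space(state_dicts):
    """Uniformly average each parameter across several models that
    share the same architecture and parameter names."""
    n = len(state_dicts)
    return {
        name: sum(sd[name] for sd in state_dicts) / n
        for name in state_dicts[0]
    }
```

This requires no training data at merge time, only the checkpoints themselves.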
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Revision Transformers: Instructing Language Models to Change their Values [21.645935518842744]
Current transformer language models (LM) are large-scale models with billions of parameters.
We propose the Revision Transformer (RiT) to facilitate easy model updating.
Combining a large-scale pre-trained LM, which encodes world knowledge inherently but diffusely, with a clearly structured revision engine makes it possible to update the model's knowledge with little effort and with the help of user interaction.
arXiv Detail & Related papers (2022-10-19T07:05:06Z)
- Learning Backward Compatible Embeddings [74.74171220055766]
We study the problem of embedding version updates and their backward compatibility.
We develop a solution based on learning backward compatible embeddings.
We show that the best method, which we call BC-Aligner, maintains backward compatibility with existing unintended tasks even after multiple model version updates.
arXiv Detail & Related papers (2022-06-07T06:30:34Z)
- Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
- Self-Updating Models with Error Remediation [0.5156484100374059]
We propose a framework, Self-Updating Models with Error Remediation (SUMER), in which a deployed model updates itself as new data becomes available.
A key component of SUMER is the notion of error remediation as self-labeled data can be susceptible to the propagation of errors.
We find that self-updating models (SUMs) generally perform better than models that do not attempt to self-update when presented with additional previously-unseen data.
arXiv Detail & Related papers (2020-05-19T23:09:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.