Model Editing with Canonical Examples
- URL: http://arxiv.org/abs/2402.06155v1
- Date: Fri, 9 Feb 2024 03:08:12 GMT
- Title: Model Editing with Canonical Examples
- Authors: John Hewitt, Sarah Chen, Lanruo Lora Xie, Edward Adams, Percy Liang,
Christopher D. Manning
- Abstract summary: We introduce model editing with canonical examples.
A canonical example is a simple instance of good behavior, e.g., The capital of Mauritius is Port Louis.
We propose sense finetuning, which selects and finetunes a few sense vectors for each canonical example.
- Score: 75.33218320106585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce model editing with canonical examples, a setting in which (1) a
single learning example is provided per desired behavior, (2) evaluation is
performed exclusively out-of-distribution, and (3) deviation from an initial
model is strictly limited. A canonical example is a simple instance of good
behavior, e.g., The capital of Mauritius is Port Louis) or bad behavior, e.g.,
An aspect of researchers is coldhearted). The evaluation set contains more
complex examples of each behavior (like a paragraph in which the capital of
Mauritius is called for.) We create three datasets and modify three more for
model editing with canonical examples, covering knowledge-intensive
improvements, social bias mitigation, and syntactic edge cases. In our
experiments on Pythia language models, we find that LoRA outperforms full
finetuning and MEMIT. We then turn to the Backpack language model architecture
because it is intended to enable targeted improvement. The Backpack defines a
large bank of sense vectors--a decomposition of the different uses of each
word--which are weighted and summed to form the output logits of the model. We
propose sense finetuning, which selects and finetunes a few ($\approx$ 10)
sense vectors for each canonical example, and find that it outperforms other
finetuning methods, e.g., 4.8% improvement vs 0.3%. Finally, we improve
GPT-J-6B by an inference-time ensemble with just the changes from sense
finetuning of a 35x smaller Backpack, in one setting outperforming editing
GPT-J itself (4.1% vs 1.0%).
Related papers
- Take It Easy: Label-Adaptive Self-Rationalization for Fact Verification and Explanation Generation [15.94564349084642]
Self-rationalization method is typically used in natural language inference tasks.
We fine-tune a model to learn veracity prediction with annotated labels.
We generate synthetic explanations from three large language models.
arXiv Detail & Related papers (2024-10-05T02:19:49Z) - Decoding-Time Language Model Alignment with Multiple Objectives [116.42095026960598]
Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives.
Here, we propose $textbfmulti-objective decoding (MOD)$, a decoding-time algorithm that outputs the next token from a linear combination of predictions.
We show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method.
arXiv Detail & Related papers (2024-06-27T02:46:30Z) - Large language models for aspect-based sentiment analysis [0.0]
We assess the performance of GPT-4 and GPT-3.5 in zero shot, few shot and fine-tuned settings.
Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task.
arXiv Detail & Related papers (2023-10-27T10:03:21Z) - Fine-Tuning Language Models with Just Forward Passes [92.04219196752007]
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a large amount of memory.
We propose a memory-efficient zerothorder (MeZO) to operate in-place, thereby fine-tuning LMs with the same memory footprint as inference.
arXiv Detail & Related papers (2023-05-27T02:28:10Z) - InheritSumm: A General, Versatile and Compact Summarizer by Distilling
from GPT [75.29359361404073]
InheritSumm is a versatile and compact summarization model derived from GPT-3.5 through distillation.
It achieves similar or superior performance to GPT-3.5 in zeroshot and fewshot settings.
arXiv Detail & Related papers (2023-05-22T14:52:32Z) - Explanation-based Finetuning Makes Models More Robust to Spurious Cues [21.327036110196637]
Large Language Models (LLMs) are so powerful that they sometimes learn correlations between labels and features that are irrelevant to the task.
We propose explanation-based finetuning as a general approach to mitigate LLMs' reliance on spurious correlations.
We finetune the model to additionally generate a free-text explanation supporting its answer.
arXiv Detail & Related papers (2023-05-08T18:53:45Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into Structurals and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper- parameters.
For a simple model trained with a Repr, we focus on a VGG-style plain model and showcase that such a simple model trained with a Repr, which is referred to as Rep-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - Exploring Strategies for Generalizable Commonsense Reasoning with
Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.