"Why" Has the Least Side Effect on Model Editing
- URL: http://arxiv.org/abs/2409.18679v1
- Date: Fri, 27 Sep 2024 12:05:12 GMT
- Title: "Why" Has the Least Side Effect on Model Editing
- Authors: Tsung-Hsuan Pan, Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen
- Abstract summary: This paper delves into a critical factor, question type, by categorizing model editing questions.
Our findings reveal that the extent of performance degradation varies significantly across different question types.
We also examine the impact of batch size on side effects, discovering that increasing the batch size can mitigate performance drops.
- Score: 25.67779910446609
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Training large language models (LLMs) from scratch is an expensive endeavor, particularly as world knowledge continually evolves. To maintain relevance and accuracy of LLMs, model editing has emerged as a pivotal research area. While these methods hold promise, they can also produce unintended side effects. Their underlying factors and causes remain largely unexplored. This paper delves into a critical factor, question type, by categorizing model editing questions. Our findings reveal that the extent of performance degradation varies significantly across different question types, providing new insights for experimental design in knowledge editing. Furthermore, we investigate whether insights from smaller models can be extrapolated to larger models. Our results indicate discrepancies in findings between models of different sizes, suggesting that insights from smaller models may not necessarily apply to larger models. Additionally, we examine the impact of batch size on side effects, discovering that increasing the batch size can mitigate performance drops.
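As a minimal illustration of how probe questions might be bucketed by question type (the wh-word taxonomy and helper below are assumptions for illustration, not the authors' code):

```python
# Minimal sketch: bucket model-editing probe questions by interrogative type.
# The who/what/when/where/why/how taxonomy is an assumption; the paper's
# exact categories may differ.

WH_WORDS = ("who", "what", "when", "where", "why", "how")

def question_type(question: str) -> str:
    """Return the leading interrogative word of a question, or 'other'."""
    words = question.strip().lower().split()
    if not words:
        return "other"
    first = words[0].rstrip(",?")
    return first if first in WH_WORDS else "other"

probes = [
    "Why did the Berlin Wall fall?",
    "Who wrote Hamlet?",
    "When was the transistor invented?",
]
by_type = {}
for q in probes:
    by_type.setdefault(question_type(q), []).append(q)

# Side effects can then be measured per bucket, e.g. accuracy drop after editing.
print(by_type)
```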
Related papers
- What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
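A common baseline behind merging studies is plain parameter averaging ("model souping"); the sketch below shows that baseline only, not this paper's method:

```python
import torch

def average_merge(state_dicts):
    """Uniformly average the parameters of several same-architecture models.
    This is the plain model-soup baseline, shown for illustration only; the
    paper evaluates merging at scale and may use other schemes."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Usage with hypothetical expert checkpoints:
# model.load_state_dict(average_merge([sd_a, sd_b, sd_c]))
```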
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
- Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective [10.547693900435917]
We propose a metric to study the impact of expansion by estimating the size of the minima manifold.
Experimental results show a clear relationship between gains in performance and manifold size.
arXiv Detail & Related papers (2024-05-24T19:33:05Z)
- Large Language Model Pruning [0.0]
We suggest a model pruning technique specifically focused on LLMs.
The proposed methodology emphasizes the explainability of deep learning models.
We also explore the difference between pruning on large-scale models vs. pruning on small-scale models.
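The summary does not spell out the criterion; for reference, a sketch of the standard magnitude-pruning baseline such work is usually compared against (an assumption, not the paper's technique):

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of a weight tensor.
    Classic baseline only; the paper's LLM-specific, explainability-oriented
    criterion is not reproduced here."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(4, 4)
print(magnitude_prune(w, 0.5))  # roughly half of the entries become zero
```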
arXiv Detail & Related papers (2024-05-24T18:22:15Z)
- The Effect of Model Size on LLM Post-hoc Explainability via LIME [1.1073658091405039]
This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference tasks.
We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility.
The key finding is that increased model size does not correlate with plausibility despite improved model performance.
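A minimal sketch of this kind of LIME setup, with a small stand-in classifier so the example is self-contained (the paper itself probes DeBERTaV3 on natural language inference; model choice and label handling below are assumptions):

```python
import numpy as np
from lime.lime_text import LimeTextExplainer
from transformers import pipeline

# Stand-in single-sentence classifier for a runnable example; the paper
# evaluates DeBERTaV3 models of four sizes on NLI instead.
clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english",
               top_k=None)

def predict_proba(texts):
    # LIME expects an (n_samples, n_classes) probability array with a
    # fixed class order, so sort scores by label name.
    outs = clf(list(texts))
    return np.array([[s["score"] for s in sorted(o, key=lambda d: d["label"])]
                     for o in outs])

explainer = LimeTextExplainer(class_names=["NEGATIVE", "POSITIVE"])
exp = explainer.explain_instance("A quietly moving film.", predict_proba,
                                 num_features=5, num_samples=500)
print(exp.as_list())  # per-token attributions, to be checked for faithfulness
```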
arXiv Detail & Related papers (2024-05-08T18:27:20Z)
- The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse [58.0132400208411]
Even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks.
Benchmarking large language models after each edit, however, is impractically time-consuming and resource-intensive.
We have utilized GPT-3.5 to develop a new dataset, HardEdit, based on hard cases.
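The summary does not name the lightweight signal that replaces full benchmarking; one plausible surrogate, assumed here purely for illustration, is perplexity on a small fixed reference set checked after each edit:

```python
import math
import torch

def perplexity(model, tokenizer, texts, device="cpu"):
    """Average perplexity over a small reference set; a cheap collapse probe.
    A sudden large jump after an edit can flag degradation without running
    full benchmarks (the surrogate choice here is an assumption)."""
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
            losses.append(model(ids, labels=ids).loss.item())
    return math.exp(sum(losses) / len(losses))

# Usage: ppl_before = perplexity(model, tok, ref_texts); apply edit; compare.
```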
arXiv Detail & Related papers (2024-02-15T01:50:38Z)
- Traceability and Reuse Mechanisms, the most important Properties of Model Transformation Languages [1.4685355149711299]
We aim to quantitatively assess the interview results to confirm or reject the hypothesized effects of different factors.
Results show that tracing and reuse mechanisms are the most important properties overall.
arXiv Detail & Related papers (2023-05-11T12:35:03Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores by upper confidence bound (UCB) of importance estimation.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
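A rough sketch of UCB-style importance scoring in the spirit of PLATON; the smoothing coefficients and exact score combination below are assumptions, not the paper's verified formulas:

```python
import torch

class UCBImportance:
    """Track a smoothed per-weight sensitivity estimate and its variability,
    then score weights by combining both, so weights with uncertain
    importance are retained longer before pruning."""

    def __init__(self, beta1=0.85, beta2=0.85):
        self.beta1, self.beta2 = beta1, beta2
        self.imp_bar = None  # smoothed importance
        self.unc_bar = None  # smoothed uncertainty

    def update(self, weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
        imp = (weight * grad).abs()              # first-order sensitivity estimate
        if self.imp_bar is None:
            self.imp_bar = imp.clone()
            self.unc_bar = torch.zeros_like(imp)
        unc = (imp - self.imp_bar).abs()         # deviation from the running mean
        self.imp_bar = self.beta1 * self.imp_bar + (1 - self.beta1) * imp
        self.unc_bar = self.beta2 * self.unc_bar + (1 - self.beta2) * unc
        # Upper-confidence-style score; prune the lowest-scoring weights.
        return self.imp_bar * self.unc_bar
```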
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
- On the Importance of Data Size in Probing Fine-tuned Models [18.69409646532038]
We show that the extent of encoded linguistic knowledge depends on the number of fine-tuning samples.
We show through a set of experiments that fine-tuning data size affects the recoverability of the changes made to the model's linguistic knowledge.
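Probing here usually means fitting a small diagnostic classifier on frozen representations; a minimal stand-in sketch (the toy features and task below are placeholders, not the paper's setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels):
    """Fit a linear probe on frozen representations and report accuracy.
    Higher accuracy is read as more of the target linguistic property being
    encoded; repeating this across fine-tuning data sizes yields the kind
    of comparison the paper makes."""
    probe = LogisticRegression(max_iter=1000).fit(train_feats, train_labels)
    return probe.score(test_feats, test_labels)

# Toy stand-in features; in practice these would be hidden states from
# models fine-tuned on different numbers of samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(int)
print(linear_probe_accuracy(X[:150], y[:150], X[150:], y[150:]))
```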
arXiv Detail & Related papers (2022-03-17T21:45:17Z)
- Predicting on the Edge: Identifying Where a Larger Model Does Better [61.793778186198864]
We show that large models have the largest improvement on examples where the small model is most uncertain.
We show that a switcher model which defers examples to a larger model when a small model is uncertain can achieve striking improvements in performance and resource usage.
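A minimal sketch of such a switcher, using max softmax probability as the confidence signal (both the signal and the threshold value are assumptions):

```python
import numpy as np

def switcher_predict(small_probs, large_predict, x, threshold=0.7):
    """Cascade inference sketch: answer with the small model when it is
    confident, otherwise defer to the large model. Deferring only the
    uncertain examples saves compute relative to always using the big one."""
    probs = small_probs(x)
    if probs.max() >= threshold:
        return int(np.argmax(probs)), "small"
    return large_predict(x), "large"

# Usage with hypothetical models:
# label, which = switcher_predict(small.predict_proba, big.predict, x)
```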
arXiv Detail & Related papers (2022-02-15T18:53:14Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
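A minimal sketch of setting up prefix-tuning as an alternative to full fine-tuning, using the peft library (base model and hyperparameters are assumptions; the paper's exact setup may differ):

```python
from transformers import AutoModelForSequenceClassification
from peft import PrefixTuningConfig, TaskType, get_peft_model

# Hypothetical base model; only the learned prefix vectors are trained,
# leaving the pretrained weights frozen.
base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)
config = PrefixTuningConfig(task_type=TaskType.SEQ_CLS, num_virtual_tokens=20)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # a small fraction of total parameters
```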
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Knowledge distillation: A good teacher is patient and consistent [71.14922743774864]
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications.
We identify certain implicit design choices, which may drastically affect the effectiveness of distillation.
We obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
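The loss behind this kind of distillation is the standard soft-target objective of Hinton et al.; a minimal sketch (this paper's contribution is the patient, consistent training regime, not a new loss):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Classic soft-target distillation: match the student's softened output
    distribution to the teacher's. The T^2 factor keeps gradient magnitudes
    comparable across temperatures, following the standard recipe."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * t * t

# Usage per batch: loss = distillation_loss(student(x), teacher(x).detach())
```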
arXiv Detail & Related papers (2021-06-09T17:20:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.