REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing
- URL: http://arxiv.org/abs/2505.18933v1
- Date: Sun, 25 May 2025 01:57:06 GMT
- Title: REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing
- Authors: Haitian Zhong, Yuhuan Liu, Ziyang Xu, Guofan Liu, Qiang Liu, Shu Wu, Zhe Zhao, Liang Wang, Tieniu Tan
- Abstract summary: We introduce REACT, a framework for precise and controllable knowledge editing. In the initial phase, we utilize tailored stimuli to extract latent factual representations. In the second phase, we apply controllable perturbations to hidden states using the obtained vector with a magnitude scalar.
- Score: 42.89229070245538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language model editing methods frequently suffer from overfitting, wherein factual updates can propagate beyond their intended scope, overemphasizing the edited target even when it's contextually inappropriate. To address this challenge, we introduce REACT (Representation Extraction And Controllable Tuning), a unified two-phase framework designed for precise and controllable knowledge editing. In the initial phase, we utilize tailored stimuli to extract latent factual representations and apply Principal Component Analysis with a simple learnable linear transformation to compute a directional "belief shift" vector for each instance. In the second phase, we apply controllable perturbations to hidden states using the obtained vector with a magnitude scalar, gated by a pre-trained classifier that permits edits only when contextually necessary. Relevant experiments on EVOKE benchmarks demonstrate that REACT significantly reduces overfitting across nearly all evaluation metrics, and experiments on COUNTERFACT and MQuAKE show that our method preserves balanced basic editing performance (reliability, locality, and generality) under diverse editing scenarios.
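The two phases described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names (`belief_shift_direction`, `gated_edit`), the use of paired hidden-state differences as "stimuli", the optional matrix `W` standing in for the learnable linear transformation, and the fixed gate threshold are all assumptions made for illustration.

```python
import numpy as np

def belief_shift_direction(pos_states, neg_states, W=None):
    """Phase 1 (assumed form): PCA over hidden-state differences.

    pos_states, neg_states: arrays of shape (n_stimuli, hidden_dim),
    hypothetical hidden states for stimuli that do / do not express the new fact.
    W: optional (hidden_dim, hidden_dim) matrix, a stand-in for the
    learnable linear transformation mentioned in the abstract.
    """
    diffs = pos_states - neg_states
    centered = diffs - diffs.mean(axis=0)
    # PCA via SVD: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    v = vt[0]
    # Principal components have arbitrary sign; orient along the mean difference.
    if np.dot(v, diffs.mean(axis=0)) < 0:
        v = -v
    if W is not None:
        v = W @ v
    return v / np.linalg.norm(v)

def gated_edit(hidden, v, alpha, gate_prob, threshold=0.5):
    """Phase 2 (assumed form): shift a hidden state by alpha * v
    only when a classifier score says the edit is relevant."""
    if gate_prob < threshold:
        return hidden  # context judged out of scope: leave untouched
    return hidden + alpha * v

# Toy usage with random data in place of real model activations.
rng = np.random.default_rng(0)
pos = rng.normal(size=(8, 4))
neg = rng.normal(size=(8, 4))
v = belief_shift_direction(pos, neg)
h = rng.normal(size=4)
edited = gated_edit(h, v, alpha=2.0, gate_prob=0.9)
skipped = gated_edit(h, v, alpha=2.0, gate_prob=0.1)
```

Here `alpha` plays the role of the magnitude scalar and `gate_prob` the output of the pre-trained classifier; in a real system both the stimuli and the gate would come from the edited model and its training data rather than random arrays.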
Related papers
- MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs [82.34547399693966]
Existing methods for lifelong model editing compromise generalization, interfere with past edits, or fail to scale to long editing sequences. We propose MEMOIR, a novel scalable framework that injects knowledge through a residual memory. MEMOIR confines each edit to a distinct subset of the memory parameters, minimizing interference among edits.
arXiv Detail & Related papers (2025-06-09T16:16:42Z) - Tracing and Reversing Rank-One Model Edits [5.260519479124422]
This work investigates the traceability and reversibility of knowledge edits, focusing on the widely used Rank-One Model Editing (ROME) method. We show that ROME introduces distinctive distributional patterns in the edited weight matrices, which can serve as effective signals for locating the edited weights. We propose a method to infer the edited object entity directly from the modified weights, without access to the editing prompt, achieving over 95% accuracy.
arXiv Detail & Related papers (2025-05-27T07:27:01Z) - Learning Where to Edit Vision Transformers [27.038720045544867]
We propose a locate-then-edit approach for editing vision Transformers (ViTs) in computer vision.
We first address the where-to-edit challenge by meta-learning a hypernetwork on CutMix-augmented data.
To validate our method, we construct an editing benchmark that introduces subpopulation shifts towards natural underrepresented images and AI-generated images.
arXiv Detail & Related papers (2024-11-04T10:17:40Z) - Uncovering Overfitting in Large Language Model Editing [35.55260822503773]
We identify and investigate the phenomenon of Editing Overfit, where edited models assign disproportionately high probabilities to the edit target.
We propose a new plug-and-play strategy called Learn to Inference (LTI), which introduces a Multi-stage Inference Constraint module to guide the edited models in recalling new knowledge.
arXiv Detail & Related papers (2024-10-10T11:09:00Z) - PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z) - The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse [58.0132400208411]
Even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks.
Benchmarking Large Language Models after each edit is impractically time-consuming and resource-intensive.
We have utilized GPT-3.5 to develop a new dataset, HardEdit, based on hard cases.
arXiv Detail & Related papers (2024-02-15T01:50:38Z) - SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Recent model editing is a promising technique for efficiently updating a small amount of knowledge of large language models. We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching. We demonstrate the overall state-of-the-art (SOTA) performance of SWEA$\oplus$OS on the CounterFact and zsRE datasets.
arXiv Detail & Related papers (2024-01-31T13:08:45Z) - Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue [122.20016030723043]
We evaluate the side effects of model editing on large language models (LLMs).
Our analysis reveals that the side effects are caused by model editing altering the original model weights excessively.
To mitigate this, a method named RECT is proposed to regularize the edit update weights.
arXiv Detail & Related papers (2024-01-09T18:03:15Z) - Test-Time Training for Semantic Segmentation with Output Contrastive Loss [12.535720010867538]
Deep learning-based segmentation models have achieved impressive performance on public benchmarks, but generalizing well to unseen environments remains a major challenge.
This paper introduces Output Contrastive Loss (OCL), known for its capability to learn robust and generalized representations, to stabilize the adaptation process.
Our method excels even when applied to models initially pre-trained using domain adaptation methods on test domain data, showcasing its resilience and adaptability.
arXiv Detail & Related papers (2023-11-14T03:13:47Z) - On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets [101.28658250723804]
This paper experiments with augmenting a transformer model with modules that effectively utilize a wider field of view and learn to choose whether the next step requires a navigation or manipulation action.
We observe that the proposed modules resulted in improved, and in fact state-of-the-art, performance on an unseen validation set of a popular benchmark dataset, ALFRED.
We highlight this result as we believe it may be a wider phenomenon in machine learning tasks but primarily noticeable only in benchmarks that limit evaluations on test splits.
arXiv Detail & Related papers (2022-05-18T23:52:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.