From Memorization to Reasoning in the Spectrum of Loss Curvature
- URL: http://arxiv.org/abs/2510.24256v2
- Date: Fri, 31 Oct 2025 00:26:33 GMT
- Title: From Memorization to Reasoning in the Spectrum of Loss Curvature
- Authors: Jack Merullo, Srihita Vatsavaya, Lucius Bushnaq, Owen Lewis
- Abstract summary: We show that memorization can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs). We extensively analyze the effect of the editing procedure on downstream tasks in LMs, and find that fact retrieval and arithmetic are specifically and consistently negatively affected.
- Score: 6.463682206736737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on the loss landscape curvature. This insight builds on prior theoretical and empirical work showing that the curvature for memorized training points is much sharper than for non-memorized ones, meaning that ordering weight components from high to low curvature can reveal the distinction without explicit labels. This motivates a weight editing procedure that suppresses recitation of untargeted memorized data more effectively than a recent unlearning method (BalancedSubnet), while maintaining lower perplexity. Since the curvature basis has a natural interpretation in terms of shared structure in model weights, we extensively analyze the effect of the editing procedure on downstream tasks in LMs, and find that fact retrieval and arithmetic are specifically and consistently negatively affected, even though open-book fact retrieval and general logical reasoning are preserved. We posit that these tasks rely heavily on specialized directions in weight space rather than general-purpose mechanisms, regardless of whether the individual datapoints are memorized. We support this by showing a correspondence between how strongly task data activates the low-curvature components we edit out and the drop in task performance after the edit. Our work enhances the understanding of memorization in neural networks, with practical applications for removing it, and provides evidence for idiosyncratic, narrowly-used structures involved in solving tasks like math and fact retrieval.
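The editing idea is easy to illustrate. The following is a minimal, hypothetical PyTorch sketch, not the authors' procedure: it stands in for the paper's loss-curvature decomposition with a diagonal empirical Fisher (mean squared gradient over the data) as a per-weight average-curvature proxy, then zeroes the lowest-curvature fraction of each parameter tensor. The function name and the `edit_frac` parameter are assumptions for illustration.

```python
import torch

def curvature_edit(model, loss_fn, data_loader, edit_frac=0.1):
    # Accumulate a diagonal curvature proxy: the empirical Fisher,
    # i.e. the mean squared gradient over the data (an assumption
    # standing in for the paper's loss-curvature decomposition).
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    # Zero the lowest-curvature fraction of each parameter tensor:
    # low average curvature is where the abstract locates memorization
    # and idiosyncratic, narrowly-used structure.
    with torch.no_grad():
        for n, p in model.named_parameters():
            f = fisher[n] / max(n_batches, 1)
            k = int(edit_frac * f.numel())
            if k == 0:
                continue
            thresh = f.flatten().kthvalue(k).values  # k-th smallest value
            p[f <= thresh] = 0.0
```

In practice one would restrict the edit to attention and MLP weight matrices and sweep `edit_frac` against recitation rate and perplexity, mirroring the trade-off described in the abstract.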
Related papers
- LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data [69.5099112089508]
Large language models (LLMs) exhibit remarkable generative capabilities but raise ethical and security concerns by memorizing sensitive data. This work presents the first study of unlearning under perturbed or low-fidelity forget data, referred to as noisy forget sets. We find that unlearning remains surprisingly robust to perturbations, provided that core semantic signals are preserved.
arXiv Detail & Related papers (2025-10-10T05:10:49Z) - Mechanistic Interpretability in the Presence of Architectural Obfuscation [0.0]
Architectural obfuscation is a lightweight substitute for heavyweight cryptography in privacy-preserving large-language-model (LLM) inference. We analyze a GPT-2-small model trained from scratch with a representative obfuscation map. Our findings reveal that obfuscation dramatically alters activation patterns within attention heads yet preserves the layer-wise computational graph.
arXiv Detail & Related papers (2025-06-22T14:39:16Z) - Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture. We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
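As a rough illustration of what fixing attention weights with localized Gaussians could look like, here is a hedged NumPy sketch; the function name, `sigma`, and the exact parameterization are assumptions, not the paper's formulation.

```python
import numpy as np

def gaussian_attention(points, sigma=0.1):
    # points: (n, 3) array of point-cloud coordinates.
    # Fixed attention weights from a localized Gaussian of pairwise
    # squared distances, row-normalized like a softmax.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return w / w.sum(axis=-1, keepdims=True)
```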
arXiv Detail & Related papers (2024-09-20T07:41:47Z) - SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - Unveiling Privacy, Memorization, and Input Curvature Links [11.290935303784208]
Memorization is closely related to several concepts such as generalization, noisy learning, and privacy.
Recent research has shown evidence linking input loss curvature (measured by the trace of the loss Hessian w.r.t. inputs) and memorization.
We extend our analysis to establish theoretical links between differential privacy, memorization, and input loss curvature.
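The curvature measure named here, the trace of the loss Hessian with respect to the inputs, can be estimated without materializing the Hessian. Below is a standard Hutchinson trace estimator with Rademacher probes in PyTorch; the function name and sample count are illustrative, and the cited paper's exact estimator may differ.

```python
import torch

def input_loss_curvature(model, loss_fn, x, y, n_samples=10):
    # x: input tensor (e.g., images or embeddings); requires a model
    # that is differentiable with respect to its input.
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    trace = 0.0
    for _ in range(n_samples):
        v = torch.randint_like(x, high=2) * 2.0 - 1.0  # Rademacher probe
        # Hessian-vector product: d(grad . v)/dx = H v for symmetric H.
        (hvp,) = torch.autograd.grad(grad, x, grad_outputs=v,
                                     retain_graph=True)
        trace += (v * hvp).sum().item()  # v^T H v estimates tr(H)
    return trace / n_samples
```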
arXiv Detail & Related papers (2024-02-28T22:02:10Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
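For reference, the unhinged loss for binary labels is usually written as the linear loss below (this standard form from earlier work on symmetric losses is an assumption here; the cited paper may generalize it). Its linearity in $y\,f(x)$ is what opens the door to closed-form dynamics.

```latex
\ell_{\mathrm{unhinged}}\bigl(y, f(x)\bigr) = 1 - y\, f(x), \qquad y \in \{-1, +1\}.
```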
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets [5.854190253899593]
We study an interpretable model where generalizing representations are understood analytically, and are easily distinguishable from the memorizing ones.
We show that (i) it is possible for the network to memorize the corrupted labels and achieve 100% generalization at the same time.
We also show that in the presence of regularization, the training dynamics involves two consecutive stages.
arXiv Detail & Related papers (2023-10-19T18:01:10Z) - Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
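One simple reading of "penalizing the repetitions in training data" is to deduplicate near-verbatim repeated sentences within each document before training. The sketch below does exactly that; it is an assumption about the mechanism, not the paper's actual penalty.

```python
def drop_repetitions(docs):
    # Remove near-verbatim repeated sentences within each training
    # document. Splitting on ". " is a crude stand-in for a real
    # sentence segmenter.
    cleaned = []
    for doc in docs:
        seen, kept = set(), []
        for sent in doc.split(". "):
            key = sent.strip().lower()
            if key and key not in seen:
                seen.add(key)
                kept.append(sent)
        cleaned.append(". ".join(kept))
    return cleaned
```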
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - Measures of Information Reflect Memorization Patterns [53.71420125627608]
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.
Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
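A hedged sketch of one way to quantify "diversity in the activation patterns": the average per-neuron entropy of activations over a batch. The cited paper's information measures may be defined differently; the function name and binning are assumptions.

```python
import numpy as np

def activation_diversity(acts, n_bins=16):
    # acts: array of shape (n_examples, n_neurons).
    # Entropy of each neuron's activation histogram, averaged across
    # neurons; higher values indicate more diverse activation patterns.
    entropies = []
    for j in range(acts.shape[1]):
        hist, _ = np.histogram(acts[:, j], bins=n_bins)
        p = hist / hist.sum()
        p = p[p > 0]
        entropies.append(-(p * np.log2(p)).sum())
    return float(np.mean(entropies))
```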
arXiv Detail & Related papers (2022-10-17T20:15:24Z) - Unveiling Transformers with LEGO: a synthetic reasoning task [23.535488809197787]
We study how the transformer architecture learns to follow a chain of reasoning.
In some data regimes, the trained transformer finds "shortcut" solutions to follow the chain of reasoning.
We find that such shortcuts can be prevented with appropriate architecture modifications or careful data preparation.
arXiv Detail & Related papers (2022-06-09T06:30:17Z)