Training Memory in Deep Neural Networks: Mechanisms, Evidence, and Measurement Gaps
- URL: http://arxiv.org/abs/2601.21624v1
- Date: Thu, 29 Jan 2026 12:26:52 GMT
- Title: Training Memory in Deep Neural Networks: Mechanisms, Evidence, and Measurement Gaps
- Authors: Vasileios Sevetlidis, George Pavlidis
- Abstract summary: The survey closes with a protocol for portable, causal, uncertainty-aware measurement that attributes how much training history matters across models, data, and regimes.
- Score: 1.078600700827543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern deep-learning training is not memoryless. Updates depend on optimizer moments and averaging, data-order policies (random reshuffling vs with-replacement, staged augmentations and replay), the nonconvex path, and auxiliary state (teacher EMA/SWA, contrastive queues, BatchNorm statistics). This survey organizes mechanisms by source, lifetime, and visibility. It introduces seed-paired, function-space causal estimands; portable perturbation primitives (carry/reset of momentum/Adam/EMA/BN, order-window swaps, queue/teacher tweaks); and a reporting checklist with audit artifacts (order hashes, buffer/BN checksums, RNG contracts). The conclusion is a protocol for portable, causal, uncertainty-aware measurement that attributes how much training history matters across models, data, and regimes.
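To make the perturbation primitives concrete, here is a minimal seed-paired "carry vs. reset momentum" experiment; the toy model, synthetic data stream, and probe set are illustrative assumptions, not the paper's protocol.

```python
import torch

def make_model():
    return torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

def train(reset_momentum_at=None, seed=0, steps=200):
    torch.manual_seed(seed)                  # seed-paired: identical init across runs
    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
    g = torch.Generator().manual_seed(seed)  # RNG contract: shared data order
    for step in range(steps):
        if step == reset_momentum_at:
            opt.state.clear()                # the perturbation: drop momentum buffers
        x = torch.randn(32, 10, generator=g)
        y = x.sum(dim=1, keepdim=True)       # synthetic regression target
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

probe = torch.randn(256, 10, generator=torch.Generator().manual_seed(123))
with torch.no_grad():
    gap = (train()(probe) - train(reset_momentum_at=100)(probe)).abs().mean()
print(f"function-space effect of carried momentum: {gap:.4f}")
```

Seed pairing removes run-to-run noise, so the reported gap isolates the functional trace left by the carried momentum buffers.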
Related papers
- Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning [59.19460954480119]
We study whether forgotten knowledge originates from pretraining or supervised fine-tuning. Our experiments show that pretrained and SFT models respond differently to unlearning.
arXiv Detail & Related papers (2026-02-23T08:58:48Z)
- FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning [24.734154431191538]
FaLW is a plug-and-play, instance-wise dynamic loss reweighting method. It assesses the unlearning state of each sample by comparing its predictive probability to the distribution of unseen data from the same class. Experiments demonstrate that FaLW achieves superior performance.
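A hedged sketch of what instance-wise reweighting of this kind could look like; the percentile-style weighting rule below is an assumption, not FaLW's published formula.

```python
import torch

def falw_weights(p_forget, p_unseen_same_class):
    """p_forget: (N,) predicted true-class prob for forget-set samples.
    p_unseen_same_class: (M,) the same quantity on held-out unseen data."""
    # Fraction of unseen samples with lower probability: a percentile score.
    # High percentile => the model still "knows" the sample => larger weight.
    ranks = (p_forget[:, None] > p_unseen_same_class[None, :]).float().mean(dim=1)
    return ranks  # in [0, 1]; 0 means indistinguishable from unseen data

def reweighted_unlearning_loss(logits, targets, weights):
    per_sample = torch.nn.functional.cross_entropy(logits, targets, reduction="none")
    return -(weights * per_sample).mean()  # gradient ascent on weighted CE
```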
arXiv Detail & Related papers (2026-01-26T16:21:01Z)
- Memory in Large Language Models: Mechanisms, Evaluation and Evolution [8.158439933515131]
We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). For updating and forgetting, we present DMM Gov: coordinating DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop. This yields a reproducible, comparable, and governable coordinate system for research and deployment.
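As a data-structure illustration, the memory quadruple could be encoded as a record type; only the four axes come from the snippet, and the field values below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class MemoryQuadruple:
    location: Literal["parametric", "contextual", "external", "procedural/episodic"]
    persistence: Literal["ephemeral", "session", "persistent"]
    write_access_path: str   # e.g. "gradient update", "KV cache", "vector store"
    controllability: Literal["none", "editable", "auditable"]

# Example: memory held in a retrieval-augmented generation store.
rag_memory = MemoryQuadruple("external", "persistent", "vector store", "auditable")
```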
arXiv Detail & Related papers (2025-09-23T10:06:58Z)
- Unlearning at Scale: Implementing the Right to be Forgotten in Large Language Models [0.0]
Our approach treats unlearning as a minimal program and logs a per-microbatch record. Under a pinned stack and deterministic kernels, replaying the training tail yields the same parameters as training on the retain set.
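A minimal sketch of the kind of per-microbatch log such a replay recipe needs; the field names and hashing scheme are assumptions, not the paper's format.

```python
import hashlib
import json

def microbatch_record(step, sample_ids, rng_state_digest, lr):
    """One log entry per microbatch: enough to replay or skip it later."""
    rec = {"step": step, "sample_ids": sample_ids,
           "rng": rng_state_digest, "lr": lr}
    # Content hash makes the log tamper-evident for audits.
    rec["hash"] = hashlib.sha256(
        json.dumps(rec, sort_keys=True).encode()).hexdigest()
    return rec

# To unlearn: restore the checkpoint before the first microbatch containing a
# deleted sample, filter those sample_ids out of the log, and replay the tail
# with pinned kernels so the result equals training on the retain set.
```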
arXiv Detail & Related papers (2025-08-17T03:29:22Z)
- Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task. We name our approach Adaptive Retention & Correction (ARC). ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [93.90047628101155]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks. To address this, some methods propose replaying data from previous tasks during new task learning. However, storing previous data is often infeasible in practice due to memory constraints and data privacy issues.
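For context, the replay baseline the snippet refers to is typically a bounded buffer over the task stream; below is a generic reservoir-sampling sketch, not this paper's method.

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer holding a uniform subset of a data stream."""
    def __init__(self, capacity):
        self.capacity, self.seen, self.data = capacity, 0, []

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = random.randrange(self.seen)  # reservoir sampling keeps each
            if j < self.capacity:            # stream element with equal prob.
                self.data[j] = sample

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```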
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Robust Machine Learning by Transforming and Augmenting Imperfect Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning.
We first discuss how to prevent ML from codifying prior human discrimination measured in the training data.
We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z)
- R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face challenges.
Previous instruction tuning methods force the model to complete a sentence whether or not it actually knows the relevant knowledge.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
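A hedged sketch of refusal-aware data construction in this spirit: partition instruction data by whether the base model already answers correctly and attach a refusal target to the unknown part. The exact-match predicate and refusal template are assumptions, not the paper's exact recipe.

```python
def build_refusal_aware_data(dataset, model_answer):
    """dataset: iterable of (question, gold) pairs.
    model_answer: callable returning the base model's answer string."""
    tuned = []
    for question, gold in dataset:
        if model_answer(question).strip() == gold.strip():
            tuned.append((question, gold))            # "known": keep the answer
        else:
            tuned.append((question, "I don't know."))  # "unknown": teach refusal
    return tuned
```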
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
- Task-Aware Machine Unlearning and Its Application in Load Forecasting [4.00606516946677]
This paper introduces the concept of machine unlearning, which is specifically designed to remove the influence of part of the dataset on an already trained forecaster.
A performance-aware algorithm is proposed that evaluates the sensitivity of local model parameter changes using influence functions and sample re-weighting.
We tested the unlearning algorithms on linear, CNN, and Mixer-based load forecasters with a realistic load dataset.
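A toy dense-Hessian version of the influence-function removal step is sketched below; the paper's performance-aware re-weighting is omitted, and the helper names are hypothetical.

```python
import torch

def influence_unlearn(params, loss_forget_fn, loss_retain_fn, damping=1e-3):
    """params: flat 1-D parameter tensor with requires_grad=True.
    Newton-style removal: theta' = theta + H_retain^{-1} grad_forget
    (up to a 1/n scaling), damped for numerical stability."""
    g, = torch.autograd.grad(loss_forget_fn(params), params)
    H = torch.autograd.functional.hessian(loss_retain_fn, params)
    H = H + damping * torch.eye(H.shape[0])
    return (params + torch.linalg.solve(H, g)).detach()
```

The dense Hessian only scales to small models; practical variants approximate the inverse-Hessian-vector product instead.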
arXiv Detail & Related papers (2023-08-28T08:50:12Z)
- Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening [51.34904967046097]
We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning that is fast, performant, and does not require long-term storage of the training data.
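The snippet does not spell out the mechanism; the sketch below follows the SSD paper's description of dampening parameters whose diagonal-Fisher importance for the forget set is disproportionately high relative to the full training set. The hyperparameters are illustrative.

```python
import torch

def ssd_dampen(params, fisher_forget, fisher_full, alpha=10.0, lam=1.0):
    """params / fisher_*: matching lists of per-parameter tensors."""
    for p, f_fg, f_all in zip(params, fisher_forget, fisher_full):
        selected = f_fg > alpha * f_all                    # specialized to forget set
        scale = torch.clamp(lam * f_all / f_fg, max=1.0)   # dampening factor <= 1
        p.data = torch.where(selected, p.data * scale, p.data)
```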
arXiv Detail & Related papers (2023-08-15T11:30:45Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
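A hedged sketch of internal replay: synthesize auxiliary inputs from the trained model itself by optimizing them toward confident class predictions. The objective and schedule are assumptions; ARM's actual recall objective differs in detail.

```python
import torch

def recall_samples(model, n, dim, target_class, steps=100, lr=0.1):
    for p in model.parameters():          # freeze the model; optimize inputs only
        p.requires_grad_(False)
    x = torch.randn(n, dim, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    target = torch.full((n,), target_class)
    for _ in range(steps):
        loss = torch.nn.functional.cross_entropy(model(x), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return x.detach()                     # pseudo-samples recalled from the model
```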
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.