Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
- URL: http://arxiv.org/abs/2504.07952v1
- Date: Thu, 10 Apr 2025 17:57:33 GMT
- Title: Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
- Authors: Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, James Zou
- Abstract summary: Dynamic Cheatsheet (DC) is a lightweight framework that endows a black-box language model with a persistent, evolving memory. DC enables models to store and reuse accumulated strategies, code snippets, and general problem-solving insights at inference time. This test-time learning enhances performance substantially across a range of tasks without needing explicit ground-truth labels or human feedback.
- Score: 52.44029486173232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their impressive performance on complex tasks, current language models (LMs) typically operate in a vacuum: Each input query is processed separately, without retaining insights from previous attempts. Here, we present Dynamic Cheatsheet (DC), a lightweight framework that endows a black-box LM with a persistent, evolving memory. Rather than repeatedly re-discovering or re-committing the same solutions and mistakes, DC enables models to store and reuse accumulated strategies, code snippets, and general problem-solving insights at inference time. This test-time learning enhances performance substantially across a range of tasks without needing explicit ground-truth labels or human feedback. Leveraging DC, Claude 3.5 Sonnet's accuracy more than doubled on AIME math exams once it began retaining algebraic insights across questions. Similarly, GPT-4o's success rate on Game of 24 increased from 10% to 99% after the model discovered and reused a Python-based solution. In tasks prone to arithmetic mistakes, such as balancing equations, DC enabled GPT-4o and Claude to reach near-perfect accuracy by recalling previously validated code, whereas their baselines stagnated around 50%. Beyond arithmetic challenges, DC yields notable accuracy gains on knowledge-demanding tasks. Claude achieved a 9% improvement on GPQA-Diamond and an 8% boost on MMLU-Pro problems. Crucially, DC's memory is self-curated, focusing on concise, transferable snippets rather than entire transcripts. Unlike finetuning or static retrieval methods, DC adapts LMs' problem-solving skills on the fly, without modifying their underlying parameters. Overall, our findings present DC as a promising approach for augmenting LMs with persistent memory, bridging the divide between isolated inference events and the cumulative, experience-driven learning characteristic of human cognition.
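The abstract describes a simple test-time loop: each query is answered with the current cheatsheet in context, and the model then rewrites the cheatsheet itself, keeping only concise, transferable entries. A minimal sketch of such a loop follows, assuming a generic black-box `generate(prompt)` call; the prompt wording, memory format, and helper names are illustrative assumptions, not the paper's actual prompts.

```python
# Minimal sketch of a Dynamic-Cheatsheet-style test-time learning loop.
# The prompt wording, memory format, and the `generate` stub are illustrative
# assumptions, not the paper's actual prompts or API.

def generate(prompt: str) -> str:
    """Placeholder for a black-box LM call (e.g., a chat-completion API)."""
    return "(model output would appear here)"

def answer_with_cheatsheet(query: str, cheatsheet: str) -> str:
    prompt = (
        "You may reuse the strategies and code snippets in this cheatsheet.\n"
        f"CHEATSHEET:\n{cheatsheet}\n\n"
        f"QUESTION:\n{query}\n"
        "Solve the question, applying cheatsheet entries when helpful."
    )
    return generate(prompt)

def curate_cheatsheet(query: str, answer: str, cheatsheet: str) -> str:
    # The model rewrites its own memory: keep concise, transferable snippets,
    # drop transcript-level detail. No ground-truth labels are involved.
    prompt = (
        f"CURRENT CHEATSHEET:\n{cheatsheet}\n\n"
        f"NEW PROBLEM:\n{query}\nATTEMPTED SOLUTION:\n{answer}\n\n"
        "Rewrite the cheatsheet: add any reusable strategy or code snippet "
        "from this attempt and remove redundant or overly specific entries."
    )
    return generate(prompt)

def run_test_time_learning(queries):
    cheatsheet = ""            # persistent memory that evolves across queries
    answers = []
    for query in queries:      # queries are handled sequentially, not in isolation
        answer = answer_with_cheatsheet(query, cheatsheet)
        cheatsheet = curate_cheatsheet(query, answer, cheatsheet)
        answers.append(answer)
    return answers
```

Nothing in this loop touches model weights or uses labels; all adaptation is carried in the cheatsheet text passed back into the context, which is what distinguishes DC from finetuning or static retrieval.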
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning [22.13331870720021]
We propose an approach that moves beyond prompt learning for the rehearsal-free continual learning (RFCL) task, called Continual Adapter (C-ADA).
C-ADA flexibly extends specific weights in its continual adapter layer (CAL) to learn new knowledge for each task and freezes old weights to preserve prior knowledge.
Our approach achieves significantly improved performance and training speed, outperforming the current state-of-the-art (SOTA) method.
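The summary above names two mechanisms: growing new task-specific weights and freezing previously learned ones. A rough PyTorch sketch of that idea follows; the adapter shape, initialization, and residual combination are assumptions for illustration, not C-ADA's actual CAL design.

```python
import torch
import torch.nn as nn

# Rough sketch of "extend new weights per task, freeze old ones": each call to
# add_task() freezes everything learned so far and appends a fresh trainable
# block. Dimensions, initialization, and the residual combination are
# illustrative assumptions, not C-ADA's actual CAL design.

class GrowingAdapter(nn.Module):
    def __init__(self, dim: int, width_per_task: int = 8):
        super().__init__()
        self.dim, self.width = dim, width_per_task
        self.down_blocks = nn.ParameterList()   # one low-rank block per task
        self.up_blocks = nn.ParameterList()

    def add_task(self) -> None:
        for p in self.parameters():             # freeze previously learned weights
            p.requires_grad_(False)
        self.down_blocks.append(nn.Parameter(0.01 * torch.randn(self.dim, self.width)))
        self.up_blocks.append(nn.Parameter(torch.zeros(self.width, self.dim)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x
        for down, up in zip(self.down_blocks, self.up_blocks):
            out = out + x @ down @ up           # residual contribution per task block
        return out

adapter = GrowingAdapter(dim=768)
adapter.add_task()   # task 1: new block trainable
adapter.add_task()   # task 2: task-1 block frozen, task-2 block trainable
```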
arXiv Detail & Related papers (2024-07-14T17:40:40Z)
- Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
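A minimal sketch of this masking step is given below, assuming whitespace tokenization, a 15% masking ratio, and a `<mask>` placeholder token; all three are illustrative choices rather than the paper's settings.

```python
import random

# Illustrative sketch of masking partial reasoning steps: a fraction of the
# chain-of-thought tokens is replaced by a mask token before training.
# Whitespace tokenization, the 15% ratio, and the <mask> string are assumptions.

MASK_TOKEN = "<mask>"
MASK_RATIO = 0.15

def mask_chain_of_thought(question: str, reasoning: str, answer: str) -> str:
    masked_tokens = [
        MASK_TOKEN if random.random() < MASK_RATIO else tok
        for tok in reasoning.split()
    ]
    # Only the visible reasoning is perturbed; the question and answer stay intact.
    return f"{question}\n{' '.join(masked_tokens)}\n{answer}"

example = mask_chain_of_thought(
    "What is 12 * 7?",
    "First 10 * 7 = 70 , then 2 * 7 = 14 , so 70 + 14 = 84 .",
    "The answer is 84.",
)
print(example)
```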
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [93.90047628101155]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To address this, some methods propose replaying data from previous tasks during new task learning.
However, replaying stored data is often impractical due to memory constraints and data privacy issues.
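For concreteness, the replay baseline discussed above can be sketched as a bounded buffer of stored past-task examples mixed into each new-task batch; the buffer size, slot replacement, and mixing ratio below are illustrative assumptions.

```python
import random

# Sketch of the data-replay baseline discussed above: a bounded buffer of
# stored past-task examples is mixed into each new-task batch. Buffer size,
# slot replacement, and the mixing ratio are illustrative assumptions.

class ReplayBuffer:
    def __init__(self, capacity: int = 200):
        self.capacity = capacity
        self.examples = []                      # (input, label) pairs from past tasks

    def add(self, example) -> None:
        if len(self.examples) < self.capacity:
            self.examples.append(example)
        else:                                   # overwrite a random slot to stay bounded
            self.examples[random.randrange(self.capacity)] = example

    def sample(self, k: int):
        return random.sample(self.examples, min(k, len(self.examples)))

def make_batch(new_task_examples, buffer: ReplayBuffer, replay_fraction: float = 0.5):
    k = int(len(new_task_examples) * replay_fraction)
    return list(new_task_examples) + buffer.sample(k)
```

Keeping such a buffer of raw past examples is exactly what the memory and privacy constraints mentioned above rule out, which motivates replay-free alternatives.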
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning [30.676509834338884]
Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data.
We propose prompting approaches as an alternative to data rehearsal.
We show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy.
arXiv Detail & Related papers (2022-11-23T18:57:11Z)
- An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks [40.81306982129298]
Parametric and retrieval-augmented models have complementary strengths in terms of computational efficiency and predictive accuracy.
We propose the Efficient Memory-Augmented Transformer (EMAT).
It encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying.
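A toy sketch of the key-value lookup described above is shown below, using brute-force inner products over random NumPy vectors; the sizes and exhaustive search stand in for EMAT's encoded knowledge entries and its fast MIPS index.

```python
import numpy as np

# Toy key-value memory queried by maximum inner product search (MIPS).
# Random vectors and a brute-force argpartition stand in for EMAT's encoded
# knowledge entries and its fast MIPS index; sizes are arbitrary.

rng = np.random.default_rng(0)
num_entries, dim = 10_000, 128
keys = rng.standard_normal((num_entries, dim)).astype(np.float32)
values = rng.standard_normal((num_entries, dim)).astype(np.float32)

def query_memory(query_vec: np.ndarray, top_k: int = 4) -> np.ndarray:
    scores = keys @ query_vec                        # inner product with every key
    top = np.argpartition(-scores, top_k)[:top_k]    # indices of the best matches
    return values[top]                               # retrieved values go to the model

retrieved = query_memory(rng.standard_normal(dim).astype(np.float32))
```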
arXiv Detail & Related papers (2022-10-30T08:34:49Z)
- Rectification-based Knowledge Retention for Continual Learning [49.1447478254131]
Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting.
We propose a novel approach to the task-incremental learning problem, in which a model is trained on new tasks that arrive sequentially.
Our approach can be used in both the zero-shot and non-zero-shot task-incremental learning settings.
arXiv Detail & Related papers (2021-03-30T18:11:30Z)