Fugu-MT 論文翻訳(概要): Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory

論文の概要: Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory

arxiv url: http://arxiv.org/abs/2504.07952v1
Date: Thu, 10 Apr 2025 17:57:33 GMT
ステータス: 翻訳完了
システム内更新日: 2025-04-18 15:50:58.748968
Title: Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
Title（参考訳）: Dynamic Cheatsheet: 適応メモリによるテスト時間学習
Authors: Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, James Zou,
Abstract要約: Dynamic Cheatsheet(DC)は、永続的で進化するメモリを備えたブラックボックス言語モデルを提供する軽量フレームワークである。 DCは、蓄積した戦略、コードスニペット、および推論時に一般的な問題解決の洞察をモデルが保存し再利用することを可能にする。このテストタイム学習は、明確な地味なラベルや人間のフィードバックを必要とせずに、幅広いタスクのパフォーマンスを大幅に向上させる。
参考スコア（独自算出の注目度）: 52.44029486173232
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite their impressive performance on complex tasks, current language models (LMs) typically operate in a vacuum: Each input query is processed separately, without retaining insights from previous attempts. Here, we present Dynamic Cheatsheet (DC), a lightweight framework that endows a black-box LM with a persistent, evolving memory. Rather than repeatedly re-discovering or re-committing the same solutions and mistakes, DC enables models to store and reuse accumulated strategies, code snippets, and general problem-solving insights at inference time. This test-time learning enhances performance substantially across a range of tasks without needing explicit ground-truth labels or human feedback. Leveraging DC, Claude 3.5 Sonnet's accuracy more than doubled on AIME math exams once it began retaining algebraic insights across questions. Similarly, GPT-4o's success rate on Game of 24 increased from 10% to 99% after the model discovered and reused a Python-based solution. In tasks prone to arithmetic mistakes, such as balancing equations, DC enabled GPT-4o and Claude to reach near-perfect accuracy by recalling previously validated code, whereas their baselines stagnated around 50%. Beyond arithmetic challenges, DC yields notable accuracy gains on knowledge-demanding tasks. Claude achieved a 9% improvement in GPQA-Diamond and an 8% boost on MMLU-Pro problems. Crucially, DC's memory is self-curated, focusing on concise, transferable snippets rather than entire transcript. Unlike finetuning or static retrieval methods, DC adapts LMs' problem-solving skills on the fly, without modifying their underlying parameters. Overall, our findings present DC as a promising approach for augmenting LMs with persistent memory, bridging the divide between isolated inference events and the cumulative, experience-driven learning characteristic of human cognition.
Abstract（参考訳）: 複雑なタスクに対する印象的なパフォーマンスにもかかわらず、現在の言語モデル(LM)は一般的に真空で動作します。ここでは、永続的で進化するメモリを備えたブラックボックスLMを提供する軽量フレームワークであるDynamic Cheatsheet(DC)を紹介する。 DCは、同じソリューションとミスを繰り返し再発見または再コミットする代わりに、蓄積した戦略、コードスニペット、そして推論時間における一般的な問題解決の洞察をモデルで保存し再利用することができる。このテストタイム学習は、明確な地味なラベルや人間のフィードバックを必要とせずに、幅広いタスクのパフォーマンスを大幅に向上させる。 DCを活用して、Claude 3.5 Sonnet の精度は AIME の数学試験で2倍以上に向上した。同様に、GPT-4oのGame of 24の成功率は、モデルがPythonベースのソリューションを発見して再利用した後、10%から99%に増加した。方程式のバランスをとるような算術ミスをしがちなタスクでは、DC は GPT-4o と Claude を、検証済みのコードをリコールすることでほぼ完全な精度に到達させた。算術的課題以外にも、DCは知識要求タスクにおいて顕著な精度の向上をもたらす。クロードはGPQA-ダイアモンドを9%改善し、MMLU-Pro問題を8%改善した。重要なことに、DCのメモリは自己計算され、全書き起こしではなく、簡潔で転送可能なスニペットに焦点が当てられている。微調整や静的検索とは異なり、DCは基本パラメータを変更することなく、LMの問題解決スキルをオンザフライで適用する。総じて,本研究は,人間の認知の累積的,経験的学習特性と孤立推論事象の分割を橋渡しし,持続的記憶でLMを増強するための有望なアプローチとしてDCを提示した。

論文の概要: Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory

関連論文リスト